Join us as we pursue our disruptive vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we’re committed to our work, customers, having fun and most meaningfully to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!
Splunk's Cloud group is looking for an expert Principal Reliability Engineer to help lead, design and build the next generation of our large scale Cloud offering. You will be working on the core compute platforms in the cloud.
- Work across the organization to deliver quality products that delight Splunk's passionate users.
- Lead teams of tight-knit, super smart engineers who are building a state-of-the-art, cloud-based environment for massive-scale data processing.
- Mentor and help new engineers to achieve more than they thought possible.
The following is not a prescriptive must-have list, but a good candidate should be able to demonstrate many of these.
- 12+ years of DevOps experience building automation and tools for large, complex cloud, on-prem, and hybrid infrastructure.
- You are passionate about building and running distributed systems at scale in production. You understand the challenges and trade-offs involved in building and deploying systems to production.
- You have expertise working with container deployment and orchestration technologies at scale, with strong knowledge of the fundamentals, including service discovery, deployments, monitoring, scheduling, and load balancing. Knowledge of Kubernetes, Terraform, and Docker is preferred.
- You will bring strong experience managing large numbers of diverse systems in a hybrid cloud/on-prem environment with configuration management systems such as Puppet, Chef, Ansible, or Salt.
- You have a proven ability to write tools, applications, and automation using a high-level programming language such as Python, Go, Ruby, Perl, or C++.
- You have a proclivity for efficient programming, emphasizing improvement through complexity analysis, as well as a passion for eliminating repetitive manual processes through automation.
- You will bring a strong foundation in RESTful APIs, cloud architecture, IAM, and cloud security.
- You have a deep understanding of the Linux operating system and systems programming, including but not limited to the kernel, memory, processes, threads, static/shared libraries, IPC, signals, file systems, and the network stack (L2, L3, network architecture, VLANs).
- You have a deep understanding of networking best practices, protocols, and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load-balancing strategies.
- You've demonstrated the ability to work effectively and collaboratively across functions.
- You are enthusiastic about making the many users of your product happier every day.
- You have a strong sense of ownership, customer service, and integrity, demonstrated through clear communication and thoughtful actions.
- Experience running multi-cluster environments and a strong understanding of multi-tenancy and its security implications.
- Experience with development and deployment in diverse public cloud environments such as AWS, GCP, and Azure.
- Experience administering a distributed Splunk environment with an Indexer Cluster, Search Head Cluster, HTTP Event Collector, Forwarders (Universal/Heavy), Deployer, and Deployment Server.
- Cloud transformation and on-prem-to-cloud migration experience.
- Experience with compliance environments such as SOC2, PCI, HIPAA, FedRAMP Moderate.
- Strong knowledge of /giphy and appropriate use of memes.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.