
Join us as we pursue our exciting new vision to make machine data accessible, usable, and valuable to everyone. We are a company filled with people who are passionate about solving problems using data and who seek to deliver the best experience for customers. At Splunk, we are focused on our work, our customers, having fun, and, most importantly, each other's success.

As applications and systems become more sophisticated and the stakes for user experience rise, observability (the ability to monitor and understand those systems and how they affect users) has become one of the biggest challenges for engineering teams. We are building world-class tools to help engineers deliver better, faster, and more reliable applications.

Role

The Splunk Metrics and Analytics team is seeking an outstanding Principal Site Reliability Engineer (SRE) to lead, design, and tune the large-scale, highly available distributed persistence systems at the heart of Splunk Observability. You will work with a team of passionate, skilled engineers on automating, scaling, tuning, and troubleshooting Cassandra, Elasticsearch, and MongoDB databases. We are looking for motivated, hardworking, and focused individuals with a real passion for operational excellence, data systems, and automation.

Responsibilities

  • Manage large, critical Cassandra and Elasticsearch clusters supporting billions of transactions per day
  • Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
  • Tune databases in collaboration with the other service owners who rely on them
  • Evaluate new technologies, tools, and processes
  • Define roadmaps for the technical evolution of our data systems
  • Manage relationships with the broader Splunk SRE community, representing the concerns of the Metrics and Analytics engineering domain
  • Drive operational excellence by incorporating telemetry data into site reliability practices
  • Mentor and coach the engineering team in SRE practices and operational excellence
  • Implement disaster recovery (DR) strategies, including backup and recovery techniques with minimal downtime

Requirements

  • 10+ years of experience as an engineer (SRE, SDET, or development)
  • Experience managing large, highly available Cassandra clusters at scale
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Experience with Kubernetes, Docker, and container orchestration
  • Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
  • Knowledge of standard methodologies related to security, performance, and disaster recovery

Education

  • Bachelor’s degree in Computer Science (CS) or equivalent

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying. For job positions in San Francisco, CA, and other locations where required, we will consider qualified applicants with arrest and conviction records for employment.

(Colorado only) Minimum base salary of $135,000. You may also be eligible for incentive pay, equity, and benefits. Note: Disclosure per SB19-085 (8-5-201 et seq.).

Splunk's Hiring Practices
Splunk turns machine data into answers. Organizations use market-leading Splunk solutions with machine learning to solve their toughest IT, Internet of Things and security challenges.
 
Individuals seeking employment at Splunk are considered without regard to race, religion, color, national origin, ancestry, sex, gender, gender identity, gender expression, sexual orientation, marital status, age, physical or mental disability or medical condition (except where physical fitness is a valid occupational qualification), genetic information, veteran status, or any other consideration made unlawful by federal, state, or local laws. See the US Department of Labor’s “EEO is the Law” notice and Splunk’s Affirmative Action Policy Statement.
 
Splunk also has policies in place to protect the personal information candidates disclose to us as part of the application process. See Splunk’s Career Site Privacy Policy for details.

Splunk does not discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. See Splunk’s Pay Transparency Nondiscrimination Provision for details.

Splunk is also committed to providing access to all individuals seeking information from our website. Any individual using assistive technology (such as a screen reader or Braille reader) who has difficulty accessing information on any part of Splunk’s website should send comments to accessiblecareers@splunk.com. Please include the nature of the accessibility problem and your email or other contact address. If the problem involves a particular page, please include the URL of that page.

Splunk doesn't accept unsolicited agency resumes and won't pay fees to any third-party agency or firm that doesn't have a signed agreement with Splunk.
