Join us as we pursue our disruptive vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we’re committed to our work, customers, having fun and most meaningfully to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!
Role:
Splunk's DSP Apache Pulsar group is looking for an expert Principal SRE to help lead, design and build the next generation messaging service using Apache Pulsar. You will be working on the cloud platforms AWS, GCP & Azure.
Responsibilities:
- Work across the organization to deliver quality products that delight Splunk's passionate users.
- Lead teams of tight-knit, super smart engineers who are building a state-of-the-art, cloud-based environment for massive-scale data processing.
- Mentor and help new engineers to achieve more than they thought possible.
Requirements:
- BS degree in EE or CS and 12+ years of related experience (or a Master's and 8+ years of related experience, or a PhD and 5+ years of experience)
- You are passionate about building and running distributed systems at scale in production. You understand the challenges and trade-offs to be made when building and deploying systems to production.
- You are an expert in working with container deployment and orchestration technologies at scale, with knowledge of the fundamentals, including service discovery, deployments, monitoring, scheduling, and load balancing.
- You have a deep understanding of systems architecture (network stack, file system, OS services, storage subsystems) and have implemented features around these.
- You have experience implementing reliability features into application code. This may include emitting metrics or other health indicators, survivability, or multi-site features.
- You've demonstrated the ability to work effectively and collaboratively across functions.
- You are passionate about making the many users of your product happier every day.
- You make decisions based on measurable data. You promote this cycle of measurement, experimentation, and improvement to other teams.
- You understand how services scale, fail, and recover. You recommend architectural changes and work directly with applications to implement your designs.
- You are passionate about reliability as a feature and solving reliability challenges across the organization.
Preferred skills:
- Experience with running multi-cluster environments and a strong understanding of multi-tenancy and security implications.
- Knowledge of Kubernetes, Go and Docker
- Working knowledge of messaging systems like Apache Pulsar, RabbitMQ, ActiveMQ, etc.
- Experience with development and deployment in a hosted cloud environment, preferably AWS and GCP
We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.
For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.