Join us on the Splunk TechOps Team, where we empower our customers to realize our vision of making machine data accessible, usable, and valuable to everyone! The Splunk TechOps organization runs Splunk Cloud, blending SRE, Systems Engineering, and Service Engineering disciplines across cross-functional global teams. As a TechOps Monitoring Engineer, you will own the monitoring of our customer-facing SaaS product, Splunk Cloud. Come join a team that is striving for operational awesomeness and trying to automate the world. We have a large presence with the major cloud vendors, so you should have experience with architecture, deployments, and networking on one or more of them. This is an incredible opportunity to use your existing cloud experience and drive the growth of Splunk Cloud.
- Craft and develop monitoring to improve the observability and reliability of Splunk products.
- Mentor new engineers to achieve more than they thought possible.
- Work across the organization to deliver quality products that delight Splunk's hardworking users.
- Experience with one or more monitoring systems, open source or commercial, at moderate scale. Prometheus experience is a plus.
- Experience with CI/CD systems for deploying code to dev, stage, and production environments.
- The ability to dig in and understand the root cause of sophisticated issues
- Experience with machine configuration-as-code systems such as Puppet.
- Dedicated to developing well-tested and maintainable code.
- Understanding of observability libraries and ability to instrument code to expose new application metrics
- You enjoy making other teams successful and are fulfilled through the success of others.
- You enjoy crafting, developing, and maintaining distributed systems at scale in production.
- You understand the challenges and trade-offs to be made when building and deploying systems to production.
- Knowledge of standard methodologies related to security, performance, and disaster recovery.
- Skilled in identifying performance bottlenecks, detecting anomalous system behavior, and resolving the root cause of service issues.
- Experience developing with Golang or Python
- Experience setting up detectors and alerts in SignalFx.
- Experience working with container deployment and orchestration technologies, with knowledge of fundamentals including service discovery, deployments, monitoring, scheduling, and load balancing.
- Experience with development and deployment in a hosted cloud environment, preferably AWS & GCP.
- Experience with distributed cloud service development, infrastructure, traffic management, and architecture.
- Experience building optimized, scalable software that operates on a large number of nodes.