Splunk Cloud is looking for a Manager to provide day-to-day leadership to our Cloud Operations Center (CNOC). This position is responsible for incident management process design and for the continuous service improvements needed to achieve the objectives of the business. As manager of the Cloud Ops Center, you'll lead a team responsible for the 24/7 support and monitoring of our rapidly growing Cloud Platform. You'll use analytics to plan, implement, and continually improve processes that reduce overall MTTR. We're looking for someone to bring a fresh approach to problems of all shapes and sizes and help us build a best-in-class Cloud Operations Center.
- Solve issues and participate in on-call support, ensuring stability and performance of the Splunk Cloud environment.
- Partner with our SRE teams to deliver agile, highly automated capabilities to monitor applications and our cloud infrastructure.
- Drive automation (of runbooks) and software-defined approaches to reliability and availability as well as change management.
- Work closely with various groups within Operations to drive efficiencies, including authoring runbooks and key alert metrics and maintaining the overall health and stability of monitoring.
- Represent the Cloud Operations Center in meetings and process changes, and make recommendations on new procedures and processes.
- Work with your peers across the organization to handle related or dependent release activities.
- Act as a liaison between SRE, monitoring teams, support, and leadership for new processes, tools, and knowledge transfers.
- Oversee all Cloud Operations Engineers and leads and ensure all duties and tasks are being performed expertly and effectively during each shift.
- Mentor and coach new team members.
- Serve as incident commander, contribute to post-incident reviews, and follow through on action plans.
Who you are:
- 2-4 years in a hands-on management position.
- Deep understanding of cloud platforms (AWS, Azure, GCP).
- Experienced in systems administration or technical operations.
- Hands-on experience maintaining and troubleshooting Linux/UNIX servers in a production environment.
- Strong knowledge of and experience with configuration management.
- Collaborative with exceptional social and interpersonal skills.
- Calm and collected in stressful situations, such as a major service outage.
- Take-charge personality and the ability to drive a plan to completion.
- Comfortable working in a dynamic environment with a highly technical team.
- Demonstrated attention to detail, follow-through, and ability to prioritize quickly.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.