CNOC Shift Supervisor/Incident Commander
It is the responsibility of the Splunk CNOC to monitor and resolve, 24/7, issues that affect the availability and performance of Splunk for our cloud customers. As the authority on our customers’ experience, the CNOC is the front line of defense in making sure each of our customers has an extraordinary experience.
We are looking for a hardworking individual to lead a shift of driven engineers in assessing and correcting issues that arise during the shift. As the incident commander, you will also be responsible for driving incidents to resolution. You will join our team in supporting our ever-expanding cloud platform.
● Prioritize work for the CNOC engineers, lead inter-shift relations, and ensure all shifts have adequate resources to meet business needs.
● Mentor the CNOC engineers on your shift through complex tasks and help develop their skill sets.
● Lead CNOC team members and ensure all duties and tasks are being performed efficiently and effectively during each shift
● Regularly review all open alerts to make certain SLAs are being met.
● Represent the CNOC in meetings and process changes, and make recommendations on new procedures and processes.
● Use the Splunk Incident Management System (SIMS) to restore normal service operations as quickly as possible to minimize the impact to business operations during escalated incidents.
● Act as a liaison between SRE, monitoring teams, support, and leadership for new processes, tools, and knowledge transfers.
● Build automation to prevent problem recurrence; eventually, automate the response to all non-exceptional service conditions.
● Provide incident management, including acting as IC or UC depending on the event.
● Lead by example and drive the core values of the company.
● Always ensure a quality customer experience.
● You have 1-3 years of experience in the following areas:
  ○ Virtual machine / cloud administration (AWS / VMware)
  ○ Incident response and major incident management
  ○ Leading a team, preferably in operations or a related field
● You have experience maintaining and troubleshooting Linux/UNIX servers in a production environment.
● You have experience using configuration management tools (Puppet, Chef, Salt), cloud platforms (AWS, Azure, GCP), and on-call notification systems (PagerDuty, VictorOps).
● You are collaborative, with extraordinary interpersonal and communication skills.
● You remain calm and collected in stressful situations, such as a major service outage.
● You have demonstrated attention to detail, follow-through, and the ability to prioritize quickly.
● You think outside the box and are able to work on multiple tasks simultaneously while dynamically adjusting priorities.
● Work is shift-based (Day, Swing, or Grave) and will require weekends, overnights, and holidays, with the flexibility to work additional shifts on short notice.
● Experience using Splunk to identify operational issues is a plus.
We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying. For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.