Monitoring modern systems requires DevOps teams to collect, aggregate, and alert on data from hundreds or even thousands of services in real time. Splunk Infrastructure Monitoring customers across all kinds of digital businesses (limited product releases, real-time payments, global satellite image processing) tell us that low-latency visualizations and alerting are critical: they let teams deploy with confidence and shorten incidents when they occur. Splunk Infrastructure Monitoring's streaming architecture addresses this need by delivering alerts within seconds of receiving data.
To realize the potential of real-time alerts, you need a way to quickly route them to the right people and coordinate incident response. Splunk Infrastructure Monitoring integrates with leading notification and incident management services for exactly this purpose. Our integration with Opsgenie provides a unique, best-in-class solution for cloud monitoring and incident management that works in real time, at any scale.
Alerting and Incident Management are Challenging to Scale
In the days of monolithic architectures, a central Operations team would be responsible for keeping the product up and running. A team like this typically would have sole responsibility for setting up monitoring, paging the right people, and updating alerts as new components came online. But in the era of DevOps and microservices, the responsibility for keeping a service up and running rests with the development team who built the feature: you built it, you own it.
This doesn’t mean that each DevOps team should purchase its own on-call paging tools. For tasks like on-call paging that every team needs, it makes sense to rely on a centralized solution for the company as a whole. But changes to that centralized solution still need to happen as fast as the self-organizing team that requires them. As a company scales, keeping common operational tasks self-service is critical to sustaining the high velocity of change that a microservices environment enables. When a team plans to deploy a new microservice, it needs to be able to manage the end-to-end monitoring process itself: instrumenting metrics, tracing, monitoring, and alert routing.
Fully centralized solutions can struggle to keep up with this process, because they need to reflect organizational changes that in some cases have already been acted on. If someone needs to raise a ticket with IT every time the membership of their on-call schedule changes, then that on-call schedule might spend a lot of time out of step with reality.
Real-Time Cloud Monitoring Meets Modern Incident Management
This is where Opsgenie’s intelligent alert routing comes in: in Opsgenie, teams can subscribe to alerts that are relevant to them. This means that you could create one integration that would route alerts dynamically to teams.
However, the systems that send alerts to Opsgenie have traditionally had no visibility into where those alerts end up. This has led to tension between the central services teams administering tools like Opsgenie and the individual DevOps teams trying to get alerts where they need to go.
Typically, integrations with Opsgenie rely on webhooks that accept alerts before routing them to Opsgenie Teams. You can add an integration to a specific Team, or add an integration to your entire Opsgenie account that individual Teams subscribe to.
This approach works for companies with only a few teams or those using a relatively basic monitoring solution, but can quickly become impractical once users need to choose one integration from a list of hundreds or more for each alert. Furthermore, adding or removing Opsgenie Teams would require changes in the upstream monitoring product in order to send alerts to new teams.
In contrast, the Splunk Infrastructure Monitoring integration remains aware of Opsgenie Teams at all times, and presents these options when users are deciding which Teams to route an alert to. Customers with hundreds of teams can add just one Opsgenie integration that can send Splunk Infrastructure Monitoring alerts to any of them, or add and remove Teams without changing the integration on the Splunk Infrastructure Monitoring side.
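The difference between the two approaches can be sketched in a few lines of Python. This is only an illustrative model, not how either product is implemented: the class names, the `fetch_teams` callback, and the example URL are all hypothetical. The static router needs a hand-maintained entry per team, while the team-aware router keeps one integration and looks up the current team list whenever it is asked.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class StaticWebhookRouter:
    """Hypothetical model: one webhook per team, maintained by hand upstream."""
    webhooks: dict[str, str]  # team name -> webhook URL

    def route(self, team: str, message: str) -> dict[str, str]:
        # Raises KeyError for a team that exists in Opsgenie
        # but has not been configured here yet.
        return {"url": self.webhooks[team], "team": team, "message": message}


@dataclass
class TeamAwareRouter:
    """Hypothetical model: one integration that asks for the live team list."""
    integration_url: str
    fetch_teams: Callable[[], set[str]]  # assumed lookup of current teams

    def available_teams(self) -> list[str]:
        # These are the options a user would pick from when adding a recipient.
        return sorted(self.fetch_teams())

    def route(self, team: str, message: str) -> dict[str, str]:
        if team not in self.fetch_teams():
            raise KeyError(f"unknown team: {team}")
        return {"url": self.integration_url, "team": team, "message": message}


# Simulated Opsgenie team list; in reality this would live in Opsgenie.
opsgenie_teams = {"payments", "search"}

static_router = StaticWebhookRouter(
    {name: f"https://example.invalid/hooks/{name}" for name in opsgenie_teams}
)
aware_router = TeamAwareRouter(
    "https://example.invalid/opsgenie", lambda: opsgenie_teams
)

# A new team appears in Opsgenie; nothing changes on the monitoring side.
opsgenie_teams.add("imaging")
print(aware_router.available_teams())  # ['imaging', 'payments', 'search']
print(aware_router.route("imaging", "disk usage high"))
```

In this model, `static_router.route("imaging", ...)` fails until someone adds a new webhook by hand, which is the maintenance burden described above.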
Sending Splunk Infrastructure Monitoring Alerts to Opsgenie
Bring Your Teams Into a Single Opsgenie Integration
Log in to Splunk Infrastructure Monitoring, navigate to the Integrations page, and search for or scroll to the “Opsgenie” tile. Click the tile, select “New Integration”, give the integration a name (e.g. “Opsgenie integration”), and enter the API key from your Opsgenie integration.
Click Save, and your integration is ready to use.
Easily Route Alerts to the Right Team
Open an alert rule or team notification in Splunk Infrastructure Monitoring, then click “Add Recipient” and choose Opsgenie from the list of available integrations to specify the team that should receive alerts.
Opsgenie users within that Team will now receive alerts from Splunk Infrastructure Monitoring.
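For readers curious what a team-targeted alert looks like on the Opsgenie side, here is a minimal Python sketch that builds (but does not send) a request to Opsgenie's public Alert API with a team listed as the responder. It is not a description of what the Splunk Infrastructure Monitoring integration does internally. The helper name, the placeholder API key, the team name, and the alert message are all assumptions made for illustration.

```python
import json
import urllib.request

# Opsgenie Alert API endpoint for creating alerts.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_team_alert_request(
    api_key: str, team_name: str, message: str, priority: str = "P3"
) -> urllib.request.Request:
    """Build a create-alert request addressed to one Opsgenie team.

    Hypothetical helper for illustration; it only constructs the request.
    """
    payload = {
        "message": message,
        # A responder of type "team" asks Opsgenie to route to that team.
        "responders": [{"name": team_name, "type": "team"}],
        "priority": priority,
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Opsgenie expects the integration API key in this form.
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )


# Placeholder values; substitute your own key and team name.
request = build_team_alert_request(
    "YOUR_API_KEY", "payments", "p99 latency above threshold"
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["responders"])
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch can be run without credentials.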
Alerting and Incident Management for DevOps
Scalable, maintainable alerting and incident management solutions are critical to sustaining the pace of software innovation in modern DevOps organizations. We’ve provided Opsgenie users with a single integration for Splunk Infrastructure Monitoring that easily supports hundreds of teams, ensuring that actionable, real-time alerts are always routed to the right person.
If you’re not already using Splunk Infrastructure Monitoring, get started with a free trial.
This post features contributions from Rebecca Tortell and Aaron Sun.