July 11, 2020

Closed-Loop Automation With Splunk Infrastructure Monitoring and Amazon EventBridge

By Splunk

Microservices have delivered on the promise of freeing up individual teams to work on services independent of one another. This flexibility has allowed for quicker and more agile application development. However, the explosion of services has added complexity to debugging efforts. The task of managing complex interactions between tens or hundreds of microservices turns on-call incident resolution into an absolute nightmare, not to mention the risk of a bad customer experience.

As our environments and SLA (service-level agreement) expectations evolve, our processes and observability tools must evolve with them. Rote tasks such as code rollbacks and auto-scaling of compute resources can now be automated for better responsiveness, customer satisfaction and SLA compliance. As a result, developers and operators can focus on more critical tasks and be more productive. As an industry, we need to eliminate situations where on-call staff are interrupted from their current work, or woken up at 3 a.m., to fix a problem that could easily have been automated.

"Toil is manual, repetitive, automatable, tactical work that scales linearly and is the main source of concern for SREs (Site Reliability Engineers). 59% believe there is too much toil in their organization and not enough has been automated to reduce that toil."
— 2019 State of SRE Report, Catchpoint

Introducing Splunk Infrastructure Monitoring and Amazon EventBridge Integration

We are excited to be an official launch partner of Amazon EventBridge. This new integration makes it simple to leverage the real-time problem detection capabilities of Splunk Infrastructure Monitoring to automate remediation actions. DevOps and SRE teams can now fully realize the promise of programmable infrastructure through closed-loop automation.

Amazon EventBridge is a serverless event bus that connects applications together, delivering a stream of real-time data from AWS resources, SaaS applications, and your own applications. With this new integration, joint customers of Splunk Infrastructure Monitoring and AWS will be able to operate their infrastructure and applications with continuous closed-loop automation to improve responsiveness, SLA compliance, customer experience, and the overall productivity of DevOps and SRE organizations.

[Figure: Amazon EventBridge diagram]

Traditional Incident Response is a Slow Manual Process

Historically, there have been three significant barriers to proactive closed-loop event response in IT Automation:

The latency of traditional observability tools is high, taking minutes and sometimes hours to get results from a query.
Basic infrastructure metrics (CPU, Memory, etc.) combined with static alerts do not catch anomalies and outliers in complex distributed systems.
Routing and handling information has traditionally been a security and technical nightmare.

Traditional incident response has been reactive rather than proactive. The common solution has been a one-off script or a large, general purpose tool that nobody wants to maintain. You either page someone or try to uncover the bug yourself.
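To make the second barrier concrete, here is a minimal Python sketch of why a static threshold can miss a host that is clearly behaving differently from its peers. It is an illustration only and is not how Splunk Infrastructure Monitoring or SignalFlow detects outliers. The host names, CPU values, the 90% threshold and the median-absolute-deviation (MAD) rule are all assumptions chosen for the example.

```python
from statistics import median


def static_alerts(cpu_by_host, threshold=90.0):
    """Return hosts whose CPU exceeds a fixed threshold (a classic static alert)."""
    return [host for host, cpu in cpu_by_host.items() if cpu > threshold]


def mad_outliers(cpu_by_host, k=3.0):
    """Return hosts whose CPU is more than k MADs away from the fleet median."""
    values = list(cpu_by_host.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Every typical host reports the same value, so any deviation stands out.
        return [host for host, cpu in cpu_by_host.items() if cpu != med]
    return [host for host, cpu in cpu_by_host.items() if abs(cpu - med) / mad > k]


# Hypothetical fleet: web-5 runs far hotter than its peers but stays below 90%.
fleet = {"web-1": 21.0, "web-2": 23.0, "web-3": 22.0, "web-4": 20.0, "web-5": 71.0}

print("static threshold:", static_alerts(fleet))
print("peer comparison: ", mad_outliers(fleet))
```

In this made-up fleet the static alert stays silent, while the comparison against peers singles out web-5.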

Closed-Loop Automation is Critical for Microservices

With Splunk Infrastructure Monitoring and Amazon EventBridge, these barriers come down. Where traditional observability tools took minutes, if not hours, to return actionable results, the real-time streaming analytics architecture of Splunk Infrastructure Monitoring evaluates even the most sophisticated alert detectors within seconds. The SignalFlow™ analytics engine, at the core of the Splunk Infrastructure Monitoring observability platform, is uniquely equipped to address the complexity and volume of data that modern environments produce. Thanks to patented data science, SignalFlow™ creates a dynamic view of the environment and identifies true outlier conditions in seconds.

Amazon EventBridge completes the picture, providing an advanced dispatch system capable of routing and handling events at any scale. Now your systems can react in seconds, so incident response is underway before a human could even realize something needs attention. Once the incident has been remediated, you can start to troubleshoot. The Outlier Analyzer™ in Splunk Infrastructure Monitoring enables rapid root-cause analysis: by analyzing every single transaction across your microservices and correlating across your application code and infrastructure, it isolates your problem with one click.
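As a rough sketch of what the "react in seconds" step could look like, the Python below shows an AWS Lambda-style handler that receives an EventBridge event and picks a remediation. The envelope keys (version, detail-type, source, detail) are standard for EventBridge, but everything inside detail is an assumption: the detector field, the detector names and the two remediation functions are hypothetical and only stand in for whatever your detectors actually send and whatever action you want to take.

```python
def scale_out(detail):
    # Placeholder: a real handler might raise an Auto Scaling group's capacity here.
    return {"action": "scale_out", "detector": detail["detector"]}


def roll_back(detail):
    # Placeholder: a real handler might trigger a deployment rollback here.
    return {"action": "roll_back", "detector": detail["detector"]}


# Hypothetical detector names mapped to remediation functions.
REMEDIATIONS = {
    "checkout latency outlier": scale_out,
    "error rate after deploy": roll_back,
}


def handler(event, context=None):
    """Choose a remediation based on the (assumed) detector name in the event detail."""
    detail = event.get("detail", {})
    detector = detail.get("detector")
    remediate = REMEDIATIONS.get(detector)
    if remediate is None:
        # Unknown detector: do nothing automatically and leave it to a human.
        return {"action": "none", "detector": detector}
    return remediate(detail)


# Hypothetical event, shaped like an EventBridge envelope.
sample_event = {
    "version": "0",
    "detail-type": "hypothetical detector alert",
    "source": "aws.partner/example.com/hypothetical",
    "detail": {"detector": "checkout latency outlier"},
}

print(handler(sample_event))
```

Keeping the mapping in one table makes it easy to see which detectors are allowed to trigger automatic actions and which still go to a person.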

[Figure: closed-loop automation]

Setting up the Splunk Infrastructure Monitoring Integration with EventBridge is as easy as 1, 2, 3:

1. Select Amazon EventBridge from the integrations page in Splunk Infrastructure Monitoring and enter your Amazon organization ID and AWS Region in the configuration.
2. Associate the Event Source in AWS.
3. Configure a detector in Splunk Infrastructure Monitoring and set your Amazon EventBridge integration as the target of the detector.

That’s all you need to do to get events flowing into AWS!
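If you prefer to script the AWS side rather than click through the console, step 2 and a routing rule could be sketched as follows. The functions take an EventBridge client such as boto3.client("events") and use its create_event_bus, put_rule and put_targets calls. The source name, rule name, Lambda ARN and the prefix-based event pattern are placeholders and assumptions, not values taken from the integration, so check the event source name shown in your own AWS account.

```python
import json


def associate_event_source(events_client, source_name):
    """Step 2: create an event bus tied to a partner event source.

    For partner sources, the bus name has to match the event source name.
    """
    return events_client.create_event_bus(Name=source_name, EventSourceName=source_name)


def route_to_lambda(events_client, bus_name, rule_name, lambda_arn):
    """Send events arriving on the bus to a Lambda function."""
    # Assumption: events carry the partner source name in their "source" field.
    pattern = {"source": [{"prefix": bus_name}]}
    events_client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{"Id": "remediation", "Arn": lambda_arn}],
    )
    return pattern


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   client = boto3.client("events", region_name="us-east-1")
#   source = "aws.partner/example.com/hypothetical"
#   associate_event_source(client, source)
#   route_to_lambda(client, source, "remediate-on-alert",
#                   "arn:aws:lambda:us-east-1:123456789012:function:remediate")
```

The Lambda function also needs a resource-based permission that allows EventBridge to invoke it, which is not shown here.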

Learn more about Splunk Infrastructure Monitoring and get a 14-day free trial.

Happy Splunking,
Ryan Powers
