Case Study

PagerDuty Ensures End-to-End Visibility With Splunk Cloud and Amazon Web Services

Executive Summary

Customers turn to PagerDuty, an enterprise incident resolution service, to manage and resolve their IT incidents quickly and efficiently. When the cloud-native company needed a solution to meet its operational analysis and triage needs, it adopted Splunk Cloud running on Amazon Web Services (AWS). With Splunk Cloud and AWS, PagerDuty ensures high availability of its services and can scale to meet customer demand. Since deploying Splunk Cloud, PagerDuty has seen benefits including:

  • Ensured customer satisfaction and highly available cloud services
  • 30 percent cost savings over the prior service
  • Reduced IT and security incident resolution time, from tens of minutes to single-digit minutes or seconds

How Splunk and AWS Enabled End-to-End Visibility for PagerDuty

Challenges
    • Needed a scalable solution to meet its operational analysis and triage needs
Business Impact
    • Ensured customer satisfaction and highly available cloud services
    • Gained 30 percent cost savings over prior service
    • Reduced IT and security incident resolution time from tens of minutes to single-digit minutes or seconds
Data Sources
    • AWS, CloudTrail  
    • Internal PagerDuty data
    • GNU Octave 
    • Syslog

Why Splunk

Arup Chakrabarti is director of infrastructure engineering at PagerDuty, covering site reliability, internal platform and security engineering. His organization’s charter is to promote productivity and efficiency across the company’s entire engineering organization, consisting of multiple engineering teams within the company’s product development organization.

Prior to adopting Splunk Cloud, PagerDuty relied on a logging solution that could not scale as the company began indexing hundreds of gigabytes of logs daily. What’s more, the team found it difficult to get actionable information out of its data to make decisions and solve problems quickly. After running its previous service and Splunk Cloud side by side, the team determined that Splunk Cloud provided the speed required to resolve issues quickly and ensure high availability to its customers. Within days, the engineers migrated to Splunk Cloud.

“With the previous solution, some queries took up to 30 minutes to crunch the data and give us the information we needed, and that was simply unacceptable,” Chakrabarti says. “From a customer impact standpoint, we ended up shortening that time to resolution from tens of minutes to single digit minutes or seconds with Splunk Cloud.”

Chakrabarti notes that while cost was not the primary driver in selecting Splunk Cloud, “My accounting team was absolutely ecstatic when I told them, ‘We’re going to get the best solution, and by the way, it’s 30 percent cheaper compared to what we are currently using.’”

Cloud platform for enterprise-wide visibility, high availability

Today, PagerDuty uses AWS, the world’s most comprehensive and broadly adopted cloud service, together with Splunk Cloud as its platform for operational visibility and triage across the business, from IT operations monitoring to security and compliance. With Splunk Cloud, engineering teams have a single solution for monitoring and alerting, and can then dig deeper into the source of issues and resolve them quickly.

“PagerDuty runs one of the most highly available, reliable services on the planet. But there’s inherent tension between creating a highly available service and still being able to innovate on a consistent cadence. Splunk Cloud helps us manage that. If something goes wrong, we can figure it out and fix it really quickly, before our customers are even aware there is a problem,” Chakrabarti says. “We’ve also been cloud-native and on AWS almost since the beginning, and I don’t want to imagine the world without AWS.”

Enhancing security and compliance

Currently, 80 to 90 percent of PagerDuty’s data is internally generated from customers using its services, and PagerDuty’s AWS account is another source of data. In addition to Splunk Cloud, the company uses the Splunk App for AWS, which provides security and compliance visibility along with an audit trail of all activity in its AWS account.

Splunk Cloud augmented PagerDuty’s existing security program, helping to make it easier to run. “Splunk Cloud plays an important role because it collects all data in one place and provides rich context,” Chakrabarti says. “From a security response perspective, we know that our automation and the alerts that we build in Splunk Cloud are always smarter than any tools that we’re running separately because Splunk is able to definitively answer whether something is going on. Because of that, we can figure out whether we are under attack or it’s something we can ignore.”

As another example, Splunk Cloud gives PagerDuty visibility into data exfiltration. Whereas other tools create noise that leaves engineers busy monitoring alerts, Splunk Cloud adds value by pinpointing the issues the company needs to address immediately.

Engineering and beyond

As cost data flows into Splunk Cloud, the finance team has also begun to use the platform for visibility into customer usage trends and to answer some questions on its own. Other executives, such as the vice presidents of product development and product management, find it straightforward to use Splunk Cloud to answer higher-level questions as well.

“The value that Splunk Cloud provides is that it’s just one less thing that I have to worry about or hire engineers for,” Chakrabarti concludes. “We can focus on running the business and take care of the things our customers care about.”
