The Technology Vision Behind SignalFx

By Splunk

Today we sit down with co-founder and CTO Phillip Liu to get his thoughts on the launch and the technology vision of SignalFx.

How are you doing today?
Great after our successful launch! We have spent the last two years building some great software and solving some very difficult problems while working closely with a great set of beta customers. We are very excited to share our work with the rest of the world.

What inspired you to build SignalFx?
My career has always been directly or indirectly involved in monitoring systems or applications. Most recently I spent a number of years at Facebook and my experience there shaped many of our initial ideas for SignalFx.

At Facebook we started out like a lot of other web and SaaS companies using open source monitoring tools like Nagios and Ganglia. We quickly realized that, given our growth and scale, we would be spending more time and effort maintaining and customizing these open source tools than it would take to build something tailored to our needs. So we decided to go back to the drawing board and figure out what our ideal monitoring solution would look like if we built one from scratch.

It was during this process that the way we looked at monitoring a large-scale web application fundamentally changed at Facebook. We couldn’t just look at individual components anymore. The amount of noise we were generating from component-level alerts was staggering. At Facebook scale, there could be several thousand alerts going off at any given time even if there were no problems with the overall service. We needed something smarter. We also had huge amounts of useful information about the state of our applications in log files, but no easy way for our development teams to extract insights from that data without getting other teams involved.

We concluded that we needed to shift monitoring into a centralized service that could look at patterns across entire populations of systems and applications, instead of focusing on checks against individual systems. This organically evolved to become ODS, the metrics-based monitoring system at Facebook that processes trillions of metrics a day and is used by every developer and operations engineer at the company to monitor the production Facebook application. Time series metrics proved to be the most compact way of sending data. Once we were able to get all these metrics into a central data store, we started to ask ourselves “what do we want to do with the data?” From there we naturally started doing more and more sophisticated aggregations, analytics, and visualizations.
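To make the contrast between component-level checks and population-level patterns concrete, here is a minimal Python sketch. It is not how ODS or SignalFx works internally; the function names, host names, threshold, and the "share of hosts" rule are all assumptions chosen for illustration.

```python
def per_host_alerts(cpu_by_host, threshold):
    """Component-level check: one alert for every host over the threshold."""
    return sorted(host for host, value in cpu_by_host.items() if value > threshold)


def population_alert(cpu_by_host, threshold, max_fraction):
    """Population-level check: alert only when too large a share of hosts is over the threshold."""
    if not cpu_by_host:
        return False
    over = sum(1 for value in cpu_by_host.values() if value > threshold)
    return over / len(cpu_by_host) > max_fraction


# Hypothetical CPU readings (percent) for a small pool of web servers.
readings = {"web-1": 35.0, "web-2": 92.0, "web-3": 41.0, "web-4": 38.0}

noisy = per_host_alerts(readings, threshold=90.0)
service_unhealthy = population_alert(readings, threshold=90.0, max_fraction=0.5)
```

With these made-up readings, the per-host check flags one machine while the population check stays quiet, because only a quarter of the pool is over the threshold. Across thousands of hosts, that difference is what separates a constant stream of alerts from a signal about the overall service.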

How did your experience at Facebook influence how you designed SignalFx?
When Karthik and I started SignalFx in 2013, we realized that technically the monitoring landscape had not changed dramatically since the analysis my team had done in 2008 at Facebook. Some open source projects had emerged, and some startups had begun to commercialize or mimic them, but web and SaaS companies were largely still struggling with the same challenges we’d had at Facebook.

How is SignalFx different?
We believe that monitoring modern applications is inherently an analytics problem. The investment to build a state-of-the-art, homegrown monitoring solution can be quite substantial, and our experience operating such systems at large-scale web companies has enabled us to build a product with both greater capabilities and lower cost than most could achieve on their own. SignalFlow™, our core technology, is a streaming analytics engine that moves monitoring beyond component-level alerts toward more meaningful signals. Users create SignalFlow analytics pipelines that perform statistical aggregations and transformations of time series data, both real time and persisted, as it flows through the SignalFx service. In addition, multiple analytics pipelines can be combined and compared to generate new time series. The output of SignalFlow analytics is usually available within two reporting intervals of the time series. This responsiveness is important in reacting to anomalies as they’re detected. All the capabilities of SignalFlow are available in an interactive and intuitive user interface.
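The idea of aggregating, transforming, and combining time series can be sketched in a few lines of plain Python. This is not SignalFlow syntax and not the SignalFx implementation; the function names, the `(timestamp, source, value)` tuple layout, and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def sum_across_sources(points):
    """Aggregate (timestamp, source, value) points into one summed series."""
    totals = defaultdict(float)
    for timestamp, _source, value in points:
        totals[timestamp] += value
    return dict(totals)


def moving_average(series, window):
    """Transform a series into its trailing mean over the last `window` timestamps."""
    timestamps = sorted(series)
    smoothed = {}
    for i, timestamp in enumerate(timestamps):
        recent = timestamps[max(0, i - window + 1): i + 1]
        smoothed[timestamp] = sum(series[t] for t in recent) / len(recent)
    return smoothed


def ratio(numerator, denominator):
    """Combine two series into a new one, skipping missing or zero denominators."""
    return {
        timestamp: numerator[timestamp] / denominator[timestamp]
        for timestamp in sorted(numerator)
        if denominator.get(timestamp)
    }


# Hypothetical per-host counters reported every 10 seconds.
errors = [(0, "web-1", 1), (0, "web-2", 1), (10, "web-1", 4), (10, "web-2", 0)]
requests = [(0, "web-1", 100), (0, "web-2", 100), (10, "web-1", 100), (10, "web-2", 100)]

# Two aggregation pipelines combined into a new series, then smoothed.
error_rate = ratio(sum_across_sources(errors), sum_across_sources(requests))
smoothed_rate = moving_average(error_rate, window=2)
```

Here `error_rate` is a derived time series that no single host reports. It exists only because two aggregated pipelines were combined, which is the kind of service-level signal the answer above describes.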

Without giving away too much of the secret sauce, what has been the greatest technical achievement of SignalFx so far?
We’ve spent the past two years building a state-of-the-art monitoring platform. I am proud of the team for many things along this journey to launch, but three stand out the most:

Getting robust streaming analytics at scale, not just streaming data. That is really challenging and quite novel, especially when you think about handling all the nuances of data coming from all kinds of sources, whether it be too much, late, repeated or inconsistent. It’s actually a tremendous problem when you get to the scale of billions of data points coming in at a time.
Making the user experience easy and intuitive when you have a really powerful platform dealing with large masses of constantly moving data. I want people to think “This makes sense to me, I can explore the data and figure out anything.”
Providing accurate analytics in an ephemeral deployment environment. Many of our customers employ elastic techniques to quickly turn compute capacity up and down. In this environment, it’s difficult to keep monitoring configurations up to date. Our system detects when sources stop reporting metrics and automatically removes these stale sources from the analytics pipelines.
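The third point, dropping sources that have stopped reporting, can be illustrated with a small Python sketch. The interview does not say how SignalFx decides that a source is stale, so the rule used here (no data for a fixed number of reporting intervals) is an assumption, as are the function names and sample values.

```python
def active_sources(last_seen, now, interval, missed_intervals=3):
    """Return sources that reported within the last `missed_intervals` reporting intervals.

    The cutoff rule is an assumed heuristic, not a documented SignalFx behavior.
    """
    cutoff = now - interval * missed_intervals
    return {source for source, timestamp in last_seen.items() if timestamp >= cutoff}


def mean_over_active(latest_values, last_seen, now, interval):
    """Average the latest values, ignoring sources considered stale."""
    live = active_sources(last_seen, now, interval)
    values = [latest_values[source] for source in live if source in latest_values]
    return sum(values) / len(values) if values else None


# Hypothetical state: web-3 was scaled down and last reported at t=40.
last_seen = {"web-1": 100, "web-2": 98, "web-3": 40}
latest_cpu = {"web-1": 20.0, "web-2": 30.0, "web-3": 95.0}

live = active_sources(last_seen, now=100, interval=10)
fleet_cpu = mean_over_active(latest_cpu, last_seen, now=100, interval=10)
```

Without the staleness filter, the last value from the terminated instance would keep skewing the fleet average long after the machine was gone. Excluding it keeps the aggregate accurate without anyone updating a monitoring configuration by hand.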

We are looking forward to having teams try SignalFx, give us feedback, and start looking for patterns. The common and creative ways you use SignalFx will direct the evolution of the platform.

Thanks,
Phillip Liu
