A few days ago, I tried ordering lunch from a local restaurant. I went to their website, spent some time browsing the menu, chose a few items, clicked “submit” and… got an error message. This experience is not unique. Earlier this week, when I tried to read the news, I got a message saying the site was experiencing a technical issue. You can see both messages below:
Flying Blind & Customer Churn
Using online channels to connect with customers has been gaining momentum over the past few years, but these days it’s almost the ONLY way to do so. Finding and eliminating front-end issues has quickly become about more than brand perception and developer productivity; it’s now critical for thriving, or even just surviving, in the new business environment. To do this effectively, you need a detailed understanding of user experiences, which hinges on seeing and understanding all the data that might affect them, from internal tools to user activities.
Unfortunately, we discovered in talking to customers that the Real User Monitoring (RUM) solutions that are supposed to help are too slow, and often alert on issues only after users have already complained on social media or contacted the helpdesk. Traditional RUM solutions provide only partial data, which leaves their users blind to the actual user experience, significantly slows down troubleshooting, and prevents the creation of great online experiences. This is also why users keep running into problems like the ones I described earlier, and end up churning.
To understand why that is the case, it’s important to understand existing practice. Whenever a user takes an action on a website or in an app, the web browser generates a sequence of events, also known as browser spans, which are analyzed by RUM solutions. These actions often require additional processing in backend systems, which creates separate backend traces that are in turn analyzed by Application Performance Management (APM) solutions. Existing RUM solutions rely on partial data, a siloed approach, and proprietary instrumentation, resulting in slow, manual, and frustrating troubleshooting, all while users are leaving because they cannot get what they want.
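For readers who want to see how a browser span and a backend trace end up linked, here is a minimal TypeScript sketch of W3C Trace Context propagation, the `traceparent` header format that OpenTelemetry uses by default. It assumes the browser attaches the header to its outgoing request and the backend parses it, so server-side spans share the browser’s trace ID. The helper names (`newId`, `makeTraceparent`, `parseTraceparent`) are hypothetical; in practice the OpenTelemetry SDKs generate and parse this header for you.

```typescript
// Minimal sketch of W3C Trace Context propagation between a browser and a backend.
// Helper names are hypothetical; OpenTelemetry SDKs handle this automatically.

interface TraceContext {
  traceId: string;  // 32 lowercase hex chars (16 bytes)
  spanId: string;   // 16 lowercase hex chars (8 bytes)
  sampled: boolean; // the "sampled" bit of the trace-flags field
}

// Generate a random lowercase hex ID; all-zero IDs are invalid per the spec.
function newId(hexChars: number): string {
  let id: string;
  do {
    id = Array.from({ length: hexChars }, () =>
      Math.floor(Math.random() * 16).toString(16)
    ).join("");
  } while (/^0+$/.test(id));
  return id;
}

// Browser side: encode the current span's context as a `traceparent` header value.
function makeTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`; // "00" is the format version
}

// Backend side: parse the incoming header so server spans join the same trace.
function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  const [, traceId, spanId, flags] = m;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null; // reject all-zero IDs
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Usage: the backend recovers the browser's trace ID from the header.
const browserSpan: TraceContext = { traceId: newId(32), spanId: newId(16), sampled: true };
const header = makeTraceparent(browserSpan);
const backendView = parseTraceparent(header);
console.log(backendView !== null && backendView.traceId === browserSpan.traceId); // true
```

Because both sides carry the same trace ID, a RUM tool and an APM tool can in principle stitch the browser span and the backend trace into one end-to-end view.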
Allow me to expand a bit more:
- Partial Data: Existing RUM tools are separate from APM tools, so to get a clearer picture of the user experience, SREs or developers need to manually correlate front-end browser spans with backend traces. The problem is that, aside from Splunk APM, APM solutions sample backend traces, so they hold only partial data about the user experience, which makes it impossible to stitch together an accurate end-to-end view of user activity.
- Siloed Approach: Current monitoring solutions for metrics, traces, and logs operate in silos, so correlating data from different sources becomes a manual task. This gets harder as data volumes grow, and at cloud scale it is beyond human ability.
- Proprietary Instrumentation: Many vendors use proprietary data collection methods, also known as instrumentation. Beyond vendor lock-in and higher prices, proprietary instrumentation makes companies dependent on someone else’s roadmap, slowing down their innovation. Additionally, developers, who know their applications better than anyone else, usually do not invest time in customizing telemetry for a proprietary solution that might be swapped out, so their apps end up sending only very partial telemetry data.
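To illustrate the Partial Data point, here is a hypothetical TypeScript sketch that joins browser spans to backend traces by trace ID, then repeats the join after the backend keeps only a sample of its traces. The types and the `joinByTraceId` function are illustrative assumptions, not any vendor’s pipeline, and the keep-every-10th-trace rule is a deterministic stand-in for a 10% head-based sampler.

```typescript
// Hypothetical sketch: correlate browser spans with backend traces by trace ID,
// and count how many interactions lose their backend half when traces are sampled.

interface BrowserSpan { traceId: string; name: string; }
interface BackendTrace { traceId: string; durationMs: number; }

interface JoinResult {
  matched: { span: BrowserSpan; trace: BackendTrace }[];
  unmatched: BrowserSpan[]; // interactions with no backend trace to stitch to
}

function joinByTraceId(spans: BrowserSpan[], traces: BackendTrace[]): JoinResult {
  const byId = new Map<string, BackendTrace>();
  for (const t of traces) byId.set(t.traceId, t);
  const result: JoinResult = { matched: [], unmatched: [] };
  for (const span of spans) {
    const trace = byId.get(span.traceId);
    if (trace) result.matched.push({ span, trace });
    else result.unmatched.push(span);
  }
  return result;
}

// Build 100 user interactions, each with a browser span and a backend trace.
const spans: BrowserSpan[] = [];
const traces: BackendTrace[] = [];
for (let i = 0; i < 100; i++) {
  const traceId = `t${i}`;
  spans.push({ traceId, name: "click submit" });
  traces.push({ traceId, durationMs: 50 + i });
}

// Deterministic stand-in for a 10% head-based sampler: keep every 10th trace.
const sampledTraces = traces.filter((t) => Number(t.traceId.slice(1)) % 10 === 0);

const full = joinByTraceId(spans, traces);
const partial = joinByTraceId(spans, sampledTraces);
console.log(full.matched.length, partial.matched.length, partial.unmatched.length); // 100 10 90
```

With every trace kept, all 100 interactions can be stitched end to end; with the sampled backend, 90 of them have no backend half to correlate with.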
Full-Fidelity End-to-End Visibility of the User Experience
At Splunk, we decided to solve these issues with an entirely different approach: using OpenTelemetry instrumentation to ingest ALL data, and applying advanced streaming analytics to provide meaningful, actionable insights.
Let’s start with the data. To create amazing user experiences, companies have to understand what each and every user is experiencing. Splunk is unique in providing full-fidelity end-to-end visibility by ingesting ALL front-end spans and ALL backend traces, so that EVERY interaction between the browser and the backend or third-party service providers can be traced and understood. We ingest all this data with a streaming analytics engine, so any issue triggers an alert within seconds instead of turning into a social media nightmare.
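The “alert within seconds” idea can be sketched as a sliding-window error-rate check that runs on every incoming event instead of waiting for a batch job. This is a hypothetical TypeScript illustration, not Splunk’s streaming engine; the class name, window length, threshold, and minimum event count are assumed parameters, and events are assumed to arrive in timestamp order.

```typescript
// Hypothetical sketch: sliding-window error-rate alerting evaluated per event.
// Not Splunk's streaming engine; all names and parameters are illustrative.

interface UserEvent { timestampMs: number; isError: boolean; }

class ErrorRateMonitor {
  private events: UserEvent[] = [];
  private errors = 0;
  private readonly windowMs: number;
  private readonly threshold: number;
  private readonly minEvents: number;

  constructor(windowMs: number, threshold: number, minEvents: number) {
    this.windowMs = windowMs;   // window length in milliseconds
    this.threshold = threshold; // alert when the error rate exceeds this fraction
    this.minEvents = minEvents; // ignore windows with too few events
  }

  // Ingest one event (assumed in timestamp order); return true if the alert condition holds.
  record(e: UserEvent): boolean {
    this.events.push(e);
    if (e.isError) this.errors++;
    // Evict events that have fallen out of the window ending at e.timestampMs.
    while (this.events.length > 0 && this.events[0].timestampMs <= e.timestampMs - this.windowMs) {
      const old = this.events.shift()!;
      if (old.isError) this.errors--;
    }
    const n = this.events.length;
    return n >= this.minEvents && this.errors / n > this.threshold;
  }
}

// Usage: ten healthy requests, one per second, then errors start at t = 10 s.
const monitor = new ErrorRateMonitor(10000, 0.2, 5);
for (let s = 0; s < 13; s++) {
  const alert = monitor.record({ timestampMs: s * 1000, isError: s >= 10 });
  if (alert) console.log(`alert at t=${s}s`); // first fires at t=12s
}
```

The `minEvents` guard is a design choice that avoids alerting on a single failed request right after startup; one could also add smoothing or per-endpoint windows.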
Splunk RUM provides full-fidelity end-to-end visibility
As mentioned, efficient troubleshooting requires combining data from metrics, traces, and logs. Some vendors recognize that this is a challenging process and apply analytics and machine learning algorithms to try to accelerate troubleshooting. But when such algorithms are applied to sampled, partial data, their results are of limited use.
As part of the Splunk Observability Suite, Splunk RUM lets users leverage ALL metrics, traces, and logs in a single, tightly coupled user interface. By applying AI-driven analytics to all three pillars of observability, Splunk helps developers, Ops, and SRE teams quickly find the root cause of any issue and alert only the team responsible for it. This significantly accelerates troubleshooting and frees up developers’ time so they can focus on creating new and exciting experiences for their users.
In contrast to the proprietary instrumentation approach, Splunk RUM uses instrumentation based on OpenTelemetry Web (part of the second most popular Cloud Native Computing Foundation project) to ingest ALL metrics, traces, and logs. By adopting OpenTelemetry, companies future-proof their applications, avoid lock-in to any single vendor, and give their developers full control over their data. With OpenTelemetry, developers know their work will remain relevant even if they switch vendors and, as we have seen with many of our customers, they can invest the time needed to optimize the telemetry data their apps send. As a co-founder of OpenTelemetry, Splunk is 100% committed to the project and has been its most active contributor over the past year.
More to Come
Splunk RUM is a category-defining product and has already been adopted by multiple customers in diverse industries such as retail, manufacturing, logistics, and financial services. I’m very excited to see how they all take user experience to the next level.
Follow all the conversations coming out of #splunkconf20!