Synthetic monitoring is nothing new to most SRE and IT teams today. Synthetic monitoring tools have long formed a core part of application performance management and monitoring toolsets.
Yet no matter how familiar you are with synthetic monitoring, there is likely room to get more out of it than you currently are. Indeed, the default approach to synthetic monitoring tends to involve using it reactively: problems occur in production, and your team uses synthetic monitoring to help understand and remediate them.
That’s a start to deploying synthetic monitoring, but it falls far short of realizing its full potential as part of a performance management strategy. Teams must use synthetic monitoring tools proactively — optimizing their systems, rather than merely using synthetic monitoring to fix problems as they arise.
This whitepaper explains how to advance from basic to sophisticated synthetic monitoring techniques. As described below, there are three basic phases – “crawling,” “walking,” and “running” – that SRE and IT teams typically pass through as they work toward full maturity of their synthetic monitoring strategies. No matter where you currently stand along this path, the following pages offer tips on moving forward so that synthetic monitoring helps you optimize, rather than merely manage, the performance of your software environments.
The what and why of synthetic monitoring
Synthetic monitoring tools, which measure how applications respond to simulated requests, have enjoyed widespread adoption over the past decade or so because they provide visibility that’s difficult to achieve via other means.
Unlike real user monitoring, or RUM, which collects data about transactions from production environments, synthetic monitoring allows teams to define the precise conditions they want to simulate when evaluating application performance. Synthetic monitoring makes it easy to test for use cases that may not be as well represented in metrics collected from real-user transactions.
Synthetic monitoring also helps test for problems before they impact end users. If you perform only RUM, you run the risk that you won’t detect critical performance problems until they are already disrupting your actual customers. With synthetic monitoring, however, you can test for and identify potential problems before real customers experience them.
Of course, synthetic monitoring is only one ingredient in a complete application management strategy. RUM, log analytics, distributed tracing and other observability methodologies are equally important. Nonetheless, synthetic monitoring is a must-have technique for any team that wishes to achieve end-to-end visibility into both the applications it deploys and the customer experience it delivers.
The three stages of synthetic monitoring
There are several ways to leverage synthetic monitoring as part of a broader application performance management strategy. To understand these different levels of synthetic monitoring, it’s helpful to think of them as three stages of development.
The crawling stage
The first and simplest stage is akin to crawling. Here, synthetic monitoring is used to collect only the most basic metrics, like uptime statistics. Because these metrics are used primarily for troubleshooting, synthetic monitoring at this stage matters mainly to the SRE or ITOps team.
Although these metrics are basic, they provide the foundation for deeper insight into the application, such as which services experience critical problems most frequently. As uptime statistics improve, they can also provide proof of the ROI for SRE and IT operations, which in turn helps SRE and IT engineers get buy-in to move onto the next stage of monitoring.
Monitoring at this stage is limited in scope and value. Monitoring for uptime alone doesn’t help to find and fix performance bottlenecks or understand the wider business-level impact of performance problems.
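As a rough illustration of crawling-stage monitoring, the sketch below probes a URL and turns a series of pass/fail results into an uptime percentage. The function names (`http_probe`, `run_checks`, `uptime_percent`) are hypothetical and do not come from any particular monitoring product; only the Python standard library is assumed.

```python
import urllib.request
from typing import Callable, Iterable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError (raised for 4xx/5xx) and timeouts are OSError
        # subclasses; ValueError covers malformed URLs.
        return False


def run_checks(probe: Callable[[], bool], count: int) -> list[bool]:
    """Call the probe `count` times and collect the outcomes."""
    return [probe() for _ in range(count)]


def uptime_percent(results: Iterable[bool]) -> float:
    """Share of successful checks, expressed as a percentage."""
    outcomes = list(results)
    if not outcomes:
        raise ValueError("no check results recorded")
    return 100.0 * sum(outcomes) / len(outcomes)
```

In practice a scheduler would call the probe at a fixed interval. The percentage says whether the service answered, but nothing about how quickly or why it failed, which is the limitation described above.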
The walking stage
The next stage in the synthetic monitoring journey can be compared to walking. At this stage, organizations learn that “slow is the new down,” meaning that applications that perform slowly are just as problematic as those that don’t respond at all.
As a result, teams begin using synthetic monitoring to track response times and errors in addition to uptime. With this insight, they can understand which services are consistently slow, and which ones experience a regression in performance. This enables the team to proactively detect a service that may fail because it’s getting slower and slower. This stage also allows the organization to determine which types of issues to prioritize, based on which services are experiencing the greatest problems and how much those problems affect the business.
Although the maturity of monitoring operations has increased at this stage, the performance metrics teams track remain simplistic and incapable of delivering complete and actionable visibility. Teams might collect metrics only from key application services, for example, rather than performing end-to-end monitoring. Or they might measure the total time it takes for an application to complete a request, instead of measuring response times across individual services as the request moves from one service to another. In other words, teams at this stage will know what is slow, but will struggle to get to the root cause of performance problems in order to optimize performance.
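To make the walking stage concrete, here is a minimal sketch of two checks a team might run on collected latency samples: flagging a regression against a baseline, and finding the slowest service in a per-service breakdown of one request. The function names, the nearest-rank percentile method and the 20% tolerance are illustrative assumptions, not values taken from any specific tool.

```python
import math
from typing import Mapping, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def has_regressed(
    baseline_ms: Sequence[float],
    current_ms: Sequence[float],
    pct: float = 95,
    tolerance: float = 0.20,  # assumed threshold: 20% slower than baseline
) -> bool:
    """True when the current percentile exceeds the baseline by the tolerance."""
    limit = percentile(baseline_ms, pct) * (1 + tolerance)
    return percentile(current_ms, pct) > limit


def slowest_service(timings_ms: Mapping[str, float]) -> str:
    """Name of the service that took longest while handling one request."""
    return max(timings_ms, key=timings_ms.get)
```

A total request time alone would only show that something is slow. A per-service breakdown such as `{"gateway": 40, "search": 310, "cart": 95}` is what lets `slowest_service` point at a specific service.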
Because monitoring at this stage still focuses on finding problems but doesn’t reveal their root cause, it remains the realm primarily of IT and SRE teams. If the teams lack the monitoring data necessary to pinpoint the code that causes an issue, they can’t collaborate with developers to resolve it.
The running stage
The most advanced stage — and the one that requires the highest level of organizational alignment — is the equivalent of running.
This is the stage where synthetic monitoring reaches full maturity. It’s characterized by a synthetic monitoring strategy that doesn’t collect just generic uptime and performance metrics, but goes deeper by focusing on metrics such as those that Google labels “Core Web Vitals.” These metrics include:
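- Largest Contentful Paint (LCP): How long it takes for the largest content element in the viewport, typically a hero image or main block of text, to render. It approximates when the page looks loaded from the perspective of the user. This data point may be different from what the application reports as page load time, because browser rendering delays and other issues could lead to slower loads from the user’s viewpoint than from what backend systems report.
- First Input Delay (FID): How long the browser takes to begin responding after a user first interacts with a page, for example by clicking a link or tapping a button. Here again, the page may appear to be loaded from the application’s perspective, but that doesn’t necessarily mean it’s ready to handle user input. Note that Google replaced FID with Interaction to Next Paint (INP) as a Core Web Vital in March 2024.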
- Cumulative Layout Shift (CLS): How consistent the page content remains as the page loads. Content that moves around, or that loads and disappears, leads to poor CLS metrics and a confusing experience from the user’s perspective.
These metrics focus on what the user experiences, which is the most meaningful measure of performance. They can also help engineers pinpoint the most problematic components of a page, such as images that take longer to load than the rest of the content on a page, or a menu that loads quickly but does not accept input immediately. In turn, they provide deeper visibility into exactly how to optimize performance.
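As an example of how these metrics might be evaluated in a synthetic test, the sketch below rates measured values against the "good" and "poor" thresholds Google publishes for each metric. The function names and the shape of the input dictionary are assumptions made for illustration.

```python
# Thresholds published by Google: (upper bound for "good", upper bound for
# "needs improvement"). Anything above the second value is rated "poor".
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "FID": (100, 300),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout shift score
}


def rate(metric: str, value: float) -> str:
    """Rate one measurement as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


def failing_metrics(measurements: dict[str, float]) -> dict[str, str]:
    """Return only the metrics that are not rated good."""
    ratings = {name: rate(name, value) for name, value in measurements.items()}
    return {name: rating for name, rating in ratings.items() if rating != "good"}
```

A test run could fail a build or raise an alert whenever `failing_metrics` returns a non-empty result for a key page.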
Another key differentiator for synthetic monitoring in the “running” stage is that it moves beyond just front-end application metrics to include data from the application backend as well. By correlating granular performance data between different types of application components, engineers gain the visibility necessary to identify the root cause of performance problems.
By making that possible, sophisticated synthetic monitoring also allows developers to participate fully in the process. When teams can quickly link performance issues to code, developers can find and resolve problems within the codebase. In this way, synthetic monitoring at this stage becomes an integrated part of the CI/CD process, allowing developers, IT engineers and SREs to work together to deliver the highest-quality code possible.
Evolving your synthetic monitoring strategy
Moving from crawling, to walking, to running with your synthetic monitoring tools and strategy requires deliberate effort. It’s easy to stop at the walking stage, which enables basic performance management on a reactive basis, without ever reaching the proactive, optimization-oriented running stage.
To move beyond reactive synthetic monitoring and reach the running stage, you should strive to implement tools and workflows founded on the following principles.
Testing complex transactions
Getting the most out of synthetic monitoring requires tracking complex, multi-step user journeys. It’s rare for a user to initiate just a single request and then close your application. Users typically initiate an array of transactions during each visit to your site. They might search for a product, click on different items for product details, add items to their cart, check out, and so on.
Testing each of these transactions in isolation isn’t enough to guarantee an optimal user experience. Instead, to get ahead of problems before they affect your users, you need to test the complete flow by scripting the user journey across your app. Simulate all of the transactions that users could initiate, and use data produced by one transaction to drive testing for the next transaction.
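The idea of chaining transactions can be sketched as a small journey runner in which each step receives the data produced by earlier steps. Everything here is hypothetical: the step functions are in-memory stand-ins for real requests, and names such as `run_journey` and `product_id` are invented for the example.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable

Context = dict[str, Any]
Step = Callable[[Context], Context]


@dataclass
class StepResult:
    name: str
    ok: bool
    duration_ms: float
    error: str = ""


def run_journey(steps: list[tuple[str, Step]]) -> tuple[list[StepResult], Context]:
    """Run steps in order, passing accumulated data forward; stop on failure."""
    context: Context = {}
    results: list[StepResult] = []
    for name, step in steps:
        started = time.perf_counter()
        try:
            context = {**context, **step(context)}
            error = ""
        except Exception as exc:
            error = str(exc) or type(exc).__name__
        elapsed_ms = (time.perf_counter() - started) * 1000
        results.append(StepResult(name, not error, elapsed_ms, error))
        if error:
            break  # later steps depend on this one, so do not continue
    return results, context


# Stand-in steps; a real test would issue browser or API requests here.
def search(ctx: Context) -> Context:
    return {"product_id": "sku-123"}


def add_to_cart(ctx: Context) -> Context:
    return {"cart": [ctx["product_id"]]}


def checkout(ctx: Context) -> Context:
    if not ctx["cart"]:
        raise RuntimeError("cart is empty")
    return {"order_id": "order-1"}
```

Because `add_to_cart` reads the `product_id` that `search` produced, a failure in one step surfaces in the context of the whole flow, not as an isolated request.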
Answering the “What ifs”
Synthetic monitoring lends itself to experimentation with different variables more so than other observability techniques. Be sure to take full advantage of this capability by using synthetic monitoring to test not just for standard transaction types and user engagement patterns, but also the outlying, “what if” scenarios.
What happens if you run your app without a CDN? How does one release perform relative to another? How does performance change when requests originate from different geographic regions? Being able to answer questions like these through synthetic monitoring tools will significantly enhance your ability to optimize performance.
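One way to organize such experiments is to enumerate every combination of the variables you care about and then compare results between variants. The sketch below assumes latency samples have already been collected for each scenario. The variable names (`cdn`, `region`) and both function names are illustrative only.

```python
import itertools
import statistics
from typing import Any, Sequence


def scenario_matrix(variables: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Expand variables into every combination, one dict per scenario."""
    names = list(variables)
    return [
        dict(zip(names, combination))
        for combination in itertools.product(*variables.values())
    ]


def median_change_percent(
    baseline_ms: Sequence[float], candidate_ms: Sequence[float]
) -> float:
    """Percentage change in median latency from baseline to candidate."""
    baseline = statistics.median(baseline_ms)
    return 100.0 * (statistics.median(candidate_ms) - baseline) / baseline
```

For example, `scenario_matrix({"cdn": [True, False], "region": ["us-east", "eu-west", "ap-south"]})` yields six scenarios, and `median_change_percent` can then quantify how much slower the no-CDN runs are in each region.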
Robust, contextual alerts
All synthetic monitoring tools can be used to trigger alerts when an anomaly occurs. But simply receiving an alert that something is wrong is not enough to optimize performance proactively.
Instead, you need robust and contextual data about each alert. Screenshots that show exactly where the error occurred, or tools that trace it to specific source code, help you do this. So does the ability to run the same test from multiple locations in order to distinguish between localized and global failures. Automatically repeating a failed request to determine whether it fails consistently or is only an intermittent problem is crucial for enabling proactive response, too.
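The retry and multi-location checks described above can be sketched as a small classifier that attaches context to an alert. It assumes each location has already recorded several attempts of the same test; the function names and the returned labels are hypothetical.

```python
from typing import Callable, Mapping


def retry_check(check: Callable[[], bool], attempts: int = 3) -> list[bool]:
    """Repeat a check and record every outcome."""
    return [check() for _ in range(attempts)]


def classify_failure(results_by_location: Mapping[str, list[bool]]) -> dict:
    """Describe a failure's scope (where) and pattern (how often)."""
    failing = {
        location: outcomes
        for location, outcomes in results_by_location.items()
        if not all(outcomes)
    }
    if not failing:
        return {"scope": "none", "pattern": "none", "locations": []}
    # Global if every location saw at least one failure, otherwise localized.
    scope = "global" if len(failing) == len(results_by_location) else "localized"
    # Consistent only if every attempt failed at every failing location.
    always_fails = all(not any(outcomes) for outcomes in failing.values())
    pattern = "consistent" if always_fails else "intermittent"
    return {"scope": scope, "pattern": pattern, "locations": sorted(failing)}
```

An alert that says "localized, consistent, us-east" gives a responder far more to act on than a bare notification that a test failed.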
Parity between test and production
Simulated tests that run before you deploy into production are reliable only if the dev/test environment in which you run the tests reliably mirrors your production environment. If it doesn’t, you end up performing synthetic monitoring under conditions that may not accurately represent production, which greatly undercuts your ability to preempt issues that could impact real users once your release is deployed.
Address this issue by ensuring that test/dev resembles production as closely as possible. Containers can help achieve this parity by providing identical deployment environments for both testing and deployment. But your synthetic monitoring tools should also allow you to emulate production environments closely by, for example, initiating requests from the same geographic regions where your actual users are located, and testing across a variety of user device and operating system configurations.
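A simple way to keep an eye on parity is to describe each environment as data and report where the descriptions differ. The sketch below is only an assumption about how that might look; the keys (`image`, `regions`, `cdn`) and the `parity_gaps` function are invented for the example.

```python
from typing import Any

MISSING = "<missing>"


def parity_gaps(production: dict[str, Any], test: dict[str, Any]) -> dict[str, tuple]:
    """Return settings whose values differ, as (production, test) pairs."""
    gaps = {}
    for key in sorted(production.keys() | test.keys()):
        prod_value = production.get(key, MISSING)
        test_value = test.get(key, MISSING)
        if prod_value != test_value:
            gaps[key] = (prod_value, test_value)
    return gaps


# Hypothetical environment descriptions.
production_env = {
    "image": "shop:1.4.2",
    "regions": ["us-east", "eu-west"],
    "cdn": True,
}
test_env = {
    "image": "shop:1.4.2",
    "regions": ["us-east"],
}
```

Here `parity_gaps(production_env, test_env)` would report that the test environment lacks a CDN setting and covers fewer regions, both of which could make pre-production results unrepresentative.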
Conclusion
Synthetic monitoring is a powerful part of any application management workflow. Exactly how much value you derive from synthetic monitoring tools, however, depends on how many advanced features those tools offer for gaining actionable insight into the complex journeys your users take as they interact with your applications. Testing only individual requests, or focusing only on overall uptime or response times, deprives you of the ability to take a proactive approach to performance management and the user experience.
Splunk makes it easy to take synthetic monitoring to the next level. By allowing SRE and IT teams to script complex transactions, fine-tune the variables under which synthetic tests occur, compare results from multiple tests and produce context-rich alerts, Splunk Observability Cloud turns synthetic monitoring from a reactive process into one that provides unique observability insights that help you delight your customers.