In Observability, RED is the New Black

When it comes to complex application integrations, RED monitoring provides a sensible and necessary common element for seeing how our systems are performing and for alerting us to behavior that is detrimental to our customers and our business goals.

So, what is RED? RED stands for rate, errors and duration, and is an offshoot of Google's Golden Signals. RED was originally aligned with microservices, but it applies well to any request-driven application, including e-commerce and serverless functions. By giving us a solid, standardized starting point, RED makes it possible for separate teams to exchange clear information about concerns within the system, while leaving room to expand for unique needs and powering the drill-down needed for cause analysis.

Let’s break RED down to understand each part.

Rate is the number of requests handled per unit of time. Those requests can arrive via HTTP, SOAP, REST or middleware messaging, and rate can also include information from control structures like service meshes or even API calls. You can think of rate as the use of bandwidth, which means any application environment that can fail under peak traffic is a target for rate monitoring. Rate is most often captured as a metric that highlights the behavior of a system.
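To make rate concrete, here is a minimal Python sketch that counts requests per fixed time window. The function name, the 60-second window and the sample timestamps are assumptions made up for illustration; a real monitoring tool collects and aggregates this for you.

```python
from collections import Counter


def requests_per_window(timestamps, window_seconds=60):
    """Count requests in fixed windows.

    timestamps: arrival times in seconds (hypothetical input format).
    Returns a dict mapping each window's start time to its request count.
    """
    counts = Counter(
        int(ts // window_seconds) * window_seconds for ts in timestamps
    )
    return dict(sorted(counts.items()))


# Hypothetical arrival times, in seconds, for illustration only.
sample_arrivals = [0.5, 12.0, 59.9, 60.0, 61.2, 130.0]
rate = requests_per_window(sample_arrivals, window_seconds=60)
print(rate)
```

A sudden jump or drop in these per-window counts is usually the first hint that traffic has changed in a way worth investigating.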

Errors are problems that cause an incorrect, incomplete or unexpected result. Did the request return bad information? Did it fail to come back? Errors have many potential sources, from unexpected code issues (bugs) to configuration or deployment issues, or even unexpected behavior as a result of scale. Errors have the biggest impact on our applications, and while they are often the easiest to spot, they tend to be the most difficult to resolve. In our instant-gratification world, errors require rapid response and usually point-specific corrections. They also often need a deep dive into application-specific details, usually in the application log files.
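As a rough illustration, the error side of RED is often expressed as the fraction of requests that failed. The sketch below assumes, purely for the example, that a failure is either an HTTP 5xx status or a request that never came back (represented as None); your own definition of an error may well differ.

```python
def error_ratio(outcomes):
    """Fraction of requests that failed.

    outcomes: list of HTTP status codes, with None meaning no response
    came back (an assumed convention for this sketch).
    """
    if not outcomes:
        return 0.0
    failed = sum(
        1 for status in outcomes if status is None or status >= 500
    )
    return failed / len(outcomes)


# Hypothetical outcomes for eight requests, for illustration only.
sample_statuses = [200, 200, 503, None, 404, 200, 500, 200]
ratio = error_ratio(sample_statuses)
print(f"error ratio: {ratio:.3f}")
```

Note that the 404 is not counted here because the sketch treats client errors as the caller's problem; that is a design choice, not a rule of RED.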

Duration is all about time. Specifically, it is the time each request takes to complete, usually aggregated into a percentile indicator. While not strictly required, duration generally falls into the realm of distributed tracing, with standards like OpenTracing and OpenTelemetry. Distributed tracing tracks the path and time your requests take between and within services, and brings events into causal order. Duration answers questions of when and how long, helping to identify the location or microservice that is exhibiting undesired timing behavior.

It’s easy to get started with RED. Tools like SignalFx Microservices APM make it simple to get the right data into the right visual presentation to give you the information you need.
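Coming back to duration for a moment, the aggregation of request times can be sketched in a few lines of Python. This example uses the nearest-rank percentile method, which is only one of several common definitions; the function name and the sample latencies are hypothetical.

```python
import math


def percentile(durations_ms, pct):
    """Nearest-rank percentile of request durations in milliseconds."""
    if not durations_ms:
        raise ValueError("no durations recorded")
    ordered = sorted(durations_ms)
    # Nearest-rank: the smallest value with at least pct% of samples
    # at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
sample_durations = [12, 15, 11, 240, 14, 13, 16, 18, 17, 19]
p50 = percentile(sample_durations, 50)
p99 = percentile(sample_durations, 99)
print(p50, p99)
```

Looking at a high percentile alongside the median is what surfaces the slow outlier that a typical request never sees.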

All of the tools in RED start with instrumentation. SignalFx Microservices APM and its distributed tracing are designed to be instrumentation-agnostic, supporting Zipkin, OpenTracing, OpenCensus and OpenTelemetry. It also provides a number of automatic instrumentation libraries for popular languages, like Java, Scala and Kotlin, Python, Golang and many more. And it can ingest data from increasingly common service meshes like Istio, Envoy and Linkerd.

And since SignalFx Microservices APM autodetects and captures your instrumented data, it provides a rapid startup, including end-to-end service maps and dashboards for those needed RED metrics so you can monitor the health of your systems.

Once you are set up, you can find problems, set up alerting and responses to keep your applications performing, and even do deep dives to understand the underlying causes of issues. A good example of finding the root cause is outlined in the SignalFx Microservices APM documentation.

So, what can RED do for you? Besides being an easy-to-remember acronym, RED tends to reduce decision fatigue when deciding how to get started observing your microservices applications. Its simplicity and clarity make the learning curve short. And it gives all of the teams, both operational and development, a common vocabulary to discuss issues and resolutions.

RED can be extended to build specifics for your unique needs based on your unique usage. And by tracking the path, duration and success of your users' requests, RED can serve as a proxy for user happiness.

So RED wins for microservices (or any request-driven) applications, crossing clouds and serverless functions, and working with orchestration, service meshes and containers. It’s easy to get started with RED. In fact, you can try out the SignalFx product for free.

After all, the right tool will give you the observability you and your teams need. Find out more about observability and what it means for you with Splunk.

Posted by Dave McAllister