As we at Splunk accelerate our cloud journey, we’re often faced with the decision of when to use logs vs metrics — a decision many in IT face. On the surface, one can do a lot by just observing logs and events. In fact, in the early days of Splunk Cloud, this is exactly how we observed everything. As we continue to grow, however, we find ourselves using a combination of both.
This post lays out the overall difference between logs and metrics and when each is best utilized. We hope this analysis helps you create a better observability strategy for your own organization.
(Compare monitoring & observability.)
What are logs?
Almost all programs record the activities that occur within their program flow in the form of logs. These logs are generally files containing either:
- Unstructured text descriptions of program execution, each with a timestamp
- Structured (JSON, XML) events of the program execution
Either way, the logs are written to a file that a log search engine can then consume. The search engine collects all the logs and presents the results of various searches to the user.
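To make the structured case concrete, here is a minimal sketch of a program emitting JSON-formatted log events using Python's standard-library logging module. The field names (`timestamp`, `level`, `logger`, `message`) and the `checkout` logger name are illustrative choices, not a Splunk requirement:

```python
import json
import logging
import time

# Illustrative formatter: serializes each log record as one JSON event
# per line, which a log search engine can ingest and index.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed")  # emits one structured JSON event
```

Because every event carries the same predictable fields, a search engine can filter on `level` or `logger` without parsing free-form text.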
Benefits & use cases for logs
Logs are emitted from almost every program. A good logs search engine is able to handle any type of log. That makes logs the easiest and quickest data source to get visibility into the state of your system.
Within a mature observability strategy, logs are essential for unplanned research and unique situations. They are great for security use cases because many of these involve unexpected or single-event situations. For example, Splunk’s security organization utilizes logs to quickly detect and remediate significant vulnerabilities, including the Log4j vulnerability disclosed in late 2021.
Logs are also great for iterative software delivery because they allow developers to establish patterns for new behaviors or functionality in production, which accelerates delivering value to customers.
(Read our log management introduction.)
Challenges with logs
It might be tempting to think that logs can solve every use case. As the amount of data grows, however, a logs-only solution becomes costly and relatively slow, even for a small set of regular searches, usually connected to alerts. This is because categorizing and batching logs takes much more time and is far more computationally intensive than the metrics process, which we cover next.
What are metrics?
A metric is a number, usually in the form of a counter or a gauge, that the developers decide is important to the observability of their system. Most software programs begin their lives emitting logs; only in the last decade has it become common for them to also emit metrics from early in their inception.
We are used to metrics in our everyday life. We see them on a car’s speedometer (a gauge) or its odometer, which tracks how many miles the car has driven (a counter). The car’s makers decided it was important for the driver to have this information while driving.
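The car analogy maps directly onto the two metric types named above. Here is a minimal sketch, with illustrative class names, showing the key behavioral difference: a counter (the odometer) only accumulates, while a gauge (the speedometer) can move up or down:

```python
# Illustrative metric primitives; real metrics libraries add labels,
# timestamps and export logic on top of these same two shapes.
class Counter:
    """A value that only ever increases, like an odometer."""
    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class Gauge:
    """A value that can rise or fall, like a speedometer."""
    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


odometer = Counter()
speedometer = Gauge()

odometer.inc(12)       # drove 12 miles: the total only accumulates
speedometer.set(55.0)  # current speed: may go up...
speedometer.set(30.0)  # ...or back down
```

This distinction matters downstream: counters are typically queried as rates of change, while gauges are read as current values.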
(Popular discipline-based metrics include monitoring metrics and DORA metrics for DevOps.)
Benefits and challenges of metrics
For developers, often the biggest challenge to incorporating metrics is twofold:
- Taking the time to determine the right metrics for their systems.
- Emitting those metrics by adding program logic that will share the metrics with another system.
When done correctly, metrics are essential for planned scenarios and events. They deliver regular evaluations cheaply, quickly and reliably. This is because, unlike logs, metrics are structured in a predictable way and can therefore be saved into a time-series database, which is tuned for this purpose. Operators then quickly know where to start when investigating a degraded state in their systems.
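A sketch of the kind of cheap, regular evaluation described above: an alert rule that fires only when the last few samples of a metric all breach a threshold. The metric name, threshold and sample count here are all illustrative, not any particular product's defaults:

```python
# Illustrative alert rule over a time series of metric samples.
def alert_firing(samples, threshold, required=3):
    """Fire only if the last `required` samples all exceed the threshold,
    which avoids paging on a single noisy data point."""
    recent = samples[-required:]
    return len(recent) == required and all(v > threshold for v in recent)

cpu_percent = [42.0, 51.0, 88.0, 91.0, 93.0]  # newest sample last
print(alert_firing(cpu_percent, threshold=85.0))  # True: last 3 breach 85%
```

Because the evaluation is a fixed computation over a handful of numbers, it can run every few seconds at trivial cost, which is exactly why planned, predictable checks belong on metrics rather than log searches.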
However, the organization must remember that reliability does not come from identifying every alert and degraded state with metrics. With modern, iterative software delivery, one must be able to debug and investigate unplanned states and rapidly incorporate those observations into the product lifecycle. This is why metrics play a critical role, though not the only one, in delivering observable, reliable IT services.
Example of metrics and logs, together
Metrics give the big picture of what is happening. If I’m driving the car, I can see the engine temperature and whether the coolant warning light is on (the metrics). However, if the car starts behaving outside the norm, a mechanic might need to ask some unpredictable questions and examine the actual event log of the car itself.
This is where logs and metrics differ, and we can summarize as follows:
- Metrics work better for regularly knowing predictable states of the system, for example with alerts.
- Logs work better for researching unpredictable states of the system, for example when you need to deeply investigate an incident.
(Read more about machine data.)
Choosing logs or metrics?
So, when we’re asked to weigh in on the logs vs metrics debate, we say both! Logs and metrics together create a complementary observability foundation, upon which we operate the Splunk Cloud Platform. Once we establish that foundation, teams will also want to connect parts of their system together with a tracing solution.
With these three elements, we have everything essential to view and connect the system in both predictable and unpredictable ways. Splunk products provide a world-class implementation of these capabilities, which we ourselves use every day. We hope that our experience will help you in your journey to solve problems more easily and faster with data.
- How to Build a Culture of Observability
- Data Observability: How Observability Improves Data Workflows
- Splunk Observability Product Suite
- The State of Observability Today
This posting does not necessarily represent Splunk's position, strategies or opinion.