It has been a while since anyone has written a blog entry devoted to event correlation here at Splunk, so I thought I would write one today. Event correlation can loosely be defined as a technique for relating any number of events through some identifiable pattern (and optionally acting upon the relationship). Security vendors may narrowly claim that event correlation is the ability to correlate security-related events and alert upon their existence. That is only a subset of what event correlation can be. For instance, in a hypothetical case, I could find that when it rains on a major Monday holiday, end-of-day total sales at a brick-and-mortar retail shop are lower than average. This case has nothing to do with security, but it is a form of event correlation that can be performed in Splunk as soon as the data is indexed. In fact, I would assert that event correlation is important for use cases that involve not only security but also fraud detection, data intelligence, root cause analysis, operations support, and reducing mean time to resolution.
Because Splunk universally indexes every term in your data and extracts fields at search time, event correlation comes naturally to it. There are multiple ways to achieve different types of event correlation within Splunk. What follows is a non-exhaustive list of some of the methods that can be used.
Manual Event Correlation
Every time a Splunk user performs an ad-hoc search and pivots on the results to find what else happened around the same time, he or she is manually performing event correlation, with time being the universal pattern that relates events. For instance, the user can use the Splunk time picker to narrow down a time range and then type something as general as “error” into the search bar.
After receiving results, the user can then use the histogram to zoom in on a particular event’s time range and use * as a search term to see what else happened in that time frame. Events are correlated by search, using time as the pivot. This is what I call manual event correlation, and for troubleshooting it is just as important as automatic event correlation. In what follows, I will discuss the various ways Splunk can be used to automate different types of event correlation.
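The time pivot described above can be sketched in a few lines of plain Python. This is not Splunk code, only an illustration of the idea; the sample events, the `events_near` helper, and the 60-second window are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

def events_near(events, pivot_time, window_seconds=60):
    """Return every event whose timestamp lies within window_seconds of pivot_time."""
    window = timedelta(seconds=window_seconds)
    return [e for e in events if abs(e["time"] - pivot_time) <= window]

# Hypothetical sample events from different sources
events = [
    {"time": datetime(2010, 1, 4, 9, 0, 5), "source": "app.log", "raw": "error: db timeout"},
    {"time": datetime(2010, 1, 4, 9, 0, 20), "source": "db.log", "raw": "connection pool exhausted"},
    {"time": datetime(2010, 1, 4, 11, 30, 0), "source": "web.log", "raw": "GET /index.html 200"},
]

# Step 1: find an event matching "error"
pivot = next(e for e in events if "error" in e["raw"])
# Step 2: pivot on its time and look at everything (the "*" search) nearby
nearby = events_near(events, pivot["time"], window_seconds=60)
```

Here `nearby` holds the error itself plus the database event that occurred seconds later, which is the kind of relationship a user finds by zooming in on the histogram.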
Splunk has created “Transaction Search.” What this means is that if events have similar values for extracted fields or starting/ending terms, Splunk can automatically correlate these events as a result of a search and group the returned results. Rather than repeat what has already been said about transaction search, I encourage you to read this blog entry by Maverick for an in-depth example. You can also see my SOA article for a real world use case on using transaction search to correlate event activity across application tiers.
On the other hand, so that you do not have to leave this post, I’ll provide a small example of using transaction search. On Splunkbase, you can download an app that indexes, on a daily basis, the world’s most malicious source IP addresses according to one source. Here’s a gratuitous screenshot.
Needless to say, you would want to know if any of these IP addresses appeared as source IPs in your own logs. An example search such as the one below would group your own events that include the offending IP addresses.
sourcetype="ip_watchlist" OR (sourcetype="sshd" login accepted)|transaction offending_ip,src_ip maxspan=1d connected=f|eval count_sourcetypes=mvcount(sourcetype)|where count_sourcetypes>1
What this search says is: if someone has logged in using SSH and their source IP is in the list of malicious IP addresses within a day’s span (the transaction command does the grouping), and the number of sourcetypes in the grouping is greater than one, so that we know both sourcetypes were in the grouping, then return results. You can save this type of search, schedule it to run on an interval, and have Splunk automatically create an alert to notify you if events are matched.
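To make the grouping logic concrete, here is a rough Python sketch of what the search is asking for. It is an illustration under simplifying assumptions, not how the transaction command is implemented: it ignores the one-day span, and the sample events and the `correlate_watchlist_logins` name are hypothetical.

```python
from collections import defaultdict

def correlate_watchlist_logins(events):
    """Group events by IP and keep only groups that contain more than one sourcetype."""
    groups = defaultdict(list)
    for event in events:
        # Watchlist events carry offending_ip; sshd events carry src_ip
        ip = event.get("offending_ip") or event.get("src_ip")
        if ip:
            groups[ip].append(event)
    # Equivalent in spirit to: where count_sourcetypes > 1
    return {
        ip: grouped
        for ip, grouped in groups.items()
        if len({e["sourcetype"] for e in grouped}) > 1
    }

# Hypothetical sample data (documentation-reserved IP ranges)
events = [
    {"sourcetype": "ip_watchlist", "offending_ip": "203.0.113.7"},
    {"sourcetype": "ip_watchlist", "offending_ip": "198.51.100.9"},
    {"sourcetype": "sshd", "src_ip": "203.0.113.7", "raw": "login accepted"},
    {"sourcetype": "sshd", "src_ip": "192.0.2.44", "raw": "login accepted"},
]

matches = correlate_watchlist_logins(events)
```

Only the address that appears both on the watchlist and in an accepted SSH login survives the filter.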
A variant of transaction search is statistical aggregation, where numerical aggregations of some fields are grouped by other fields. Here’s a simple example using mail logs that totals the number of bytes coming into each relay.
sourcetype=email | stats sum(bytes) as byte_count by relay | sort - byte_count
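For readers who think in code, an aggregation of this kind (total bytes per relay, largest first) amounts to roughly the following in plain Python. The sample events and the `bytes_by_relay` helper are made up for illustration.

```python
from collections import defaultdict

def bytes_by_relay(events):
    """Total the bytes field per relay and sort relays from busiest to quietest."""
    totals = defaultdict(int)
    for event in events:
        totals[event["relay"]] += int(event["bytes"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical mail events
mail_events = [
    {"relay": "mx1", "bytes": "1200"},
    {"relay": "mx2", "bytes": "5000"},
    {"relay": "mx1", "bytes": "800"},
]

ranking = bytes_by_relay(mail_events)
```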
There are times where you would rather use the Splunk stats command over the transaction command and this is described here.
Another way to automate event correlation is to use a subsearch. If you like the approach of a subquery or join in database terms, note that Splunk can run one search to narrow down the criteria and then run another search using that first set of results. Again, so as not to repeat what has already been written, here’s an article describing this feature for event correlation.
Although it is not used as much, Splunk also has a join search command for joining events stored within Splunk itself. There may be use cases where combining related events in this way is necessary.
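The two-step nature of a subsearch can be sketched in plain Python as follows. This only illustrates the idea of one search feeding criteria to another; the `search_with_subsearch` helper, the field names, and the sample events are assumptions for the example.

```python
def search_with_subsearch(events, inner_predicate, outer_predicate, field):
    """Use the values found by an inner search to narrow an outer search."""
    # The inner search runs first and yields the set of field values to match
    values = {e[field] for e in events if field in e and inner_predicate(e)}
    # The outer search keeps events that meet its own criteria and carry one of those values
    return [e for e in events if outer_predicate(e) and e.get(field) in values]

# Hypothetical events: who failed an SSH login, and what did they do on the web tier?
events = [
    {"sourcetype": "sshd", "user": "alice", "action": "failed"},
    {"sourcetype": "sshd", "user": "bob", "action": "accepted"},
    {"sourcetype": "web", "user": "alice", "uri": "/admin"},
    {"sourcetype": "web", "user": "bob", "uri": "/home"},
]

results = search_with_subsearch(
    events,
    inner_predicate=lambda e: e["sourcetype"] == "sshd" and e["action"] == "failed",
    outer_predicate=lambda e: e["sourcetype"] == "web",
    field="user",
)
```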
Splunk can also correlate with data that is external to Splunk by using the lookup command. The most basic use for this is when fields in your Splunk events need to be matched against fields in an external CSV file. At search time, Splunk performs the lookup and adds new fields from the external CSV file wherever values match. In essence, this enriches your existing indexed data with external sources at search time.
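The enrichment that a static lookup performs can be pictured with a short Python sketch. This is not Splunk's implementation; the CSV contents, the field names, and the `load_lookup` and `enrich` helpers are hypothetical.

```python
import csv
import io

def load_lookup(csv_text, key_field):
    """Index the rows of a CSV lookup table by the value of key_field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}

def enrich(event, table, key_field):
    """Return a copy of event with the matching row's extra fields added."""
    enriched = dict(event)
    match = table.get(event.get(key_field))
    if match is not None:
        for name, value in match.items():
            if name != key_field:
                # Do not overwrite fields the event already has
                enriched.setdefault(name, value)
    return enriched

# Hypothetical lookup file mapping HTTP status codes to descriptions
lookup_csv = "status,description\n200,OK\n404,Not Found\n"
table = load_lookup(lookup_csv, "status")

enriched = enrich({"status": "404", "uri": "/missing"}, table, "status")
```

An event with no matching row simply comes back unchanged, which is roughly how unmatched events behave when a lookup finds nothing to add.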
If you would like to correlate events in Splunk with rows in external database tables that share a field value, Splunk’s lookup command can also be used to accomplish this. The basic idea is that a user-written program is called to run the SQL query and bring in new fields at search time. This is called a dynamic lookup, as opposed to the static lookup accomplished with a CSV file. In fact, because the program is developed by the user, you are not limited to CSV files or databases. Anything can be correlated using the lookup command if you have programmatic access to it. Examples could include a call to a web service, an external DNS lookup, or calling whois for an IP address.
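As a rough idea of what such a user-written program might look like, here is a minimal Python sketch that reads rows as CSV, fills in a missing field, and writes CSV back out. The exact contract between Splunk and the script is simplified here and should be treated as an assumption; `resolve_owner`, its hard-coded directory, and the `src_ip` and `owner` field names are hypothetical stand-ins for a real SQL query, web service call, DNS lookup, or whois request.

```python
import csv
import io

def resolve_owner(ip):
    """Hypothetical resolver; a real one might query a database, web service, or whois."""
    fake_directory = {"203.0.113.7": "Example ISP", "192.0.2.44": "Example Corp"}
    return fake_directory.get(ip, "")

def run_lookup(infile, outfile, key_field, out_field, resolver):
    """Read CSV rows, fill in out_field from key_field where it is empty, write CSV."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    if out_field not in fieldnames:
        fieldnames.append(out_field)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        if not row.get(out_field):
            row[out_field] = resolver(row.get(key_field, ""))
        writer.writerow(row)

# Simulated request: in a real deployment the streams would be stdin and stdout
request = io.StringIO("src_ip,owner\n203.0.113.7,\n10.0.0.1,\n")
response = io.StringIO()
run_lookup(request, response, "src_ip", "owner", resolve_owner)
```

Keeping the resolver as a separate function is a design choice for the sketch: swapping the data source does not touch the CSV handling.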
In this entry, I have provided a non-exhaustive list of the most common ways event correlation is performed in Splunk. There are more subtle ways to accomplish this, but since this is a blog entry, I decided to start with the most prevalent usages. The main point I want to leave you with is that event correlation and subsequent alerting can be performed on any type of event that you choose to index, giving you a powerful technique to analyze, enrich, and interpret data.