Since Splunk launched the first IT search engine in August, early adopters have told us how great search is for their production IT data. Logs are the most obvious data to search, but there are also system configuration files, monitoring data, and anything else you can index. I’ve never seen so much excitement about a new approach to IT from a little startup (I’ve been in and around software startups for 13 years, in systems management since ’97, and in and around the log niche since ’01). We’ve already had thousands of downloads and dozens of press articles.
Not surprisingly, a number of vendors that make other kinds of tools for log data have jumped onto our search bandwagon. That little word “search” and self-comparisons to Google pop up with greater frequency in the marketing materials, press stories and analyst coverage about log appliances. But they’re NOT search engines.
Log appliances and log management software do other things – monitoring, reporting, long term retention. Some of them do those things very well. But they simply do not provide the easy, instantaneous search of everything on your network.
What makes Splunk a search engine?
Splunk is a search engine because it builds an index (Wikipedia has a good definition) of every unique segment of every log event it processes. Splunk also keeps a pointer to each of the original events from which it built the index. When you type in search keywords, Splunk searches the entire index almost instantly – a few seconds at most – and displays only the events that match. You can modify your search or run a completely different one just as quickly.
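To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is not Splunk's implementation: the tokenizer, the function names (`segments`, `build_index`, `search`) and the sample log lines are all assumptions made up for illustration. It only shows the general shape of the technique, where each unique segment maps to pointers back to the original events.

```python
import re
from collections import defaultdict

def segments(text):
    """Split text into lowercase segments (a simplified, assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(events):
    """Map each unique segment to the positions of the events that contain it."""
    index = defaultdict(set)
    for position, event in enumerate(events):
        for segment in segments(event):
            index[segment].add(position)  # the position acts as the pointer
    return index

def search(index, events, query):
    """Return the events containing every keyword, using index lookups only."""
    keywords = segments(query)
    if not keywords:
        return []
    matches = set.intersection(*(index.get(k, set()) for k in keywords))
    return [events[i] for i in sorted(matches)]

# Made-up sample events, for illustration only.
events = [
    "Oct 12 03:14:07 web01 sshd[211]: Failed password for root from 10.0.0.5",
    "Oct 12 03:14:09 web01 httpd: GET /index.html 200",
    "Oct 12 03:15:30 db01 sshd[544]: Accepted password for admin from 10.0.0.9",
]
index = build_index(events)
results = search(index, events, "sshd failed")
```

The index is built once, when the events arrive. After that, each search is a handful of set lookups and an intersection, and the raw events are only touched to display the ones that matched.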
By contrast, if you use grep, less, or other Unix shell commands to search your data, the computer scans the original raw data by reading files line by line. There’s no index; it starts over from scratch every time. The computer is forced to read files from disk through limited disk I/O, fill up memory and swap it to disk, tie up CPU to match characters … whew! It’s fine if you’re just looking through one file, but if you’ve ever started a grep command across an archive of 50 gigabytes – or maybe 500 gigs – and then left for the weekend to let it run, you know what I mean. We colloquially refer to using grep as running a “search,” but it’s not search in the search engine sense of the word – there’s no index. Imagine if every time you typed “housing bubble” into Google their servers had to walk through every page on the Web and every post to Usenet all over again while you sat waiting. The power of search engines isn’t that they’re faster than you at reading data, it’s that they’ve already indexed it so they can look it up for you quickly.
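The difference is easy to see in a toy example. The sketch below uses synthetic data and hypothetical names (`scan`, `word_index`), and assumes a simple whitespace tokenizer. It counts how many lines a grep-style scan has to examine for each query, then answers the same query from an index built once in advance.

```python
def scan(lines, keyword):
    """grep-style search: examine every line on every query."""
    examined = 0
    hits = []
    for line in lines:
        examined += 1
        if keyword in line:
            hits.append(line)
    return hits, examined

# Synthetic data: 100,000 events, exactly one of which is an error.
lines = [f"event {n} status=ok" for n in range(100_000)]
lines[73_211] = "event 73211 status=error"

hits, examined = scan(lines, "status=error")
# Asking again pays the full cost again, however few lines match.
hits_again, examined_again = scan(lines, "status=error")

# Contrast: build a word index once, then answer by lookup.
word_index = {}
for position, line in enumerate(lines):
    for word in line.split():
        word_index.setdefault(word, []).append(position)

indexed_hits = [lines[i] for i in word_index.get("status=error", [])]
```

Both approaches find the same single event. The scan examines all 100,000 lines on every query, while the index does that work once up front and then goes straight to the matching position.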
Database reports aren’t search, either. They parse and map every event into a predetermined list of standard fields, build an index on specific fields a database admin has configured, then force you to form a structured query using only those indexed fields. To find something you must already have a pretty good idea of what and where it is. It’s like Yahoo’s early days when they thought they’d be able to catalog every site on the Web into an orderly list.
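Here is a rough sketch of that fielded approach, again in Python. The schema, the choice of indexed fields, the decision to drop events that don't fit, and the sample events (including the `com.example.Checkout` stack trace line) are all hypothetical. The point is only that a query works when it names a field someone configured in advance, and fails otherwise.

```python
import re

# Hypothetical schema: the only fields the database admin chose to extract.
PATTERN = re.compile(r"src=(?P<src>\S+) dst=(?P<dst>\S+) action=(?P<action>\S+)")
# Hypothetical choice of which extracted fields get an index.
INDEXED_FIELDS = ("src", "action")

def load(events):
    """Parse events into rows and index only the configured fields."""
    rows = []
    field_index = {field: {} for field in INDEXED_FIELDS}
    for event in events:
        match = PATTERN.search(event)
        if match is None:
            continue  # assumption: events that do not fit the schema are dropped
        row_id = len(rows)
        rows.append(match.groupdict())
        for field in INDEXED_FIELDS:
            field_index[field].setdefault(match.group(field), []).append(row_id)
    return rows, field_index

def query(rows, field_index, field, value):
    """Structured query: only works against a field that was indexed."""
    if field not in field_index:
        raise KeyError(f"field {field!r} is not indexed")
    return [rows[i] for i in field_index[field].get(value, [])]

# Made-up sample events, for illustration only.
events = [
    "src=10.0.0.5 dst=10.0.0.1 action=drop",
    "src=10.0.0.9 dst=10.0.0.1 action=accept",
    "java.lang.NullPointerException at com.example.Checkout.submit",
]
rows, field_index = load(events)
dropped = query(rows, field_index, "action", "drop")
```

In this sketch the firewall-style events are easy to query by `src` or `action`, but a query on `dst` raises an error because nobody indexed it, and the Java exception never makes it into the table at all.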
Splunk does a lot of other cool things beyond its index. It classifies events based on analysis of their structure, normalizes timestamps for all events (and calculates a timestamp for those events missing one), and finds associative data relationships. It puts an AJAX UI on top of the whole thing that anticipates your next move. These other things are important but Splunk’s full-text index – hard to design and build, easy to use – is why Splunk is a search engine and run-of-the-mill log consolidators are not.
So what are the log consolidators if they’re not search engines?
Most log consolidators are combinations of off-the-shelf GUI reporting, rule-based monitoring & alerting, relational databases, and compressed long-term file storage. A relational database is a significant performance and scalability bottleneck when data gets too big, so older data is just stored as files. The reporting, alerting, and database features all depend on libraries of custom parsing and mapping algorithms for each of dozens of log formats for different operating systems, firewalls, IDS, network devices, and security applications.
What log consolidators refer to loosely as “search” is one of three features:
– Full text scanning of raw text files – essentially grep with a GUI. It’s painfully slow.
– Structured query against very recent events stored in SQL databases, where their GUI is simply front-ending SQL.
– Structured query against indexes of a handful of key fields they’ve built to point back to locations in raw historical text files.
The last two are a perfectly acceptable approach for canned reports on well-known and consistent log formats, for example Checkpoint firewall data. If you only need to produce a report of top external IPs hitting your firewalls every Monday morning, they’ll do the job perfectly well.
But what if you have 50 different log sources across web servers, application servers, and databases that you need to search on an ad hoc basis? What if you need to figure out why a key transaction failed for a few seemingly unrelated customers? If the answers aren’t in your canned reports and pre-defined data fields you’ll have to manually grep a homegrown central log file repository. This mode of ad hoc access to logs costs systems administrators the most time. It’s here that Splunk shines.
The other big difference
I’ve focused on the architectural distinction between search and structured query or scanning text in this post. But there’s another important difference between Splunk and the log consolidators.
The log consolidators are most interested in network security and security audit compliance – not in application and service level troubleshooting. As a result they only handle a fraction of the many types of data from IT systems. They need to provide separate parsers for each format. You can’t just repurpose these for other log formats.
Here at Splunk our universal event processing lets us provide the search capability for any IT data source, not just a few that we’ve focused on. We’re learning that our customers want to troubleshoot across web servers, J2EE application servers, Java exception stack traces, PHP errors, VoIP call records and email MTA logs – sometimes all at once! Instead of just a few well-known data types, Splunk indexes everything. If it was on your network, Splunk will find it.