TIPS & TRICKS

Quantifying the Benefits of Splunk with SSDs

We’ve had this question posed to us several times over the years: “What impact would adding an SSD have on my Splunk environment?”  Referencing Splunk Answers:

http://splunk-base.splunk.com/answers/10417/splunk-on-solid-state-disk

Raitz is dead-on in his reply.  As data flows into a Splunk indexer, the workload is write-I/O heavy.  Sequential write performance on SSD vs. SAS is pretty similar, so there’s no real benefit for Splunk on an SSD here.  The benchmarks below illustrate this:

RAID0 w/SSD

RAID0 w/SAS

(These are RAID controller benchmarks but they still demonstrate the point)

Since a Splunk indexing server pulls double duty, responding to search requests as well as performing indexing, what is the impact of an SSD on search performance?  Splunk searches can be categorized in two ways: dense and sparse.  A dense reporting search might request, for example, the average response time of a particular application over the last 24 hours.  Sparse searches are the “needle in a haystack” searches; a sparse search might look something like “find this user ID anywhere in my data over the last year”.  For dense searching, Splunk’s I/O footprint can be characterized as a lot of sequential reads, and referring to the benchmarks above, sequential reads on SSD are also about the same as on the SAS drives.  For sparse searching, Splunk’s I/O behavior is full of random seeks.  This is where Splunk shines on SSD.
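
To make the distinction concrete, here are two hypothetical searches, one of each type.  The index, field names, and time ranges are made up for illustration; they are not the queries used in this benchmark.

```python
# Hypothetical SPL strings written for illustration only.

# Dense: touches most events in a short window, so the I/O profile is
# dominated by sequential reads.
dense_search = (
    "index=web earliest=-24h "
    "| stats avg(response_time) by application"
)

# Sparse: a rare term over a long window, so the I/O profile is dominated
# by random seeks. This is the case where SSDs help.
sparse_search = "index=web userid=123456 earliest=-1y"

print(dense_search)
print(sparse_search)
```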

 
Hardware

Three physical machines were used for this benchmark; the SSD configuration below is the 7200 machine with a PCIe SSD added.  We’ve classified them by their disk speed.  CPU and memory were not identical across machines.

7200 – 2×4 2.40 GHz, 16 GB RAM, 12×2 TB 7200 RPM SATA, RAID 10
10k – 2×6 2.677 GHz, 48 GB RAM, 4×900 GB 10K RPM SAS, RAID 10
15k – 2×6 2.667 GHz, 12 GB RAM, 6×146 GB 15K RPM SAS, RAID 10
SSD – 2×4 2.40 GHz, 16 GB RAM, 1×240 GB PCIe SSD (same machine as the 7200 configuration)

 
Load Generation

We’re using a script that runs searches against the Splunk instances above for a 5-minute period.  Each search looks for a random user ID between 1 and 1 million, matching the user IDs we generated into the data.  We can control the number of searches executing concurrently, and we tested at increasing concurrency from 1 to 32.  In a real-world Splunk setup, the single-concurrent-search workload would look like one individual submitting a search, waiting for the results, then submitting another.  A test with 32 concurrent searches would look like 32 Splunk users each submitting one search at the same time, each waiting for a result, then each submitting another.
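
Here’s a minimal sketch of what such a load-generation script might look like.  The splunk CLI invocation, install path, index name, userid field, and credentials shown are illustrative assumptions, not the actual script used to produce these numbers.

```python
import random
import subprocess
import threading
import time

# Assumptions for this sketch: a local Splunk install at SPLUNK_BIN, an index
# named "main", events carrying a "userid" field, and the credentials below.
SPLUNK_BIN = "/opt/splunk/bin/splunk"
RUN_SECONDS = 300      # 5-minute test window
CONCURRENCY = 32       # number of simulated users: 1, 2, 4, ... 32

completed = 0
lock = threading.Lock()

def run_search():
    """Run one sparse 'needle in a haystack' search for a random user ID."""
    userid = random.randint(1, 1_000_000)
    query = f"index=main userid={userid} earliest=-1y"
    subprocess.run([SPLUNK_BIN, "search", query, "-auth", "admin:changeme"],
                   stdout=subprocess.DEVNULL, check=False)

def user(deadline):
    """Simulate one user: submit a search, wait for it to finish, repeat."""
    global completed
    while time.time() < deadline:
        run_search()
        with lock:
            completed += 1

deadline = time.time() + RUN_SECONDS
threads = [threading.Thread(target=user, args=(deadline,))
           for _ in range(CONCURRENCY)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{CONCURRENCY} concurrent users finished {completed} searches "
      f"(about {completed / (RUN_SECONDS / 60):.0f} per minute)")
```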

 
Results

The chart below represents how many distinct searches were able to complete in a 1-minute time frame for each of these I/O setups.

 
So, for example, with 1 concurrent user, the 7200 I/O setup was able to execute 9 searches in a 1-minute span, for an average search execution time of around 6.5 seconds.  This is not bad at all, and it is helped along by a feature we released in Splunk 4.3 called bloom filters, which reduces the amount of time searches spend looking for rare terms:

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Bloomfilters
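
If you haven’t run into bloom filters before, here is a minimal, self-contained Python sketch of the idea (an illustration of the data structure, not Splunk’s actual implementation): a small bit array can answer “this term is definitely not here” without reading the underlying data from disk, which is exactly what a rare-term search needs.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: answers 'definitely not present' or 'maybe present'."""
    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions from the term.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(term))

# Roughly speaking, a filter like this built per index bucket lets a search
# for a rare term skip any bucket whose filter says "definitely not present"
# without touching that bucket's data on disk (simplified picture).
bucket_filter = BloomFilter()
for uid in (42, 1337, 999999):          # terms present in this bucket
    bucket_filter.add(f"userid={uid}")

print(bucket_filter.might_contain("userid=1337"))    # True: maybe present, go read the bucket
print(bucket_filter.might_contain("userid=555555"))  # almost certainly False: skip the bucket
```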

But holy crap, look at the SSD results!  At 32 concurrent searches we are able to complete almost 2,000 searches per minute, which works out to roughly one search per second for each of the 32 concurrent users.  This is a manifestation of SSDs’ superior random read performance over traditional hard disk drives.

 
Conclusion

As the $/GB of SSDs continues to improve versus traditional hard disk drives, it makes sense to evaluate them for Splunk environments where you might reap an order-of-magnitude or greater return on search throughput.  In fact, you could even make the argument that since other workloads are nearly at parity and sparse searches in Splunk have such a huge upside on SSD, you should consider putting your hot and warm Splunk indices on SSD, with cold perhaps on spinning media.  I’m not saying there aren’t other factors you should weigh when deploying enterprise SSDs, but with performance like this, they should definitely be on your radar.

Thanks to Sunny Choi for running the benchmarks and helping to interpret the results.

Posted by Patrick Ogdin