Certain hardware configurations improve Splunk's performance in specific areas of its operation. This topic reviews a variety of factors and offers suggestions on how to size your hardware for Splunk.
Generally speaking, large-scale IT search deployments present unique challenges to the volume computing hardware available from vendors today. Many of these challenges involve I/O: hardware, software, system architecture, and operating system all play a part in determining whether a given configuration is suitable for Splunk. Your mileage will vary with the guidelines below. Please contact Splunk for more specific recommendations for your environment.
Some high-level guidelines (for Splunk version 4.0 and later):
Splunk is naturally demanding of the disk subsystems that it works with. Both index and search operations benefit from a disk subsystem that is designed with an eye to the types of operations that Splunk performs.
In Splunk, indexed data can be located on different partitions and still be searchable. If you do use separate partitions, the most common arrangement for Splunk's datastore is to keep the more recent data on the local machine (on disks with fast reads and writes) and older data on a separate disk array (on slower but more reliable disks for longer-term storage). See the Splunk documentation for more information.
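As a rough illustration of this arrangement, an index stanza in indexes.conf can point its recent data at fast local storage and its older data at the slower array. The index name and both mount points below are hypothetical, and the settings available depend on your Splunk version, so treat this as a sketch and not as a recommended configuration.

```ini
# indexes.conf (sketch; index name and mount points are hypothetical)
[example_index]
# Recent (hot/warm) data on fast local disks
homePath   = /fast_local_disk/splunk/example_index/db
# Older (cold) data on the slower, larger array
coldPath   = /slow_array/splunk/example_index/colddb
# Location for restored (thawed) archives
thawedPath = /slow_array/splunk/example_index/thaweddb
```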
The number of discrete I/O operations per second is a good benchmark of how well a given disk subsystem will perform with Splunk. Most common 7200 RPM SATA disks deliver about 100 IO/s, whereas 15K RPM FC, SAS, and U320 SCSI technologies can yield significantly higher performance levels, near 800 IO/s or more. To run a benchmark you can use bonnie++, freely available at http://www.coker.com.au/bonnie++/. It needs to be compiled on the target system. Once compiled, run the following command for each volume you want to benchmark:
$ bonnie++ -d [/your volume] -s [twice your system RAM in MB] -u root:root -qfb
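The test size can be worked out from the machine's memory instead of by hand. The sketch below assumes a Linux host, where the MemTotal line of /proc/meminfo reports RAM in kB, and assumes that bonnie++ reads the -s value in megabytes (its documented default unit). The function names are hypothetical helpers, and the command is only printed, not run.

```shell
# Hypothetical helper: convert a RAM size in kB into a bonnie++ test
# size of twice RAM, in MB, so the test file cannot fit in the page cache.
bonnie_size_mb() {
    ram_kb=$1
    # kB -> MB by integer division, then doubled
    echo $(( ram_kb / 1024 * 2 ))
}

# Hypothetical helper: print (do not run) the bonnie++ command for one volume.
bonnie_command() {
    volume=$1
    ram_kb=$2
    echo "bonnie++ -d $volume -s $(bonnie_size_mb "$ram_kb") -u root:root -qfb"
}

# Example on a Linux host (the volume path is a placeholder):
# bonnie_command /your/volume "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

Printing the command first makes it easy to review the size before starting a benchmark that will write twice the system RAM to the volume.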
Indexing generates a large number of small, discrete writes, followed by further small reads and writes when the index is optimized. As such, large numbers of high-performance disk drives in directly attached configurations with high-bandwidth interfaces are preferable when maximum index performance is required.
Each search will run in a separate process, so you will benefit from additional CPUs for each concurrent search.
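A quick way to reason about this is to compare the number of searches you expect to run at once with the number of cores available. The sketch below assumes, purely for illustration, that each concurrent search occupies one core and ignores the CPU used by indexing. The function name is hypothetical.

```shell
# Hypothetical helper: cores left over if every concurrent search
# occupies one core (a simplifying assumption, not a Splunk rule).
# A negative result means searches will compete for CPU time.
search_core_headroom() {
    cores=$1
    searches=$2
    echo $(( cores - searches ))
}

# Example for 6 expected concurrent searches on the current host:
# search_core_headroom "$(getconf _NPROCESSORS_ONLN)" 6
```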
Search time is also dominated by IO/s, especially when the data in question is infrequently accessed. When searching relatively recent data, or even pulling large (~10,000 event) chunks from larger groups of event data, an individual disk is less likely to be a bottleneck because each read call to the disk subsystem pulls a larger chunk of data. In this case the storage interface is much more critical.
However, when searching for rare terms, such as a name that may occur once an hour or once a day, each read call taxes an individual disk more. In these cases, higher-performance individual disks pay large dividends: in some cases faster disks can deliver 8x the performance.
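The size of that gain follows from simple arithmetic when a search is bound by random reads. The sketch below assumes a search that needs a fixed number of random reads and a disk that sustains a fixed IO/s rate. The read count is invented for illustration, and the two rates are the approximate figures quoted earlier for 7200 RPM SATA and 15K RPM disks.

```shell
# Hypothetical estimate: whole seconds needed to complete a number of
# random reads on a disk that sustains a given IO/s rate (rounded up).
seek_seconds() {
    reads=$1
    iops=$2
    echo $(( (reads + iops - 1) / iops ))
}

# Example with an invented workload of 8000 random reads:
# seek_seconds 8000 100   # disk at about 100 IO/s
# seek_seconds 8000 800   # disk at about 800 IO/s
```

With these assumed figures the slower disk needs 80 seconds and the faster one 10 seconds, which is where an 8x difference comes from.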
Gigabit networking is recommended for Splunk servers wherever possible. For all media types, ensure that duplex and mode are negotiated properly, and force them in configuration if necessary so that connectivity to the Splunk deployment is predictable.
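On a Linux host, the negotiated link settings can be read from sysfs and checked against the gigabit, full-duplex target. This is a minimal sketch: the function name is hypothetical, eth0 is a placeholder interface name, and other operating systems expose this information differently.

```shell
# Hypothetical check: report "ok" for a full-duplex link of at least
# 1000 Mb/s, otherwise describe what was negotiated.
link_status() {
    speed=$1    # link speed in Mb/s
    duplex=$2   # "full" or "half"
    if [ "$duplex" = "full" ] && [ "$speed" -ge 1000 ]; then
        echo "ok"
    else
        echo "check: ${speed}Mb/s ${duplex}-duplex"
    fi
}

# Example on a Linux host (eth0 is a placeholder interface name):
# link_status "$(cat /sys/class/net/eth0/speed)" "$(cat /sys/class/net/eth0/duplex)"
```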