Documentation: 3.3.2
This page last updated: 03/21/08 11:03am

Hardware tuning factors

Splunk, as an application, can benefit from certain hardware configurations, maximizing performance for different aspects of the Splunk technology. This section reviews a variety of factors and offers suggestions on how to develop hardware configurations for Splunk.

Input / Output

Generally speaking, large-scale IT search deployments present unique challenges to the volume computing hardware available from vendors today. Many of these challenges involve I/O architectures and implementations, with hardware, software, system architecture, and operating system all playing a part in determining a given configuration's suitability for use with Splunk.

Disk

Splunk is naturally demanding of the disk subsystems that it works with. Both index and search operations benefit from a disk subsystem that is designed with an eye to the types of operations that Splunk performs.

Index

Indexing generates a large number of small, discrete disk writes, followed by more small reads and writes at index optimization time. As such, large numbers of high-performance disk drives in directly attached configurations with high-bandwidth interfaces are preferable when maximum index performance is required.

The number of discrete I/O operations per second (IO/s) is a good benchmark of how well a given disk subsystem could perform with Splunk. Most common 7200 RPM SATA disks deliver about 100 IO/s, whereas 15K RPM FC, SAS, and U320 SCSI technologies can yield significantly higher performance levels, near 800 IO/s or more.
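As a rough illustration of what measuring IO/s means, the following Python sketch counts how many small random reads per second a file can serve. It is not a Splunk tool: the function name `measure_random_read_iops`, the 4 KB block size, and the 8 MB scratch file are all assumptions made for the example. Because the operating system caches recently written data in memory, a small scratch file like this one will report far higher numbers than the disk itself can sustain; a meaningful measurement needs a file much larger than RAM or a dedicated benchmarking tool.

```python
import os
import random
import tempfile
import time


def measure_random_read_iops(path, block_size=4096, duration=1.0):
    """Count small random reads per second against an existing file.

    Hypothetical helper for illustration; results include OS cache effects.
    """
    size = os.path.getsize(path)
    blocks = max(1, size // block_size)
    ops = 0
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        while time.perf_counter() - start < duration:
            f.seek(random.randrange(blocks) * block_size)
            f.read(block_size)
            ops += 1
        elapsed = time.perf_counter() - start
    return ops / elapsed


if __name__ == "__main__":
    # Create a small scratch file (assumed size: 8 MB) and measure it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        scratch = tmp.name
    try:
        print(f"{measure_random_read_iops(scratch):.0f} reads/s")
    finally:
        os.remove(scratch)
```

Pointing the function at a large file on the volume intended for the Splunk index gives a first-order comparison between candidate disk subsystems, subject to the caching caveat above.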

Search

Search time is also dominated by IO/s, especially when infrequently accessed data is involved. When searching relatively recent data, or even pulling large (~10,000 event) chunks from larger sets of event data, an individual disk is less likely to be a bottleneck because each read call to the disk subsystem pulls a larger chunk of data. In this case the storage interface is much more critical.

However, when searching for rare terms, such as a name that may occur once an hour or once a day, each read call taxes an individual disk more. In these cases, higher-performance individual disks pay massive dividends: in some cases, 8x performance can be realized by using faster disks.
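The 8x figure is consistent with the per-disk rates quoted in the Index section (about 100 IO/s versus about 800 IO/s). The Python sketch below works through that arithmetic under a simplifying assumption: every matching event costs one separate random read on a single disk, with no caching. The function name `seek_bound_seconds` and the count of 24,000 reads are hypothetical values chosen for the example, not measurements.

```python
def seek_bound_seconds(random_reads, disk_iops):
    """Seconds spent if each read is a separate random I/O on one disk.

    Simplified model for illustration: ignores caching and transfer time.
    """
    return random_reads / disk_iops


reads = 24_000  # hypothetical number of random reads for a rare-term search

slow = seek_bound_seconds(reads, 100)  # ~7200 RPM SATA rate from the text
fast = seek_bound_seconds(reads, 800)  # ~15K RPM rate from the text

print(slow, fast, slow / fast)  # prints: 240.0 30.0 8.0
```

Under this model the search time scales inversely with the disk's IO/s, which is why faster individual disks matter most for rare-term searches.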

Network

Gigabit networking is recommended for Splunk servers wherever possible. For all media types, ensure that duplex and mode are negotiated properly; if necessary, force duplex and mode in the interface configuration to ensure predictable connectivity to the Splunk deployment.
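One way to check what an interface actually negotiated is sketched below in Python. It assumes a Linux host that exposes link settings through sysfs files named `speed` and `duplex` under `/sys/class/net/<interface>/`; other operating systems need a different method. The helper names `read_link_settings` and `link_is_gigabit_full`, and the interface name `eth0` in the usage comment, are hypothetical.

```python
from pathlib import Path


def read_link_settings(iface, sysfs_root="/sys/class/net"):
    """Return (speed_mbps, duplex) as reported by Linux sysfs.

    Assumes a Linux host; reading these files can fail with OSError
    when the interface does not exist or the link is down.
    """
    base = Path(sysfs_root) / iface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()
    return speed, duplex


def link_is_gigabit_full(speed_mbps, duplex):
    """True when the link runs at 1000 Mb/s or faster in full duplex."""
    return speed_mbps >= 1000 and duplex == "full"


# Example usage on a Linux host (interface name is an assumption):
#   speed, duplex = read_link_settings("eth0")
#   if not link_is_gigabit_full(speed, duplex):
#       print(f"eth0 negotiated {speed} Mb/s, {duplex} duplex")
```

A result such as 100 Mb/s or half duplex on a gigabit-capable port usually points to a negotiation mismatch with the switch, which is the case where forcing the settings is worth considering.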
