Documentation: 3.2.1
This page last updated: 08/26/08 12:08pm

Splunk benchmarks

Benchmark test overview

Splunk's indexing and search technology is designed to extract maximum value from your IT data, regardless of its shape or size. Because Splunk can index so many different types of data, it's important to understand how different configurations perform with common types of IT data available in the datacenter today.

Using commodity hardware and four common IT data types, we measured indexing throughput and typical search-time performance.

Benchmark tests

Test platform

Splunk 3.2 was benchmarked with the following hardware configuration:

System            Dell PowerEdge 2950
CPU               Dual Intel Xeon 5160 at 3.0GHz
RAM               8GB 667MHz DDR2 FB-DIMM
Disk controller   Dell PERC5/E
Disk array        Dell MD1000, 4x500GB 7200RPM SATA, RAID 5
OS                Red Hat Enterprise Linux AS 5.1 x86_64

We conducted the tests in the Splunk Development labs using "real-world" data sets. We built these data sets from extensive research into customer usage patterns and data flows, and from the insight we gained supporting Interop Net in 2006 and 2007, where we troubleshot Interop show network issues in real time.

Test data sources

Data source                                 Average message size
Network and system device syslog output     ~150 bytes
HTTP proxy logs                             ~348 bytes
Network and system device syslog output     ~350 bytes
J2EE application server output              ~473 bytes

Network and system device syslog output (~150 bytes)

Routers, switches, firewalls, and other embedded devices can generate large volumes of small messages, especially in centralized log management projects that collect from hundreds of these devices in a datacenter. Small messages contain fewer terms to index, so you can configure Splunk to achieve significantly higher compression on this data than on other data types.

HTTP proxy logs (~348 bytes)

With the spread of distributed, Web-based enterprise applications and the increased use of HTTP as a transport, HTTP proxy logs are now a more important data source than ever for monitoring user activity and reporting on IT controls for compliance. Proxy logs often contain many indexable terms, which Splunk uses to accelerate searches without resorting to text scanning.
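To illustrate why indexable terms let search avoid text scanning, here is a minimal sketch of a generic inverted index compared with a linear scan. It is not Splunk's implementation: the segmentation rule, the function names (`tokenize`, `build_index`, `search_index`, `search_scan`), and the sample proxy-style log lines are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical segmentation rule: lowercase the event, then split on any run of
# characters that are not letters, digits, dots, underscores, or hyphens.
# Splunk's real segmentation rules differ; this only illustrates the idea.
_SPLIT = re.compile(r"[^a-z0-9._\-]+")


def tokenize(event: str) -> set[str]:
    """Return the set of indexable terms in one event."""
    return {t for t in _SPLIT.split(event.lower()) if t}


def build_index(events: list[str]) -> dict[str, set[int]]:
    """Map each term to the IDs of the events that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for event_id, event in enumerate(events):
        for term in tokenize(event):
            index[term].add(event_id)
    return index


def search_index(index: dict[str, set[int]], term: str) -> list[int]:
    """Indexed search: a single lookup, no matter how many events exist."""
    return sorted(index.get(term.lower(), set()))


def search_scan(events: list[str], term: str) -> list[int]:
    """Baseline text scan: tokenize and test every event."""
    term = term.lower()
    return [i for i, event in enumerate(events) if term in tokenize(event)]


# Hypothetical proxy-style events, for illustration only.
events = [
    "10.0.0.5 GET http://example.com/index.html 200 TCP_MISS",
    "10.0.0.7 GET http://example.com/login 403 TCP_DENIED",
    "10.0.0.5 POST http://example.com/login 200 TCP_MISS",
]
index = build_index(events)
print(search_index(index, "TCP_DENIED"))  # [1]
```

Both functions return the same event IDs, but the indexed lookup touches only the events that contain the term, while the scan must tokenize every event, so the gap widens as data volume grows.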

Network and system device syslog output (~350 bytes)

Although network and embedded devices tend to produce large volumes of small messages, systems and applications usually do the opposite and produce larger messages. The larger message size allows Splunk to build a denser index around the data, increasing its value at search time.

J2EE application server output (~473 bytes)

Application server troubleshooting continues to be a primary use case for Splunk customers. We generated this dataset from a model of a running JBoss server integrated into a three-tier Web application. Like HTTP proxy logs, application server output is rich with data that lets Splunk create a high-value index, making it easy for you to pinpoint problems in real time.

Execution parameters

For each test, we configured the platform with the input mechanism typical for that data type and indexed a large volume of data overnight. We ran searches against the dataset both during and at the end of the indexing run to verify responsiveness and to confirm that Splunk returned predictable result volumes on the indexed data.

We evaluated each data source on the test platform individually, closely monitoring indexing throughput, events per second (EPS), compression, and search time.

Test results

Splunk's performance makes it suitable for deployment throughout an IT environment. Our test results show that Splunk delivers responsive search and indexing throughput of several megabytes per second without compromising storage efficiency on any of the tested types of IT data.

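Throughput is expressed in megabytes of data indexed per second. EPS is the number of events indexed per second. Compression is expressed as a percentage of the raw input data size, so a lower percentage means less disk space used. Search time is the number of seconds needed to retrieve rare terms in the dataset.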
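The metric definitions above reduce to simple ratios. The following is a minimal sketch, assuming you already know the raw bytes ingested, the event count, the stored size, and the elapsed time for a run. The names `summarize_run` and `RunMetrics`, the input values, and the choice of decimal megabytes (10^6 bytes) are hypothetical; the benchmark text does not say which megabyte it uses or exactly which on-disk files the compression percentage is measured against.

```python
from dataclasses import dataclass

# Assumption: "MB" means 10**6 bytes. The benchmark text does not say
# whether it uses decimal or binary megabytes.
BYTES_PER_MB = 1_000_000


@dataclass(frozen=True)
class RunMetrics:
    throughput_mb_s: float  # megabytes of raw data indexed per second
    eps: float              # events indexed per second
    compression_pct: float  # stored size as a percentage of raw input size


def summarize_run(raw_bytes: int, event_count: int,
                  stored_bytes: int, elapsed_s: float) -> RunMetrics:
    """Derive throughput, EPS, and compression for one hypothetical run."""
    if raw_bytes <= 0 or elapsed_s <= 0:
        raise ValueError("raw_bytes and elapsed_s must be positive")
    return RunMetrics(
        throughput_mb_s=raw_bytes / BYTES_PER_MB / elapsed_s,
        eps=event_count / elapsed_s,
        compression_pct=100.0 * stored_bytes / raw_bytes,
    )


# Hypothetical inputs, not measured Splunk results.
m = summarize_run(raw_bytes=1_000_000_000, event_count=5_000_000,
                  stored_bytes=250_000_000, elapsed_s=500)
print(m)  # RunMetrics(throughput_mb_s=2.0, eps=10000.0, compression_pct=25.0)
```

Read this way, the ~150B syslog result of 11% means the stored data occupied roughly a ninth of the raw input, compared with 54% for HTTP proxy logs.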

Note: The compression rates shown in the results table should not be used to calculate overall index size. For information about estimating your own index size, review this topic.

Results table

Data source     Index throughput   EPS      Compression   Search time
~150B syslog    4 MB/s             27,000   11%           6 s
HTTP proxy      7 MB/s             16,000   54%           2 s
~350B syslog    3 MB/s             12,200   25%           6 s
J2EE            4.75 MB/s          9,750    22%           2 s
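To put the per-second throughput figures in more familiar terms, the sketch below converts each one into the volume it would represent over a full day. This is a hypothetical extrapolation: it assumes the measured rate holds for 24 hours and that 1 GB = 1,000 MB, neither of which the benchmark states. The dictionary and function names are illustrative only.

```python
# Indexing throughput from the results table, in MB/s.
THROUGHPUT_MB_S = {
    "~150B syslog": 4.0,
    "HTTP proxy": 7.0,
    "~350B syslog": 3.0,
    "J2EE": 4.75,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_gb(mb_per_s: float) -> float:
    """Raw data volume per day at a constant rate (assumes 1 GB = 1,000 MB)."""
    return mb_per_s * SECONDS_PER_DAY / 1000


for source, rate in THROUGHPUT_MB_S.items():
    print(f"{source:>12}: {daily_volume_gb(rate):6.1f} GB/day")
```

For example, 4 MB/s sustained for 86,400 seconds is about 345.6 GB of raw data per day. Treat these as rough upper bounds for a single data type on the test hardware, not as capacity guarantees.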