
This document last updated: 11/17/08 03:11pm
Splunk supports a wide variety of configurations and deployments to suit the largest scales of IT infrastructure in use worldwide. This guide will assist you in the planning and tuning of a Splunk deployment at any scale.
In addition to this document, you can find many deployment-related articles at the Splunk Deployment Wiki. See how other Splunk users have customized their deployments to meet their needs and add your own experience to the mix!
You can also watch this Splunk developer video, which provides an overview of deploying Splunk.
Splunk is simple to deploy by design. By using a single software component and easy to understand configurations, Splunk can coexist with existing infrastructure or be deployed as a universal platform for accessing IT data.
Splunk can start up and run in several different modes, each of which can serve as a component to meet your deployment requirements. This section covers these potential components:
Indexer
In this mode, indexers, or index servers, provide indexing capability for local and remote data and host the primary Splunk datastore, as well as Splunk Web.
Forwarder
Forwarders use the same Splunk software package but do not store indexed data locally. All indexed data is forwarded to remote index servers. To reduce operational footprint, Splunk Web is not used. Refer to the documentation on setting up a Splunk instance as a forwarder.
Deployment server
Both indexers and forwarders can also act as deployment servers. A deployment server distributes configuration information to running instances of Splunk via a push mechanism which is enabled through configuration. Refer to the documentation on setting up a Splunk instance as a deployment server.
Data inputs
Splunk supports five primary data input types: file and directory inputs, FIFO queues, network ports, scripted inputs, and Windows event logs.
File and directory inputs
Splunk can accept data input from local or mounted systems, and can read data through the use of the Splunk file input processor. The file input processor can operate in a variety of modes and is capable of reading entire files, updates to files, and real-time changes to files as well as performing those tasks on entire directory trees. Splunk supports whitelisting and blacklisting inside directory inputs for additional flexibility of configuration. Refer to the documentation about file and directory inputs for more information.
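The following Python sketch illustrates how whitelisting and blacklisting can narrow the set of files picked up from a directory tree. It is not Splunk's implementation: the function name `select_files`, the sample paths, the patterns, and the rule that a blacklist match overrides a whitelist match are all assumptions made for illustration.

```python
import re

def select_files(paths, whitelist=None, blacklist=None):
    """Return the paths that pass the (assumed) whitelist/blacklist rules."""
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    kept = []
    for path in paths:
        # With a whitelist set, only matching paths are considered.
        if allow and not allow.search(path):
            continue
        # Assumption of this sketch: a blacklist match always excludes.
        if deny and deny.search(path):
            continue
        kept.append(path)
    return kept

# Hypothetical directory listing.
candidates = [
    "/var/log/messages",
    "/var/log/secure",
    "/var/log/messages.1.gz",
    "/var/log/app/debug.tmp",
]
selected = select_files(
    candidates,
    whitelist=r"/messages|/secure",
    blacklist=r"\.(gz|tmp)$",
)
print(selected)
```

Here only the two uncompressed log files survive both filters.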
FIFO queues
Caution: Due to their vulnerability to data loss, FIFOs are not recommended. Monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.
A FIFO (AKA named pipe) is a queue of data maintained in memory. File systems can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.
To configure FIFO queues, see this page.
Network ports
Splunk can accept data from both UDP and TCP ports. While you can use this to mimic a local system syslogd, it is equally useful for capturing any other IT data via normal network mechanisms. Like FIFO queues, network ports can offer higher indexing performance, but with similar vulnerability to data loss. TCP-based network communication can mitigate most data loss issues, but if your deployment can tolerate absolutely no data loss, Splunk recommends that you choose files as the data input type. Refer to the documentation about network port inputs for more information.
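As a minimal sketch of what a sender for a network input might look like, the Python functions below deliver one text event over UDP or TCP using only the standard library. The host and port in the usage comment are hypothetical, and the newline framing on TCP is an assumption of this sketch rather than a documented requirement.

```python
import socket

def send_udp(message, host, port):
    """Send one text event as a single UDP datagram (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))

def send_tcp(message, host, port, timeout=5.0):
    """Send one newline-terminated text event over a TCP connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8") + b"\n")

# Usage (hypothetical host and port, not taken from this guide):
# send_udp("user=alice action=login status=ok", "splunk.example.com", 9999)
```

The UDP variant returns as soon as the datagram is handed to the network stack, which is why it is fast but can lose data; the TCP variant fails with an exception if the receiver cannot be reached.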
Scripted inputs
You can configure Splunk to run an arbitrary command on any schedule, with the output being indexed by Splunk. The primary advantage of scripted inputs is that they make it possible to index almost any type of data. Examples of scripted data inputs include performance data, system and network status commands, Web requests, and SNMP traps, as well as other types of IT data. The performance impact of scripted inputs varies, primarily because of the number of possibilities for integration, but low-overhead scripts usually have similar performance to file data inputs. Refer to the documentation about scripted inputs for more information.
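A scripted input can be as small as a command that prints one line per run. The Python sketch below reports disk usage as a timestamped key=value line on standard output. The field names, the UTC timestamp format, and the choice of `/` as the monitored path are assumptions for illustration; scheduling the command is left to the Splunk configuration.

```python
import shutil
import time

def format_status(epoch_seconds, path, total, used, free):
    """Build one timestamped key=value line describing disk usage."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch_seconds))
    return (
        f"{stamp} path={path} total_bytes={total} "
        f"used_bytes={used} free_bytes={free}"
    )

if __name__ == "__main__":
    # Whatever the command prints is what would be indexed.
    usage = shutil.disk_usage("/")
    print(format_status(time.time(), "/", usage.total, usage.used, usage.free))
```

Keeping the script to a single cheap system call is in line with the note that low-overhead scripts perform similarly to file inputs.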
Windows event logs and WMI
Splunk can index Windows event logs, and by default indexes the Application, System, and Security event logs. You can configure Splunk to index other Windows event log sources if they are present on the system, use WMI to pull data from other Windows machines, and monitor changes to your Windows Registry. Refer to the documentation about inputs for Windows, the documentation about WMI configuration, and the documentation about Windows Registry monitoring for more information.
You have many deployment options even when using a single Splunk index server. Let's see how you can use a single Splunk index server with different IT data inputs.
Splunk installed on existing aggregation host
In this deployment model, Splunk is installed on an existing aggregation host and indexes log data as it is written to disk by the local system's syslog receiver. These deployments are simple to execute, and you can easily increase their scope at a later point.
Splunk with direct network inputs
It's also simple to implement network-based data gathering with Splunk. Splunk supports multiple TCP and UDP inputs to enhance deployment flexibility.
Splunk installed on a host receiving batched IT data moves
Another way that you can deploy Splunk is with batched data moves. Remote systems copy log data after rotation intervals to a central location, where Splunk is indexing data.
Splunk indexing data on a remote mount / network storage
You can also index data on a network storage device or remote mount. Splunk indexes the data on the network storage device with all the flexibility of other configurations.
Splunk installed on all servers forwarding data
In this deployment, Splunk is installed on all systems in the topology. Deploying Splunk on a wide scale provides significant benefits to data access, change management and distribution capabilities. By installing Splunk on more systems, you can access local application logs, capture status information, monitor change on your systems, use enhanced data distribution features such as routing, cloning and balancing, and more.
Multiple index server deployment options
Distributed search with data balancing
In this model, Splunk is installed on all servers in forwarding mode. Those forwarders balance their data output to Splunk indexes configured for distributed search. By federating the search execution across different indexes, total aggregate capability can be scaled in a linear fashion. If more performance is required, additional Splunk index servers can be brought on-line inside the distributed search group.
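To make the idea of balancing concrete, the Python sketch below spreads a stream of events evenly across a group of index servers. Round-robin assignment is an assumption of this sketch, since the text does not say which balancing strategy forwarders use, and the indexer names are hypothetical.

```python
from itertools import cycle

def balance(events, indexers):
    """Assign events to indexers in round-robin order (assumed strategy)."""
    targets = cycle(indexers)
    assignment = {name: [] for name in indexers}
    for event in events:
        assignment[next(targets)].append(event)
    return assignment

# Six events spread over three hypothetical index servers.
distribution = balance(
    [f"event-{n}" for n in range(6)],
    ["idx-a", "idx-b", "idx-c"],
)
for indexer, batch in distribution.items():
    print(indexer, batch)
```

Because each indexer holds roughly an equal share of the data, adding another name to the list lowers the share every server has to index and search.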
Data cloning for high availability
Through cloning data on the fly, Splunk can create exact replicas of data and forward them to multiple index servers, which enables high availability scenarios like the one depicted in the figure. You can direct search users to any index server to receive results. Some caveats apply when deploying Splunk in a highly available fashion; refer to this topic for more information before proceeding.
Data routing
Splunk's data routing capabilities implement discrete data flow control to both Splunk indexes and other locations. You can implement routing rules by message content, source, sourcetype, or host to meet a wide variety of integration requirements.
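The Python sketch below shows one way to think about routing rules keyed on host, source, sourcetype, or message content. It is an illustration only: the `Event` class, the rule list, the destination names, and the first-match-wins ordering are assumptions, not Splunk's configuration format or behavior.

```python
import re
from dataclasses import dataclass

@dataclass
class Event:
    host: str
    source: str
    sourcetype: str
    raw: str  # message content

# (field to test, regular expression, destination); all values hypothetical.
RULES = [
    ("sourcetype", r"^proxy$", "compliance-indexer"),
    ("host", r"^fw-", "security-indexer"),
    ("raw", r"ERROR", "ops-indexer"),
]

def route(event, rules=RULES, default="default-indexer"):
    """Return the destination of the first matching rule, else the default."""
    for field, pattern, destination in rules:
        if re.search(pattern, getattr(event, field)):
            return destination
    return default

print(route(Event("fw-01", "/var/log/fw.log", "syslog", "deny tcp 10.0.0.5")))
print(route(Event("app-07", "/opt/app/server.log", "j2ee", "ERROR timeout")))
```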
Index and search tiers for massive scalability
In this model, separate physical resources are allocated to search and index. In deployments that scale beyond hundreds of gigabytes per day or have high performance requirements for both search as well as index operations, you can allocate separate resources to these operations to improve performance and achieve greater scalability.
Short-term index tier
In the short-term index tier, Splunk forwarders are deployed to all systems in the datacenter and provide IT data and change detection information to Splunk. You can then deploy a Splunk indexer to provide search capabilities for co-located operations personnel without burdening outbound network links. A deployment server instance configured on the Splunk indexer distributes configuration to the forwarders installed on systems in the datacenter. Data retention is kept within the bounds of the indexer's local disk, with all data being routed to the long-term indexing tier.
Long-term index tier
In the long-term index tier, Splunk indexers aggregate data being forwarded from the short-term index tier. This tier allows for configurations that enable the use of all system resources to maximize indexing throughput, while storing that data on a remotely accessible file system, which can be implemented using SAN or NAS technologies depending on cost and performance requirements.
Search tier
Systems in the search tier host the SplunkWeb user interface for the deployment's users. The Splunk servers in this tier directly connect to the shared storage technology and fulfill search requests against the data indexed in the indexing tiers. These systems will also manage the data retention policy in place, if desired, by archiving data from the shared storage location to an alternate lower-performance storage location.
Splunk's core competency is indexing and searching any type of IT data with speed and efficiency. This versatility can present challenges to both new and seasoned users of Splunk when attempting to identify factors that can affect performance. This section reviews a variety of factors and offers suggestions on how to tune Splunk for a given deployment.
Segmentation
Segmentation is how Splunk identifies items to index in your IT data that aren't key/value pairs or fields. These indexed items, or segments, along with fields, are the building blocks inside IT data that search capabilities are built upon. Tuning segmentation can lead to greater indexing performance by lowering the total processing required to index any line of IT data and increasing the potential for compression effectiveness.
Major and minor segments
Splunk maintains two concepts of segments, called major and minor segments.
For example, the IP address 192.168.1.254 would be indexed entirely as a major segment and then broken up into the following minor segments: 192, 192.168, and 192.168.1.
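The IP address example can be reproduced with a few lines of Python. This is a sketch that generalizes from that single example: it assumes the minor segments are the successive prefixes of the major segment split on a `.` breaker, which may not match Splunk's actual segmentation rules for other data.

```python
def minor_segments(major, breaker="."):
    """Return successive prefixes of a major segment, split on the breaker."""
    parts = major.split(breaker)
    # Stop before the full string: that one is the major segment itself.
    return [breaker.join(parts[:i]) for i in range(1, len(parts))]

print(minor_segments("192.168.1.254"))
# ['192', '192.168', '192.168.1']
```

Each extra segment is another term to write to the index, which is why reducing segmentation lowers indexing cost.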
Segmentation and data sets
Segmentation impacts indexing and data storage performance directly based on the data set in use.
You can completely disable segmentation, which allows for maximum indexing performance and storage efficiency. Of course, this comes at the expense of search convenience and search speed. With segmentation disabled, you can perform searches using the regex search directive (which provides full regular expression search capabilities), search using information indexed in search fields, or search using a combination of the two.
Note: Searches that involve regex take longer to execute due to the processing required to find regular expressions in IT data.
Splunk can automatically extract the source hosts from a given piece of IT data, which is useful in situations where data is being aggregated before arriving at Splunk to be indexed.
Timestamp Extraction
Splunk can also identify timestamps in any given piece of IT data from a variety of formats, which can not only help in pre-aggregated data cases but also with data sources that embed their timestamps in non-standard formats.
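The Python sketch below illustrates the general idea of recognizing timestamps in more than one format. The two formats, the sample lines, and the try-each-pattern-in-order approach are assumptions for illustration and do not describe how Splunk's own timestamp recognition works.

```python
import re
from datetime import datetime

# (pattern that locates the timestamp, strptime format that parses it)
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"), "%m/%d/%Y %H:%M:%S"),
]

def extract_timestamp(line):
    """Return the first recognizable timestamp in the line, or None."""
    for regex, fmt in PATTERNS:
        match = regex.search(line)
        if match:
            return datetime.strptime(match.group(0), fmt)
    return None

print(extract_timestamp("2008-11-17 15:11:00 host1 sshd: session opened"))
print(extract_timestamp("app started at 11/17/2008 15:11:00 on node 4"))
print(extract_timestamp("no time here"))
```

The second sample line shows the case the text describes, where the timestamp is embedded mid-message rather than at the start of the event.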
Search convenience and data storage
The combination of indexing options you select ultimately defines how convenient it is to search your IT data. Any combination of the above options is supported and can be implemented on a per source or source type basis. This lets you minimize the index overhead associated with data that is not searched frequently, while making commonly searched data more convenient for users.
A great example of how this can be used to optimize a Splunk deployment is IT policy compliance. Splunk can be used to search proxy server and transaction logs for user access monitoring and user activity search, while also serving as a central repository for other types of IT data such as system logs that must be retained but may be of less interest to a compliance administrator.
In order to maintain maximum convenience and allow for saved searches to run quickly and efficiently, the maximum amount of segmentation should be applied to the proxy server and transaction logs which would be configured as discrete sourcetypes. Additional search fields may also be desired to quickly identify certain key/value pairs that may be of interest. System logs, also a discrete sourcetype, could have segmentation disabled given that they are simply being aggregated and stored to adhere to the IT control or mandate.
Hardware tuning factors
Splunk, as an application, can benefit from certain hardware configurations, maximizing performance for different aspects of the Splunk technology. This section reviews a variety of factors and offers suggestions on how to develop hardware configurations for Splunk.
Input / Output
Generally speaking, large-scale IT search deployments present unique challenges to modern volume computing hardware available from vendors today. Many of these challenges surround I/O architectures and implementations, with hardware, software, system architecture, and operating system all playing a part in determining a given configuration's suitability for use with Splunk.
Disk
Splunk is naturally demanding of the disk subsystems that it works with. Both index and search operations benefit from a disk subsystem that is designed with an eye to the types of operations that Splunk performs.
Index
Indexing is a disk I/O operation that represents a large number of small, discrete writes, paired with more small reads and writes at index optimization time. As such, large numbers of high performance disk drives in directly attached configurations with high-bandwidth interfaces are preferable when maximum index performance is required.
Measuring the number of discrete I/O operations per second is a good benchmark of how well a given disk subsystem could perform with Splunk. Most common 7200 RPM SATA disks represent about 100 IO/s, whereas 15K RPM FC, SAS, and U320 SCSI technologies can yield significantly higher performance levels, near 800 IO/s or more.
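As a rough illustration of the figures above, the Python sketch below estimates how many disks of each class would be needed to reach a target number of I/O operations per second. The target workload is hypothetical, and the calculation assumes that IO/s add up linearly across disks, ignoring RAID overhead, controller limits, and caching.

```python
import math

SATA_7200_IOPS = 100  # "about 100 IO/s" per the text
FAST_15K_IOPS = 800   # "near 800 IO/s" per the text

def disks_needed(target_iops, per_disk_iops):
    """Smallest disk count whose summed IO/s meets the target (linear model)."""
    return math.ceil(target_iops / per_disk_iops)

target = 1600  # hypothetical workload, not a measured Splunk requirement
print(disks_needed(target, SATA_7200_IOPS))  # 16
print(disks_needed(target, FAST_15K_IOPS))   # 2
```

Under this simple model the 8:1 ratio between the two disk classes carries straight through to the number of spindles required.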
Search
Searchtime is also dominated by IO/s, especially when infrequently accessed data is in question. When searching for relatively recent data, or even pulling large (~10,000 event) chunks from greater groups of event data, an individual disk is less likely to be a bottleneck as each read call to the disk subsystem will pull larger chunks of data. In this case the storage interface will be much more critical.
However, when searching for rare terms like a name that may occur once an hour or once a day, each read call will tax an individual disk more. In these cases, using higher performance individual disks will pay massive dividends: in some cases, 8x performance can be realized by using faster disks.
Network
Gigabit networking is recommended for Splunk servers wherever possible. For all media types, ensure that duplex and mode are negotiated properly and use configurations to force duplex and mode if necessary to ensure predictable connectivity to the Splunk deployment.
Splunk's indexing and search technology is designed to extract maximum value from your IT data, regardless of its shape or size. Because Splunk can index so many different types of data, it's important to understand how different configurations perform with common types of IT data available in the datacenter today.
Using commodity hardware and four common IT data types, we measured indexing throughput and typical searchtime performance.
Benchmark tests
Test platform
Splunk 3.2 was benchmarked with the following hardware configuration:
| System | Dell PowerEdge 2950 |
| CPU | Dual Intel Xeon 5160 at 3.0GHz |
| RAM | 8GB 667MHz DDR2 FB-DIMM |
| Disk Controller | Dell PERC5/E |
| Disk Array | Dell MD1000, 4x500GB 7200RPM SATA, RAID 5 |
| OS | Red Hat Enterprise Linux AS 5.1 x86_64 |
We conducted tests in the Splunk Development labs with 'real-world' data sets. These data sets were developed using significant amounts of research into our customer use patterns and data flows, as well as the insight we've gained during our support of Interop Net in 2006 and 2007, troubleshooting the Interop show network issues in real-time.
Test data sources
| Data source | Average message size |
| Network and system device syslog output | ~150 byte |
| HTTP proxy logs | ~348 byte |
| Network and system device syslog output | ~350 byte |
| J2EE Application server output | ~473 byte |
Routers, switches, firewalls, and other classes of embedded devices can generate large volumes of smaller messages, especially in centralized log management projects with hundreds of these devices in a datacenter. In these smaller messages, there are fewer terms present to index, which means you can configure Splunk in a way that achieves significantly higher rates of compression compared to other data types.
HTTP proxy logs (~348 byte)
Thanks to distributed Web-based enterprise applications and the increased use of HTTP transport, HTTP proxy logs are now a more important data source than ever for monitoring user activity and reporting on IT controls for compliance purposes. Proxy logs often contain many indexable terms that Splunk uses to accelerate the search experience without employing text scanning.
Network and system device syslog output (~350 byte)
Even though network and embedded devices can produce large volumes of smaller messages, systems and applications usually exhibit the opposite behavior. This means the larger message size allows Splunk to build a denser index around the data, increasing searchtime value.
J2EE application server output (~473 byte)
Application server troubleshooting continues to be a primary use case for Splunk customers. We generated this dataset from a model of a running JBoss server integrated into a three-tier Web application. Like HTTP proxy logs, application server output is rich with data that lets Splunk create a high-value index, making it easy for you to pinpoint problems in real-time.
Execution parameters
The test platform was configured with typical input mechanisms for the data type being used, indexing a large volume of data overnight. We executed searches against the dataset over time and at the end of the indexing activity to ensure responsiveness and to confirm that Splunk was returning predictable result volumes on indexed data.
We evaluated each data source on the test platform individually, monitoring index throughput, events per second, compression and search time closely.
Test results
Splunk features high performance suitable for deployment throughout any IT environment. Our test results show that Splunk delivers a desirable IT search experience and multi-megabyte per second indexing performance without compromising storage efficiency on any type of IT data.
Throughput is expressed in megabytes per second of data indexed. Compression is expressed in percentage of raw data input size. Search times are measured in seconds to retrieve rare terms in the dataset.
Note: The compression rates shown in the results table should not be used to calculate overall index size. For information about estimating your own index size, review this topic.
Results table
| Data source | Index Throughput | EPS | Compression | Search Time |
| ~150B syslog | 4 MB/s | 27000 | 11% | 6 sec |
| HTTP proxy | 7 MB/s | 16000 | 54% | 2 sec |
| ~350B syslog | 3 MB/s | 12200 | 25% | 6 sec |
| J2EE | 4.75 MB/s | 9750 | 22% | 2 sec |
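For capacity planning, it can help to convert these throughput figures into a daily volume. The Python sketch below does that arithmetic under two assumptions that are not part of the benchmark: that the rate is sustained for a full 24 hours, and that decimal units apply (1 GB = 1000 MB). The resulting numbers are extrapolations, not measured results.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_gb(mb_per_second):
    """Extrapolate a sustained MB/s rate to GB per day (1 GB = 1000 MB)."""
    return mb_per_second * SECONDS_PER_DAY / 1000

# Index throughput values copied from the results table, in MB/s.
throughput = {
    "~150B syslog": 4,
    "HTTP proxy": 7,
    "~350B syslog": 3,
    "J2EE": 4.75,
}

for source, rate in throughput.items():
    print(f"{source}: {daily_volume_gb(rate):.1f} GB/day")
```

On these assumptions, a single server of the tested class lands in the range of a few hundred gigabytes per day of raw input.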