How indexing works

This documentation applies to the following versions of Splunk: 4.0 , 4.0.1 , 4.0.2 , 4.0.3 , 4.0.4 , 4.0.5 , 4.0.6

Indexing is how Splunk processes the data you send it. Splunk can index any kind of time-series data (data that has timestamps). When data is indexed, it is broken into events based on its timestamps.

All data that comes into Splunk is indexed through the universal pipeline. Data enters the universal pipeline as large (10,000 bytes) chunks. As part of pipeline processing, these chunks are broken into events. Initially, newline characters signal an event boundary. In the next stage of processing, Splunk applies line merging rules specified in props.conf.
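The two-stage breaking described above can be illustrated with a rough Python sketch. This is not Splunk's implementation, and it does not read props.conf. The timestamp pattern and the merging rule (a line that does not start with a timestamp is appended to the previous event) are assumptions chosen only for illustration.

```python
import re

# Hypothetical timestamp pattern (assumption): lines that begin with
# "YYYY-MM-DD HH:MM:SS" are treated as the start of a new event.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def break_into_events(chunk):
    """Split a chunk on newlines, then merge continuation lines."""
    events = []
    for line in chunk.splitlines():      # stage 1: newlines mark boundaries
        if not line.strip():
            continue                     # ignore blank lines
        if TIMESTAMP.match(line) or not events:
            events.append(line)          # a new event starts here
        else:
            events[-1] += "\n" + line    # stage 2: merge into previous event
    return events


chunk = (
    "2009-10-01 12:00:00 ERROR request failed\n"
    "  at handler.process(handler.py:42)\n"
    "2009-10-01 12:00:01 INFO retrying\n"
)
events = break_into_events(chunk)
for event in events:
    print(repr(event))
```

In this sketch the indented stack-trace line has no timestamp, so it is merged into the preceding event instead of becoming an event of its own.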

As part of indexing, events are broken into sections called segments. Splunk uses a list of breaking characters and other rules (such as the maximum number of characters per segment) that are configurable through segmenters.conf.
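As a simplified illustration of segmentation, the Python sketch below splits an event on a list of breaking characters and caps the segment length. The character list, the length limit, and the choice to truncate over-long segments are all assumptions made for the example. The real rules are whatever segmenters.conf defines.

```python
# Hypothetical values for illustration only; not taken from segmenters.conf.
BREAKERS = " \t\n,;:=[]()"
MAX_SEGMENT_LENGTH = 32


def segment(event, breakers=BREAKERS, max_len=MAX_SEGMENT_LENGTH):
    """Split an event into segments at any breaking character."""
    segments = []
    current = ""
    for ch in event:
        if ch in breakers:
            if current:                  # close the segment in progress
                segments.append(current)
            current = ""
        else:
            current += ch
    if current:
        segments.append(current)
    # Assumption: over-long segments are simply truncated in this sketch.
    return [s[:max_len] for s in segments]


segments = segment("user=alice src=10.1.2.3 action=login")
print(segments)
```

Because the period is not in the hypothetical breaker list, the IP address stays together as one segment.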

Indexing is an I/O-intensive process. If you're building a system to index a lot of data, Splunk recommends that you take disk I/O capacity into account when planning it.

[Diagram: how indexing works (HowIndexWorksdiagram.png)]

The splunk-optimize process

While Splunk is indexing data, one or more instances of the splunk-optimize process run intermittently, merging index files together to optimize performance when searching the data. The splunk-optimize process can use a significant amount of CPU, but only for short periods of time; it should not consume CPU indefinitely. You can alter the number of concurrent instances of splunk-optimize by changing the value of maxConcurrentOptimizes in indexes.conf, but this is not typically necessary.

splunk-optimize should run only on db-hot.
You can run it manually on a warm database if you find one with a large number of .tsidx files (more than 25): ./splunk-optimize <directory>
If splunk-optimize does not run often enough, search efficiency suffers.
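To find warm databases that exceed the .tsidx threshold mentioned above, a small script along these lines may help. It is a sketch, not a Splunk tool. It assumes that warm databases are the directories whose names start with db_ and that the .tsidx files sit directly inside them. The demo builds a throwaway directory tree with made-up names, so point find_candidates at a real index directory only after checking that those assumptions hold for your installation.

```python
import tempfile
from pathlib import Path

TSIDX_THRESHOLD = 25  # "more than 25" comes from the text above


def count_tsidx(db_dir):
    """Count the .tsidx files directly inside one database directory."""
    return sum(1 for p in Path(db_dir).iterdir()
               if p.is_file() and p.suffix == ".tsidx")


def find_candidates(index_dir, threshold=TSIDX_THRESHOLD):
    """Return (name, count) for each db_* directory above the threshold."""
    candidates = []
    for db in sorted(Path(index_dir).glob("db_*")):
        if db.is_dir():
            count = count_tsidx(db)
            if count > threshold:
                candidates.append((db.name, count))
    return candidates


# Demo on a temporary directory tree; all names here are made up.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("db_100_200_1", 30), ("db_300_400_2", 3), ("db-hot", 40)]
    for name, n_files in layout:
        db = Path(tmp) / name
        db.mkdir()
        for i in range(n_files):
            (db / f"{i}.tsidx").touch()
    flagged = find_candidates(tmp)

for name, count in flagged:
    print(f"{name}: {count} .tsidx files; consider ./splunk-optimize {name}")
```

The db-hot directory is skipped on purpose, since only warm databases are candidates for a manual run.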

What's in an index?

Splunk stores all processed data in indexes. Indexes, in turn, are stored in databases, which are located in $SPLUNK_HOME/var/lib/splunk. A database is a directory named db_<starttime>_<endtime>_<seq_num>. An index is a collection of database directories.
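The database directory naming scheme can be unpacked with a short Python sketch. The pattern follows the db_<starttime>_<endtime>_<seq_num> form given above. Treating all three fields as plain integers is an assumption made for the example.

```python
import re
from collections import namedtuple

# Field names mirror the db_<starttime>_<endtime>_<seq_num> pattern.
DbName = namedtuple("DbName", "start_time end_time seq_num")

# Assumption: each field is a non-negative integer.
DB_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_db_name(name):
    """Return a DbName for a matching directory name, else None."""
    match = DB_NAME.match(name)
    if match is None:
        return None
    return DbName(*(int(group) for group in match.groups()))


print(parse_db_name("db_100_200_3"))  # made-up example name
print(parse_db_name("db-hot"))        # does not match the pattern
```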

Splunk comes with preconfigured indexes:

  • main: the default Splunk index. All processed data is stored here unless otherwise specified.
  • splunklogger: Splunk keeps track of its internal logs in this index.
  • _internal: this index includes metrics from Splunk's processors.
  • sampledata: a small amount of sample data is stored here for training purposes.
  • _thefishbucket: internal information on file processing.
  • _audit: events from the file system change monitor, auditing, and all user search history.

Read "About managing indexes" in this manual for more information.
