Documentation: 3.3
This page last updated: 05/14/08 03:05pm

Indexing performance

You can improve Splunk's indexing performance by adjusting settings in its configuration files. Here are some basic changes you can make:

  • Change Splunk's timestamp extraction settings in props.conf:
    • Set Splunk to look fewer characters into an event for a timestamp (or turn off timestamp extraction).
    • Use strptime formatting for timestamps, for example %d/%m/%Y %H:%M:%S.
  • Edit Splunk's aggregator function to turn off line merging.
  • Reduce segmentation of events by altering the MAJOR and MINOR breakers.
  • Turn off some of Splunk's advanced features.

Negative impact on indexing performance

  • The more regexes you configure in transforms.conf, the longer indexing takes. Make sure all of your regexes are necessary.
  • Custom processing.
  • Using many fields extracted during indexing (see indexed fields).
  • Using your own C/C++ modules.

Processors

Splunk has several internal processors. If Splunk isn't indexing your data as quickly as you would like, you can find out which processor is responsible for the delay by running the following search:

index::_internal NOT sendout group=pipeline | timechart sum(cpu_seconds) by processor

This search charts the CPU time used by each of Splunk's internal processors. If one processor is using noticeably more CPU time than the others, you can adjust the settings that affect it to reduce its load.

Below are some tuning parameters in Splunk's configuration files that affect indexing performance.

indexes.conf

indexes.conf controls how Splunk's indexes are configured. You can change the following entries to improve indexing performance.

indexThreads = <non-negative number> (default: 0) The number of extra threads to use for a specific index. Raising the number of index threads can improve indexing performance, but the gain depends on the capability of your hardware. Do not set indexThreads higher than the number of processors in the server that this instance runs on. For example, a single-core system should never be set higher than 1.
maxMemMB = <non-negative number> (default: 50) The amount of memory, in MB, to allocate for indexing. This amount is allocated per index thread. For example, if indexThreads is set to 2 and maxMemMB is set to 300, indexing uses 600 MB of memory.
maxDataSize = <non-negative number> (default: 750) The maximum size, in MB, that the hot database (db hot) can grow to. Values larger than the default are not recommended unless you have a 64-bit system.
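
The per-thread memory rule above can be checked with a little arithmetic before you change indexes.conf. The following Python sketch is only an illustration: the function names are hypothetical and it is not part of Splunk. It covers only the case the text describes (one or more index threads) and does not guess at how memory is allocated when indexThreads is left at 0.

```python
def indexing_memory_mb(index_threads: int, max_mem_mb: int) -> int:
    """Estimate indexing memory for one index, in MB.

    Follows the rule stated above: maxMemMB is allocated per index thread.
    """
    if index_threads < 1:
        # The text only covers one or more threads; the 0 (default) case is
        # not specified there, so this sketch refuses to guess.
        raise ValueError("index_threads must be at least 1 for this estimate")
    if max_mem_mb < 0:
        raise ValueError("max_mem_mb must be non-negative")
    return index_threads * max_mem_mb


def exceeds_cpu_count(index_threads: int, cpu_count: int) -> bool:
    """True when indexThreads is above the recommended ceiling (the processor count)."""
    return index_threads > cpu_count


print(indexing_memory_mb(index_threads=2, max_mem_mb=300))  # 600
print(exceeds_cpu_count(index_threads=2, cpu_count=1))      # True
```

The first call reproduces the 600 MB example above; the second flags the single-core case that the indexThreads description warns against.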

props.conf

props.conf controls what parameters apply to events during indexing based on settings tied to each event's source, host, or sourcetype.

DATETIME_CONFIG = <filename relative to $SPLUNK_HOME> (default: /etc/datetime.xml) Specifies the file used to configure the timestamp extractor. You can also set this to "NONE" to prevent the timestamp extractor from running, or to "CURRENT" to assign the current system time to each event.
TIME_FORMAT = <strptime-style format> (default: empty) Specifies a strptime format to use when extracting the date. Supplying an explicit format speeds up event indexing.
MAX_TIMESTAMP_LOOKAHEAD = <integer> (default: 150) Specifies how many characters into an event Splunk should look for a timestamp. If you know your timestamp is in the first n characters of the event, set this to n to speed up indexing.
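
To see why an explicit format and a short lookahead reduce work, here is a minimal Python sketch. It uses Python's own strptime as a stand-in for Splunk's timestamp extractor, so it shows the idea rather than Splunk's implementation. The sample event and the lookahead of 19 characters are assumptions chosen to match the example format %d/%m/%Y %H:%M:%S, and the sketch assumes the timestamp starts at the first character of the event.

```python
from datetime import datetime

TIME_FORMAT = "%d/%m/%Y %H:%M:%S"  # the strptime format used as an example on this page
LOOKAHEAD = 19                     # assumed: length of a timestamp such as "14/05/2008 15:05:00"


def extract_timestamp(event: str,
                      time_format: str = TIME_FORMAT,
                      lookahead: int = LOOKAHEAD) -> datetime:
    """Parse a timestamp from the first `lookahead` characters of an event.

    Only the leading slice is examined, and it is parsed with one known
    format instead of trying several candidate patterns.
    """
    return datetime.strptime(event[:lookahead], time_format)


# Hypothetical event text, for illustration only.
event = "14/05/2008 15:05:00 action=login user=alice"
print(extract_timestamp(event))  # 2008-05-14 15:05:00
```

If the timestamp does not sit at a fixed position, the lookahead needs to be large enough to cover it, which is the trade-off MAX_TIMESTAMP_LOOKAHEAD controls.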

segmenters.conf

segmenters.conf defines schemes for how events will be tokenized in Splunk's index.

MAJOR = <space separated list of strings> The list of major breakers. Move MINOR breakers into the MAJOR breaker list, or remove breakers from the MAJOR breaker list, to change the size and number of segments produced from raw event data.
MINOR = <space separated list of strings> The list of characters that delimit tokens to index in addition to those produced by the MAJOR breaker list. Reduce or remove this list to increase indexing performance.
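
The effect of trimming the MINOR list can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not Splunk's segmentation algorithm: it assumes each event is split on the major breakers, that each resulting segment is split again on the minor breakers, and that both levels are indexed. The breaker lists and the sample event are hypothetical.

```python
import re


def split_on(text: str, breakers: list[str]) -> list[str]:
    """Split text on any of the given breaker strings, dropping empty pieces."""
    if not breakers:
        return [text] if text else []
    pattern = "|".join(re.escape(b) for b in breakers)
    return [piece for piece in re.split(pattern, text) if piece]


def segment(event: str, major: list[str], minor: list[str]) -> set[str]:
    """Toy model: index every major segment plus its minor sub-segments."""
    tokens: set[str] = set()
    for major_segment in split_on(event, major):
        tokens.add(major_segment)
        tokens.update(split_on(major_segment, minor))
    return tokens


event = "src=10.0.0.1 user=alice"  # hypothetical event
with_minor = segment(event, major=[" ", "="], minor=["."])
without_minor = segment(event, major=[" ", "="], minor=[])

print(len(with_minor))     # 7
print(len(without_minor))  # 4
```

In this model, removing the minor breaker leaves only the four major segments to index, at the cost of no longer being able to match on the pieces inside 10.0.0.1.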

Read more about how to configure custom segmentation.
