Documentation: 3.4.1
This page last updated: 09/16/08 05:09pm

About indexes and indexing

We use the term "index" to refer to:

  • How Splunk manipulates and prepares raw event data for searching.
  • The act of processing raw event data as Splunk prepares it for searching.
  • The directory where Splunk stores all the event data.

Splunk indexes data in real time. It accesses data using a variety of input methods, applies universal processing techniques to handle different formats of IT data, and persists the original raw data along with indexes and additional fields added during processing.
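To make the idea of keeping raw data "along with indexes" concrete, here is a minimal inverted-index sketch. It is an illustration only and assumes a simple whitespace tokenizer. Splunk's real on-disk index format is not described on this page and is certainly more elaborate. The raw events are kept unchanged, and a separate term-to-event map is what makes search fast. The functions `build_index` and `search` are hypothetical names used only for this example.

```python
import re
from collections import defaultdict

def build_index(raw_events):
    """Map each lowercase term to the set of event IDs that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for event_id, text in enumerate(raw_events):
        # Simplifying assumption: terms are separated by whitespace only.
        for term in re.split(r"\s+", text.lower()):
            if term:
                index[term].add(event_id)
    return index

def search(index, raw_events, *terms):
    """Return the original raw events that contain every search term (AND semantics)."""
    if not terms:
        return list(raw_events)
    ids = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    # The raw data is persisted as-is; the index only points back into it.
    return [raw_events[i] for i in sorted(ids)]

raw = [
    "GET /index.html 200",
    "POST /login 401",
    "GET /login 200",
]
idx = build_index(raw)
print(search(idx, raw, "get", "200"))
```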

Note: Refer to the About inputs page for more information about input types and methods.

Note: Read about using Splunk Web to manage your indexes and create new indexes.

Events, segments, and fields

An event is a single record of activity or instance of data, for example, a single log entry. Fields are attribute/value pairs that make up segments of events. As part of indexing, Splunk breaks events into segments, using breaking characters and rules to define how events are divided.
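As an illustration of fields as attribute/value pairs, the following sketch pulls `key=value` tokens out of a single event. The `key=value` convention, the `KV_PATTERN` regex, and the `extract_fields` helper are all assumptions made for this example. They are not Splunk's field-extraction mechanism, which is configured separately.

```python
import re

# Hypothetical pattern: a word, "=", then either a quoted string or a run of
# non-whitespace characters.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract_fields(event):
    """Return a dict of attribute/value pairs found in one event (illustrative only)."""
    fields = {}
    for key, value in KV_PATTERN.findall(event):
        # Drop surrounding quotes from quoted values.
        fields[key] = value.strip('"')
    return fields

event = '2008-09-16 17:09:00 user=alice action="log in" status=success'
print(extract_fields(event))
```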

Usually, Splunk can detect event boundaries for different data formats. However, if event boundary recognition is not working as desired, you can customize your rules in props.conf. Refer to the Admin Manual for how to configure event boundaries.
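The following sketch shows what an event-boundary rule does. It assumes one simple, hypothetical rule: every new event begins with a line that starts with a `YYYY-MM-DD` date, and any other line, such as a stack-trace continuation, belongs to the previous event. Real boundary rules are set in props.conf as described in the Admin Manual. The regex and function below are not Splunk's.

```python
import re

# Assumption for this sketch: a new event starts at any line beginning with a date.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")

def break_events(raw_text):
    """Group raw lines into events, starting a new event at each matching line."""
    events, current = [], []
    for line in raw_text.splitlines():
        if EVENT_START.match(line) and current:
            # Close out the previous event before starting a new one.
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events

raw = (
    "2008-09-16 17:09:00 ERROR NullPointerException\n"
    "    at com.example.Foo.bar(Foo.java:42)\n"
    "2008-09-16 17:09:05 INFO recovered\n"
)
for e in break_events(raw):
    print(repr(e))
```

With this rule, the indented stack-trace line stays attached to the ERROR event instead of becoming an event of its own.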

There are two types of segments: major and minor. Major segments are words, phrases, or terms in the data that are surrounded by breaking characters such as white space and newline characters. Minor segments are breaks within a major segment. For example, the IP address 192.168.1.254 may be indexed as a major segment and then separated into the following minor segments: 192, 192.168, and 192.168.1.
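The sketch below reproduces the IP-address example. It assumes a hypothetical set of breakers: whitespace, commas, and semicolons as major breakers, and `.`, `/`, `:`, and `-` as minor breakers. Splunk's actual breaker lists live in segmenters.conf and may produce additional pieces. This only demonstrates the major/minor distinction described above.

```python
import re

# Hypothetical breaker sets for this sketch only; real rules live in segmenters.conf.
MAJOR_BREAKERS = r"[\s,;]+"
MINOR_BREAKERS = "./:-"

def major_segments(event):
    """Split an event on major breakers, dropping empty pieces."""
    return [s for s in re.split(MAJOR_BREAKERS, event) if s]

def minor_segments(major):
    """Return each prefix of a major segment that ends just before a minor breaker."""
    return [major[:i] for i, ch in enumerate(major) if ch in MINOR_BREAKERS and i > 0]

for seg in major_segments("client 192.168.1.254 connected"):
    print(seg, minor_segments(seg))
```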

Edit your segment recognition rules in segmenters.conf and apply them to different fields via props.conf. Refer to the Admin Manual for how to configure segmentation.

Search and indexes

Splunk stores all processed data in a collection of database directories, also called an index. Each database directory is located in $SPLUNK_DB and named db_<starttime>_<endtime>_<seq_num>. $SPLUNK_DB defaults to $SPLUNK_HOME/var/lib/splunk. The following is a list of Splunk's preconfigured indexes and a brief description of what they store:
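This sketch shows how the directory layout described above could be used to find the database directories that cover a given time range. It makes two assumptions. First, `<starttime>` and `<endtime>` are integer epoch seconds. Second, the two times are normalized with `min`/`max`, so the code does not depend on which one is written first. The helper names and the sample directory names are made up for illustration. Only the `db_<starttime>_<endtime>_<seq_num>` pattern and the `$SPLUNK_DB` default come from this page.

```python
import os
import re

# Pattern from the text: db_<starttime>_<endtime>_<seq_num>.
# Assumption: both times are integer epoch seconds.
BUCKET_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")

def splunk_db_path(env):
    """$SPLUNK_DB if set, else the documented default $SPLUNK_HOME/var/lib/splunk."""
    if "SPLUNK_DB" in env:
        return env["SPLUNK_DB"]
    return os.path.join(env["SPLUNK_HOME"], "var", "lib", "splunk")

def parse_bucket(name):
    """Return (earliest, latest, seq_num) for a database directory name, or None."""
    m = BUCKET_NAME.match(name)
    if not m:
        return None
    t1, t2, seq = (int(g) for g in m.groups())
    # Normalize so the result does not depend on which time comes first.
    return min(t1, t2), max(t1, t2), seq

def buckets_overlapping(names, earliest, latest):
    """Yield directory names whose time span overlaps [earliest, latest]."""
    for name in names:
        parsed = parse_bucket(name)
        if parsed and parsed[0] <= latest and parsed[1] >= earliest:
            yield name

# Made-up directory names, for illustration only.
names = ["db_1200_1100_1", "db_1000_900_2", "README"]
print(list(buckets_overlapping(names, 1050, 1150)))
```

In practice, `names` would come from listing the directory returned by `splunk_db_path(os.environ)`.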

  • history: search history.
  • main: all processed data. Unless otherwise specified, this is the default database.
  • sampledata: sample event data used for training.
  • splunklogger: Splunk internal logs.
  • summary: summary indexing searches.
  • _audit: events from the file system change monitor and auditing.
  • _blocksignature: event block signatures.
  • _internal: metrics from Splunk's processors.
  • _thefishbucket: internal information on file processing.

You can create new indexes, edit index properties, remove unwanted indexes, or relocate existing indexes. You can manage (create, view, and edit) indexes from Splunk Web. For more information, refer to the User Manual's topic on managing and creating indexes. You can only remove and relocate existing indexes via the CLI. For more information, refer to the Admin Manual's topic on index management.

Unless you specify another index, Splunk searches the default index, main. You can restrict your search to a different index by specifying it in the search bar. For example, to search for HTTP requests that occurred only in sampledata:

index=sampledata http
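The default-index behavior can be modeled in a few lines. The event records, the `search_events` helper, and the case-insensitive substring matching are simplifying assumptions for this sketch. Splunk matches on indexed segments rather than raw substrings. The point is only that a search with no index restriction looks at main, and naming an index such as sampledata switches the search to that index.

```python
DEFAULT_INDEX = "main"

def search_events(events, terms, index=None):
    """Return events from one index (main unless given) that contain all terms."""
    target = index or DEFAULT_INDEX
    wanted = [t.lower() for t in terms]
    return [
        e for e in events
        if e["index"] == target and all(t in e["raw"].lower() for t in wanted)
    ]

# Hypothetical sample events tagged with the index they were stored in.
events = [
    {"index": "main", "raw": "GET /home HTTP/1.1 200"},
    {"index": "sampledata", "raw": "GET /cart HTTP/1.1 200"},
    {"index": "sampledata", "raw": "sshd: Accepted password for root"},
]
print(search_events(events, ["http"]))                       # only the main event
print(search_events(events, ["http"], index="sampledata"))   # only the /cart event
```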
