As part of indexing, Splunk breaks events into sections called segments. It does this using a list of breaking characters and other rules, such as the maximum number of characters to index, all of which are configurable through the segmenters.conf file.
There are two types of segments: major and minor. Major segments are words, phrases, or terms in your data that are surrounded by breaking characters, such as a blank space. By default, the major breakers include blank spaces and many punctuation characters.
Minor segments are the pieces of a major segment that are delimited by minor breakers. For example, with "." set as a minor breaker, the IP address 192.168.1.254 would be indexed entirely as a major segment and then broken up into the following minor segments: 192, 192.168, and 192.168.1.
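The IP address example can be sketched in a few lines of Python. This is only an illustration of the idea, not Splunk's implementation: the breaker sets and the function names `major_segments` and `minor_segments` are assumptions made for this sketch, and the real breaker lists come from segmenters.conf.

```python
# Illustrative breaker sets (assumptions for this sketch, not Splunk's defaults).
MAJOR_BREAKERS = " \t\n"
MINOR_BREAKERS = "."

def major_segments(event, major=MAJOR_BREAKERS):
    """Split an event into the pieces found between major breakers."""
    segments, current = [], []
    for ch in event:
        if ch in major:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments

def minor_segments(segment, minor=MINOR_BREAKERS):
    """Return the prefixes of a major segment that end at a minor breaker."""
    return [segment[:i] for i, ch in enumerate(segment) if ch in minor and i > 0]

print(major_segments("GET 192.168.1.254 200"))  # ['GET', '192.168.1.254', '200']
print(minor_segments("192.168.1.254"))          # ['192', '192.168', '192.168.1']
```

Each major segment is kept whole, and the minor segments are the shorter prefixes that end at each minor breaker, which matches the 192, 192.168, 192.168.1 breakdown described above.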
Splunk stores each minor segment in addition to each major segment, so enabling more minor breakers will generally increase index size. However, minor segments give you more flexibility when searching in SplunkWeb. With minor breakers enabled, you can search for a term you know is part of a minor segment without using a wildcard. For example, with "." set as a minor breaker, the search "10.2" returns the same results as the search "10.2*". Minor breakers also allow you to drag and select parts of search terms from within the UI. If you do not need this flexibility, you can use segmentation configurations to reduce both indexing density and the time it takes to index by changing minor breakers to major breakers.
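The trade-off between index size and search flexibility can be illustrated with a toy term index. This is a simplified sketch under stated assumptions, not how Splunk stores or searches data: major segments are split on whitespace only, and `index_terms`, `exact_match`, and `wildcard_match` are hypothetical helper names.

```python
def index_terms(event, minor_breakers="."):
    """Collect the terms a toy index would store for one event.

    Major segments are split on whitespace (an assumption for this sketch);
    minor segments are the prefixes that end at each minor breaker.
    """
    terms = set()
    for segment in event.split():
        terms.add(segment)
        for i, ch in enumerate(segment):
            if ch in minor_breakers and i > 0:
                terms.add(segment[:i])
    return terms

def exact_match(terms, query):
    """Search for a term without a wildcard."""
    return query in terms

def wildcard_match(terms, prefix):
    """Search for 'prefix*' by scanning the stored terms."""
    return any(term.startswith(prefix) for term in terms)

event = "connect from 10.2.1.44"
with_minor = index_terms(event, minor_breakers=".")
without_minor = index_terms(event, minor_breakers="")

print(len(with_minor), len(without_minor))  # 6 3
print(exact_match(with_minor, "10.2"))      # True
print(exact_match(without_minor, "10.2"))   # False
print(wildcard_match(without_minor, "10.2"))  # True
```

With "." treated as a minor breaker, the toy index stores twice as many terms for this event, but "10.2" can be found without a wildcard. Without minor segments the index is smaller, and only the wildcard form finds the event.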
Configuration files for segmentation
You can configure major and minor breakers via segmenters.conf. You can also configure how many characters of each event get indexed.
Comments
Examples have been posted.
Posted by emma on Oct 19 2007, 1:58pm
We should add an examples page or topics for common use cases, such as improving storage efficiency by converting minor to major breakers; eliminating term indexing altogether; adding or removing breaking characters for unusual data formats, etc.
Posted by cfrln on Aug 11 2007, 10:31am