All data that comes into Splunk is indexed through the universal pipeline. Data enters the universal pipeline as large (10,000 bytes) chunks. As part of pipeline processing, these chunks are broken into events. Initially, newline characters signal an event boundary. In the next stage of processing, Splunk applies line merging rules specified in props.conf.
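The two stages described above (break on newlines, then merge lines) can be sketched in Python. This is only an illustration, not Splunk's implementation: the function name `break_into_events` and the merge rule (a line that starts with a `YYYY-MM-DD` date begins a new event, any other line is merged into the previous event) are assumptions made for this example. Real merging rules come from props.conf.

```python
import re
from typing import List

# Assumption for illustration: a new event starts with a YYYY-MM-DD date.
NEW_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}\b")


def break_into_events(chunk: str) -> List[str]:
    """Split a chunk on newlines, then merge continuation lines."""
    events: List[str] = []
    for line in chunk.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if NEW_EVENT.match(line) or not events:
            events.append(line)  # this line opens a new event
        else:
            events[-1] += "\n" + line  # merge into the previous event
    return events


chunk = (
    "2005-07-01 12:05:27 ERROR request failed\n"
    "  at handler()\n"
    "  at dispatch()\n"
    "2005-07-01 12:05:28 INFO recovered\n"
)
events = break_into_events(chunk)
```

Here the four input lines become two events, because the two indented lines are merged into the first event.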
As part of indexing, events are broken into sections called segments. Splunk uses a list of breaking characters and other rules (such as the maximum number of characters per segment) that are configurable through segmenters.conf.
How events work
An event is a single record of activity within a log file. An event typically includes a timestamp (for more information about timestamp configuration, read how timestamps work). Events also provide information about the system that Splunk is monitoring.
Here's a sample event:
172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
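To make the timestamp in the sample event concrete, here is a minimal Python sketch that pulls it out of the bracketed field. The helper name `extract_timestamp` and the format string are assumptions chosen to fit this one sample. The sketch does not reflect how Splunk itself recognizes timestamps.

```python
import re
from datetime import datetime
from typing import Optional

SAMPLE = (
    '172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] '
    '"GET /trade/app?action=logout HTTP/1.1" 200 2953'
)


def extract_timestamp(event: str) -> Optional[datetime]:
    """Return the bracketed timestamp of an access-log style event, if any."""
    match = re.search(r"\[([^\]]+)\]", event)
    if match is None:
        return None
    # Format assumed from the sample: day/month-name/year:time UTC-offset
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


ts = extract_timestamp(SAMPLE)
```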
Event or event type
Events differ from event types. An event type is a classification that can contain any number of events. An event is a single instance of data, such as a single log entry.
Change Splunk's default line-breaking behavior in multi-line events. Learn more here.
Note: Before manually modifying any configuration file, read about bundle files.
Lines over 10,000 bytes
Splunk breaks lines over 10,000 bytes into multiple lines of 10,000 bytes each when indexing them. It appends the field meta::truncated to the end of each truncated section. However, Splunk still groups these lines into a single event.
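The splitting of an over-long line can be pictured with a short Python sketch. It models only the cut into 10,000-byte sections. The `meta::truncated` field and the regrouping into a single event are not modeled, and the names `MAX_LINE_BYTES` and `split_long_line` are hypothetical.

```python
from typing import List

MAX_LINE_BYTES = 10_000  # limit stated in the text above


def split_long_line(line: bytes, limit: int = MAX_LINE_BYTES) -> List[bytes]:
    """Cut a line into consecutive sections of at most `limit` bytes."""
    sections = [line[i:i + limit] for i in range(0, len(line), limit)]
    return sections or [b""]  # an empty line stays a single empty section


sections = split_long_line(b"x" * 25_000)
section_lengths = [len(s) for s in sections]
```

A 25,000-byte line yields three sections: two full 10,000-byte sections and a 5,000-byte remainder.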
Events over 100,000 bytes
Segments after the first 100,000 bytes of a very long line are searchable, but Splunk does not display them in search results. It only displays the first 100,000 bytes.
Events over 1,000 segments
Splunk only displays the first 1,000 individual segments of an event as segments separated by whitespace and highlighted on mouseover. It displays the rest of the event as raw text without interactive formatting.
How segmentation works
There are two types of segments: major and minor. Major segments are words, phrases, or terms in your data that are surrounded by breaking characters, such as a blank space. By default, major breakers are set to most characters and blank spaces.
Minor segments are breaks within a major segment. For example, the IP address 192.168.1.254 is indexed entirely as a major segment and then broken up into the following minor segments: 192, 192.168, and 192.168.1.
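The IP address example can be reproduced with a small Python sketch. The breaker sets below are a tiny hypothetical subset picked for this illustration. The actual lists are configured in segmenters.conf, and the function names are not part of Splunk.

```python
from typing import List

MAJOR_BREAKERS = ' \t\n[]"'  # hypothetical subset of major breakers
MINOR_BREAKERS = "."         # hypothetical: only "." is a minor breaker


def major_segments(text: str) -> List[str]:
    """Split text into the runs of characters between major breakers."""
    segments: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in MAJOR_BREAKERS:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


def minor_segments(major: str) -> List[str]:
    """Return the prefix of a major segment that ends at each minor breaker."""
    return [major[:i] for i, ch in enumerate(major)
            if ch in MINOR_BREAKERS and i > 0]


majors = major_segments("GET 192.168.1.254 200")
minors = minor_segments("192.168.1.254")
```

With these assumed breakers, `majors` is `["GET", "192.168.1.254", "200"]` and `minors` is `["192", "192.168", "192.168.1"]`, which matches the minor segments listed above.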
Splunk stores each minor segment in addition to each major segment. Therefore, enabling more minor breakers generally increases index size. However, minor segments provide more flexibility when searching in Splunk Web. With minor breakers enabled, you can search for a term you know is part of a minor segment without using a wildcard. For example, with "." set as a minor breaker, the search "10.2" will return the same as the search "10.2*". Minor breakers also allow you to drag and select parts of search terms from within Splunk Web. Use segmentation configurations to reduce both indexing density and the time it takes to index by changing minor breakers to major.
To configure segmentation, first decide what type of segmentation works best for your data. Then, use segmenters.conf to create segmentation rules. Finally, tie your custom segmentation rules to a host, source or sourcetype via props.conf.