Splunk's core competency is indexing and searching any type of IT data with speed and efficiency. This versatility can present challenges to both new and seasoned users of Splunk when attempting to identify factors that can affect performance. This section reviews a variety of factors and offers suggestions on how to tune Splunk for a given deployment.
Segmentation
Segmentation is how Splunk identifies items to index in your IT data that aren't key/value pairs or fields. These indexed items, or segments, along with fields, are the building blocks inside IT data that search capabilities are built upon. Tuning segmentation can improve indexing performance by lowering the total processing required to index any line of IT data and by increasing the potential for effective compression.
Major and minor segments
Splunk maintains two concepts of segments, called major and minor segments.
For example, the IP address 192.168.1.254 would be indexed entirely as a major segment and then broken up into the following minor segments: 192, 192.168, and 192.168.1.
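The IP address example above can be sketched in a few lines of Python. This is only an illustration of the prefix-style breakdown described here, not Splunk's actual segmenter; the function name `segment_prefixes` and the choice of `.` as the only minor breaker are assumptions made for the example.

```python
def segment_prefixes(token, breaker="."):
    """Return (major, minors) for a token.

    The whole token is treated as the major segment; the minor segments
    are the cumulative prefixes obtained by cutting at each breaker.
    Hypothetical helper for illustration only.
    """
    parts = token.split(breaker)
    # Join the first 1, 2, ... (n-1) parts to build each prefix.
    minors = [breaker.join(parts[:i]) for i in range(1, len(parts))]
    return token, minors


major, minors = segment_prefixes("192.168.1.254")
print(major)   # 192.168.1.254
print(minors)  # ['192', '192.168', '192.168.1']
```

Each additional segment is one more item to index for the same piece of data, which is why the amount of segmentation affects indexing cost.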
Segmentation and data sets
Segmentation directly affects indexing and data storage performance, and the size of the effect depends on the data set in use.
You can completely disable segmentation, which allows for maximum indexing performance and storage efficiency. Of course, this comes at the expense of search convenience and search speed. With segmentation disabled, you can search using the regex search directive (which provides full regular expression search capabilities), search using information indexed in search fields, or combine the two.
Note: Searches that involve regex take longer to execute due to the processing required to find regular expressions in IT data.
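The trade-off in the note above can be illustrated with a toy model. The sketch below is not how Splunk stores or searches data; it assumes a simple whitespace-token index (the names `build_index`, `indexed_search`, and `regex_search` are hypothetical) and contrasts a direct index lookup with a regular expression scan that has to examine every event.

```python
import re
from collections import defaultdict

# Sample events invented for this illustration.
events = [
    "Jan 10 12:00:01 host1 sshd: Accepted password for alice from 192.168.1.254",
    "Jan 10 12:00:05 host2 sshd: Failed password for bob from 10.0.0.7",
    "Jan 10 12:01:00 host1 cron: job finished",
]


def build_index(events):
    """Map each whitespace-separated token to the ids of events containing it."""
    index = defaultdict(set)
    for event_id, line in enumerate(events):
        for token in line.split():
            index[token].add(event_id)
    return index


def indexed_search(index, token):
    """Look the token up directly; no event text is read at search time."""
    return sorted(index.get(token, set()))


def regex_search(events, pattern):
    """Scan every event with a regular expression; cost grows with data volume."""
    compiled = re.compile(pattern)
    return [i for i, line in enumerate(events) if compiled.search(line)]


index = build_index(events)
print(indexed_search(index, "sshd:"))                # [0, 1]
print(regex_search(events, r"192\.168\.\d+\.\d+"))   # [0]
```

Building the index costs time and storage when data arrives, while the regex scan pays its cost at every search, which mirrors the indexing-versus-search trade-off described above.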
Splunk can automatically extract the source hosts from a given piece of IT data, which is useful in situations where data is being aggregated before arriving at Splunk to be indexed.
Timestamp extraction
Splunk can also identify timestamps in any given piece of IT data from a variety of formats. This helps not only with pre-aggregated data but also with data sources that embed their timestamps in non-standard formats.
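As a rough sketch of what recognizing timestamps "from a variety of formats" can involve, the Python example below tries a short list of candidate patterns against each line. The list of formats, the sample lines, and the function name `extract_timestamp` are assumptions made for illustration; Splunk's own timestamp recognition is not shown here.

```python
import re
from datetime import datetime

# Candidate (pattern, strptime format) pairs, tried in order.
# These three formats are assumptions chosen for the example.
CANDIDATE_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"), "%m/%d/%Y %H:%M:%S"),
    (re.compile(r"\d{8}T\d{6}"), "%Y%m%dT%H%M%S"),
]


def extract_timestamp(line):
    """Return the first timestamp found in the line, or None if none parses."""
    for pattern, fmt in CANDIDATE_FORMATS:
        match = pattern.search(line)
        if match:
            try:
                return datetime.strptime(match.group(0), fmt)
            except ValueError:
                # Digits matched the shape but not a valid date; try the next format.
                continue
    return None


print(extract_timestamp("2008-03-14 09:26:53 proxy GET /index.html"))
print(extract_timestamp("app[42] 20080314T092653 worker started"))
print(extract_timestamp("no timestamp in this line"))
```

Trying more candidate formats makes extraction more flexible but adds work per line, so a source with a known format can be handled more cheaply than one that has to be guessed.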
Search convenience and data storage
The combination of indexing options you select ultimately defines how convenient it is to search your IT data. Any combination of the above options is supported and can be implemented on a per-source or per-source-type basis. This lets you minimize the index overhead associated with data that is not searched frequently, while making commonly searched data more convenient for users.
A good example of how this can be used to optimize a Splunk deployment is IT policy compliance. Splunk can be used to search proxy server and transaction logs for user access monitoring and user activity search, while also serving as a central repository for other types of IT data, such as system logs, that must be retained but may be of less interest to a compliance administrator.
To maintain maximum convenience and allow saved searches to run quickly and efficiently, the maximum amount of segmentation should be applied to the proxy server and transaction logs, which would be configured as discrete sourcetypes. Additional search fields may also be desirable to quickly identify certain key/value pairs of interest. System logs, also a discrete sourcetype, could have segmentation disabled, given that they are simply being aggregated and stored to satisfy the IT control or mandate.
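The compliance scenario above amounts to a per-sourcetype policy table. The Python sketch below models that decision only; it is not Splunk configuration syntax, and the sourcetype names, field names, the "full"/"none" labels, and the default policy are all hypothetical choices made for the example.

```python
# Hypothetical default for sourcetypes not listed below (an assumption).
DEFAULT_POLICY = {"segmentation": "full", "search_fields": []}

# Frequently searched logs get full segmentation plus extra search fields;
# retained-only system logs have segmentation disabled.
POLICY_BY_SOURCETYPE = {
    "proxy": {"segmentation": "full", "search_fields": ["user", "url"]},
    "transaction": {"segmentation": "full", "search_fields": ["user", "account"]},
    "syslog": {"segmentation": "none", "search_fields": []},
}


def policy_for(sourcetype):
    """Return the indexing policy for a sourcetype, falling back to the default."""
    return POLICY_BY_SOURCETYPE.get(sourcetype, DEFAULT_POLICY)


for name in ("proxy", "syslog", "mail"):
    print(name, policy_for(name))
```

Keeping the policy in one table makes it easy to see which data pays the indexing overhead for search convenience and which is stored as cheaply as possible.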