This documentation does not apply to the most recent version of Splunk.
This documentation applies to the following versions of Splunk: 2.1, 2.2, 2.2.1, 2.2.3, 2.2.6
It is easy to change the breakers (also called cleaners) used to tokenize data prior to indexing.
Changing the breakers can clean up the way the server segments terms, or improve performance and disk space usage by reducing the number of terms indexed per event.
There are two types of breakers: major and minor. Major breakers correspond most closely to word boundaries. Minor breakers split a word into sections that are also indexed independently.
For example, take the following:
bob.smith@splunk.com splunk.com/people/~bobsmith/calc.pl
This input produces the following indexed terms:
bob.smith@splunk.com bob.smith@splunk bob.smith bob splunk.com/people/~bobsmith/calc.pl splunk.com/people/~bobsmith/calc splunk.com/people/~bobsmith splunk.com/people splunk.com splunk
The major breaker in the example above is the space character.
The minor breakers are the period (.), the slash (/), and the at sign (@).
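The example above can be reproduced with a short Python sketch. This is not Splunk's implementation; it only encodes the behavior inferred from the listed terms, namely that each token between major breakers is indexed whole, along with every prefix of it that ends just before a minor breaker. The function names `split_on` and `index_terms` are hypothetical.

```python
def split_on(text, breakers):
    """Split text on any character in `breakers`, dropping empty pieces."""
    pieces, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


def index_terms(text, major=" ", minor="./@"):
    """Return indexed terms, assuming prefix-style minor breaking.

    Assumption: the rule is inferred from the worked example only.
    """
    terms = []
    for token in split_on(text, major):
        # Each minor breaker closes a shorter prefix that is indexed too.
        for i, ch in enumerate(token):
            if ch in minor and i > 0:
                terms.append(token[:i])
        # The full token between major breakers is always indexed.
        terms.append(token)
    return terms


terms = index_terms("bob.smith@splunk.com splunk.com/people/~bobsmith/calc.pl")
for term in terms:
    print(term)
```

Passing an empty string for `minor` in this sketch leaves only the two whole tokens, which illustrates how a major-only configuration reduces the number of terms per event.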
Breakers are specified in an XML configuration file.
The server ships with two default files:
SPLUNK_HOME/etc/myinstall/pluginconfs/
cleaners.xml
majorOnly_cleaners.xml
Each file is a list of breaking separators. Every entry has a name, the breaking character(s), and an isMajor attribute that marks the breaker as major or minor.
The following example defines '[' as a major breaker:
<breakingSeparator name="lsqrBracketBreak" isMajor="1">
<value>[</value>
</breakingSeparator>
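To check which characters a cleaners file treats as major or minor, the entries can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the root element `<cleaners>`, the `periodBreak` entry, and the use of `isMajor="0"` for a minor breaker are assumptions, since only the `isMajor="1"` form is shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <cleaners> root and the periodBreak entry
# are assumptions made for this sketch, not taken from a shipped file.
SAMPLE = """
<cleaners>
  <breakingSeparator name="lsqrBracketBreak" isMajor="1">
    <value>[</value>
  </breakingSeparator>
  <breakingSeparator name="periodBreak" isMajor="0">
    <value>.</value>
  </breakingSeparator>
</cleaners>
"""


def load_breakers(xml_text):
    """Return (major, minor) lists of breaker values from cleaners XML."""
    root = ET.fromstring(xml_text.strip())
    major, minor = [], []
    for sep in root.iter("breakingSeparator"):
        value = sep.findtext("value", default="")
        # Assumption: any isMajor value other than "1" means minor.
        if sep.get("isMajor") == "1":
            major.append(value)
        else:
            minor.append(value)
    return major, minor


major, minor = load_breakers(SAMPLE)
print("major:", major)
print("minor:", minor)
```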
To change or add a breaker, edit the cleaners.xml file and then restart the server.
The best way to create your own cleaners file is to copy either SPLUNK_HOME/etc/myinstall/pluginconfs/cleaners.xml or majorOnly_cleaners.xml and then edit or remove entries as necessary.
If you create your own cleaners file, you must also edit the multiIndex.xml configuration so that it uses your new cleaner.
Every index has one cleaner. A default cleaner is set at the top of the multiIndex.xml file (<defaultCleaningConfig>) and is used when a cleaner is not specified in the index's <database> tag.
To use your cleaner, either change the uri in the default <defaultCleaningConfig> tag, or add a <cleaningConfig> tag with the uri to a specific <database> tag.
<databases>
<database>
<name>main</name>
<dbHomePath>$$SPLUNK_DB]]/defaultdb/db</dbHomePath>
<coldDBPath>$$SPLUNK_DB]]/defaultdb/colddb</coldDBPath>
<cleaningConfig>$$SPLUNK_HOME]]/etc/myinstall/pluginconfs/mycleaners.xml</cleaningConfig>
...
</database>
</databases>
NOTE - it's not advised to change cleaners on an index that already contains data. If you do change them, clean the index after changing the cleaners config.