You can now configure Splunk to automatically extract fields from data sources that are formatted with headers (for example: CSV, TM3, or MS Exchange log files). Use automatic header-based field extraction instead of configuring the fields you want to extract by hand. You can access fields that Splunk automatically extracts using the Fields picker in Splunk Web, and you can use them for filtering and reporting just like any other extracted field.
For example, suppose you have an MS Exchange log file like the following and want to extract fields from it using its header information:
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
If you enable automatic header-based field extraction, Splunk extracts the fields time, client-ip, cs-method, and sc-status, using the field names and delimiter defined in the source header (# Fields: time client-ip cs-method sc-status).
For example, from the first event (14:13:11 10.1.1.9 HELO 250), Splunk extracts time=14:13:11, client-ip=10.1.1.9, cs-method=HELO, and sc-status=250.
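To illustrate the result, the Python sketch below reproduces the field/value pairs for this sample file. It is an illustration only, not Splunk's implementation: it assumes the header line starts with "# Fields:", that every other line starting with "#" is a comment, and that values are separated by single spaces. The function name parse_fields_header_log is hypothetical.

```python
# Illustrative sketch only: approximates header-based field extraction for
# the MS Exchange sample above. Not Splunk's implementation.
SAMPLE = """\
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
"""

FIELDS_PREFIX = "# Fields:"  # assumed header marker


def parse_fields_header_log(text: str, delim: str = " ") -> list[dict[str, str]]:
    """Return one dict per event, keyed by the names in the '# Fields:' line."""
    fields: list[str] = []
    events: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(FIELDS_PREFIX):
            # Field names follow the marker, separated by the delimiter.
            fields = line[len(FIELDS_PREFIX):].strip().split(delim)
            continue
        if line.startswith("#"):
            continue  # other header lines carry no field names
        if not fields:
            raise ValueError("event found before a '# Fields:' header")
        values = line.split(delim)
        events.append(dict(zip(fields, values)))
    return events


if __name__ == "__main__":
    for event in parse_fields_header_log(SAMPLE):
        print(event)
```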
For each source or source type you configure with automatic header-based field extraction, Splunk scans matching sources for header information (predefined field names and delimiters). If a source has the necessary information, Splunk extracts fields using delimiter-based key/value extraction. To do this, Splunk creates an entry in transforms.conf for the source and populates it with transforms that extract the fields. Splunk also adds a source type stanza to props.conf that ties the field extraction transforms to the source. Splunk then applies the transforms to events from the source at search time.
Note: Automatic header-based field extraction doesn't impact index size or indexing performance because it occurs during source typing (before index time).
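To make the delimiter-based step concrete, here is a hedged Python sketch of what a transform with FIELDS and DELIMS does to a single event. It assumes each character in the DELIMS string acts as a delimiter and that values are assigned to the FIELDS names in order; apply_delims_transform is a hypothetical name, and Splunk's handling of edge cases (extra values, quoting) may differ.

```python
import re


# Hypothetical emulation of a FIELDS/DELIMS transform; not Splunk code.
def apply_delims_transform(event: str, fields: list[str], delims: str) -> dict[str, str]:
    """Split an event on any DELIMS character and pair values with FIELDS in order."""
    # Build a character class so every character in `delims` is a delimiter.
    pattern = "[" + re.escape(delims) + "]"
    values = re.split(pattern, event)
    # zip() stops at the shorter list: extra values are ignored and
    # fields without a value are omitted.
    return dict(zip(fields, values))


# The transform generated for the MS Exchange header:
#   FIELDS="time", "client-ip", "cs-method", "sc-status"
#   DELIMS=" "
FIELDS = ["time", "client-ip", "cs-method", "sc-status"]
DELIMS = " "

if __name__ == "__main__":
    print(apply_delims_transform("14:13:13 10.1.1.9 MAIL 250", FIELDS, DELIMS))
```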
Configure automatic header-based field extraction
Configure automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see how configuration files work.
Add CHECK_FOR_HEADER=TRUE under any source or source type stanza to turn on automatic header-based field extraction for that source or source type.
Example props.conf entry using the MS Exchange file from the introduction:
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
Note: Set CHECK_FOR_HEADER=FALSE to turn off automatic header-based field extraction.
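If you manage several props.conf files, one way to check which stanzas have the setting turned on is to parse them. The sketch below uses Python's standard configparser module and assumes the file parses as plain INI (Splunk .conf files are INI-like, but some Splunk-specific syntax may not parse this way). Only [MSExchange] comes from this page; the other stanza names are hypothetical.

```python
import configparser

# Hypothetical props.conf content; [MSExchange] comes from the example above,
# the other stanzas are made up for illustration.
PROPS_CONF = """
[MSExchange]
CHECK_FOR_HEADER=TRUE

[my_csv_logs]
CHECK_FOR_HEADER=FALSE

[syslog]
TIME_FORMAT=%b %d %H:%M:%S
"""


def stanzas_with_header_check(conf_text: str) -> list[str]:
    """Return the stanza names whose CHECK_FOR_HEADER value is true."""
    # interpolation=None keeps '%' characters (e.g. in TIME_FORMAT) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    enabled = []
    for name in parser.sections():
        # getboolean accepts TRUE/FALSE case-insensitively; a missing key -> False.
        if parser.getboolean(name, "CHECK_FOR_HEADER", fallback=False):
            enabled.append(name)
    return enabled


if __name__ == "__main__":
    print(stanzas_with_header_check(PROPS_CONF))
```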
Changes Splunk makes to configuration files
Splunk adds configuration information to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/ during automatic header-based field extraction.
Note: Editing the configuration information that Splunk adds causes the extracted fields to stop functioning properly.
Splunk creates a stanza in transforms.conf for each source with a unique header that matches a source type configured in props.conf. Splunk names each stanza [AutoHeader-M], where M is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2], ..., [AutoHeader-M]). Splunk populates each stanza with transforms that extract the fields using the header information.
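The numbering rule can be modeled as a registry that hands out the next AutoHeader number only for a header it has not seen before, then renders each stanza in the FIELDS/DELIMS layout shown in this section. This is a hypothetical model (the AutoHeaderRegistry class is made up for illustration), not how Splunk stores or numbers stanzas internally.

```python
from dataclasses import dataclass, field


@dataclass
class AutoHeaderRegistry:
    """Hypothetical model: one [AutoHeader-M] stanza per unique header."""

    # Maps (field names, delimiter) -> stanza name, in order of first sighting.
    stanzas: dict[tuple[tuple[str, ...], str], str] = field(default_factory=dict)

    def register(self, fields: list[str], delims: str) -> str:
        """Return the stanza name for this header, creating it if it is new."""
        key = (tuple(fields), delims)
        if key not in self.stanzas:
            # M increments only for headers that have not been seen before.
            self.stanzas[key] = f"AutoHeader-{len(self.stanzas) + 1}"
        return self.stanzas[key]

    def render_transforms_conf(self) -> str:
        """Render every stanza in the FIELDS/DELIMS layout used on this page."""
        blocks = []
        for (fields, delims), name in self.stanzas.items():
            quoted = ", ".join(f'"{f}"' for f in fields)
            blocks.append(f'[{name}]\nFIELDS={quoted}\nDELIMS="{delims}"')
        return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    reg = AutoHeaderRegistry()
    reg.register(["time", "client-ip", "cs-method", "sc-status"], " ")
    reg.register(["foo", "bar", "anotherfoo", "anotherbar"], ",")
    # A second file with the same MS Exchange header reuses AutoHeader-1.
    reg.register(["time", "client-ip", "cs-method", "sc-status"], " ")
    print(reg.render_transforms_conf())
```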
Example transforms.conf entry using the MS Exchange file from the introduction:
...
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
...
Splunk then adds a new source type stanza to props.conf for each unique source. Splunk names these stanzas [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.
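A matching sketch for the props.conf side: each generated stanza points at a transforms.conf stanza through REPORT-AutoHeader. The counter logic is an assumption inferred from the CSV example on this page (where [CSV-1] refers to AutoHeader-2, suggesting N counts per source type), and the LearnedPropsWriter class is hypothetical.

```python
from collections import defaultdict


class LearnedPropsWriter:
    """Hypothetical model of the [yoursource-N] stanzas added to props.conf."""

    def __init__(self) -> None:
        # Assumed per-source-type counter: MSExchange-1, MSExchange-2, CSV-1, ...
        self._counters: defaultdict[str, int] = defaultdict(int)
        self._stanzas: list[tuple[str, str]] = []

    def add(self, sourcetype: str, transform: str) -> str:
        """Create the next [sourcetype-N] stanza tied to `transform`."""
        self._counters[sourcetype] += 1
        name = f"{sourcetype}-{self._counters[sourcetype]}"
        self._stanzas.append((name, transform))
        return name

    def render(self) -> str:
        """Render the stanzas in the layout shown in the examples."""
        return "\n\n".join(
            f"[{name}]\nREPORT-AutoHeader = {transform}"
            for name, transform in self._stanzas
        ) + "\n"


if __name__ == "__main__":
    writer = LearnedPropsWriter()
    writer.add("MSExchange", "AutoHeader-1")
    writer.add("CSV", "AutoHeader-2")  # N starts at 1 for a new source type
    print(writer.render())
```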
Example props.conf entry using the MS Exchange file from the introduction:
# the original source type stanza you configure
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
# source type that Splunk adds to tie to transforms for automatic header-based field extraction
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
To return all events that Splunk types with a source type it generated while running automatic header-based field extraction, use a wildcard to search for all events of that source type.
For example, a wildcard search that matches yoursource, yoursource-1, yoursource-2, and so on looks like this:
sourcetype=yoursource*
These examples show how header-based field extraction works with common source types.
MS Exchange source file
This example shows how Splunk extracts fields from an MS Exchange file using automatic header-based field extraction.
This sample MS Exchange log file has a header containing a list of field names, delimited by spaces:
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
Splunk creates a stanza containing the transform in transforms.conf:
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
Splunk then ties the transform to the source by adding this to the source type stanza in props.conf:
# Original source type stanza you create
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
# source type stanza that Splunk creates
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
Splunk automatically extracts the following fields from each event:
time     | client-ip | cs-method | sc-status
14:13:11 | 10.1.1.9  | HELO      | 250
14:13:13 | 10.1.1.9  | MAIL      | 250
14:13:19 | 10.1.1.9  | RCPT      | 250
14:13:29 | 10.1.1.9  | DATA      | 250
14:13:31 | 10.1.1.9  | QUIT      | 240
CSV file
This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.
Example CSV file contents:
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
Splunk creates a stanza containing the transform in transforms.conf (located in $SPLUNK_HOME/etc/apps/learned/transforms.conf):
# Some previous automatic header-based field extraction
[AutoHeader-1]
...
# transform stanza that Splunk creates for the CSV header
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","
Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:
...
[CSV-1]
REPORT-AutoHeader = AutoHeader-2
...
Splunk extracts the following fields from each event:
foo | bar | anotherfoo          | anotherbar
100 | 21  | this is a long file | nomore
200 | 22  | wow                 | o rly?
300 | 12  | ya rly!             | no wai!
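For comparison, the Python sketch below reads the same sample with the standard csv module, whose DictReader takes field names from the first row much as header-based extraction does. One assumed difference: csv.DictReader honors quoted values that contain commas, whereas a transform that simply splits on DELIMS="," may not. The function name extract_csv_events is hypothetical.

```python
import csv
import io

# The sample CSV file from this example.
SAMPLE_CSV = """foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
"""


def extract_csv_events(text: str) -> list[dict[str, str]]:
    """Use the header row as field names and return one dict per event."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


if __name__ == "__main__":
    for event in extract_csv_events(SAMPLE_CSV):
        print(event)
```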