This documentation does not apply to the most recent version of Splunk.
This documentation applies to the following versions of Splunk: 3.0, 3.0.1, 3.0.2, 3.1, 3.1.1, 3.1.2, 3.1.3, 3.1.4
Splunk's data inputs are specified in inputs.conf. In most cases you will not need to modify this file, because the required information is written when you configure data inputs through the Splunk CLI or SplunkWeb. For finer control, however, you may wish to configure settings directly in inputs.conf. Changes made via SplunkWeb or the Splunk CLI are stored in $SPLUNK_HOME/etc/bundles/local/inputs.conf.
As Splunk processes files, it assigns default values for sourcetype, hostname, and index for each file. You can override these settings when you define inputs, or you can modify them later.
Data inputs have characteristics that are independent of how the inputs are defined. A tail input type behaves the same whether you add it in the Splunk CLI or SplunkWeb. This section describes the purpose, behavior, and rules/restrictions of the Splunk data input types.
The next section describes the mechanics of adding the data inputs via the various Splunk interfaces and how to modify indexing settings.
Data inputs can come from files and directories. Data in files can be processed in live or batch mode. Live input is for active log files and is handled through Splunk's tail processor. Batch input is for closed, archived data, and batch files are handled through the Upload or Watch/Batch processors.
Splunk's tail behaves like the UNIX tail command. Specify a path to a file or directory whose contents should be indexed by the Splunk server, and Splunk will watch and consume any new input. If subdirectories exist, Splunk will recursively examine them for log files. If new files appear in a tailed directory, Splunk will add them to the index.
Please note: Starting with Splunk 3.0.2, the tail input method can optionally process files like UNIX tail -f. Instead of consuming the entire file and then waiting for new input, tail starts at the end of the file and waits for new input. Set this option in inputs.conf with the followTail attribute. A value of 1 reads from the end of the file. The default, 0, reads the entire file. This option is ignored if the file has ever been indexed by Splunk.
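The followTail rules in the note above can be illustrated with a minimal Python sketch. This is not Splunk code: the function name `starting_offset` and the `previously_indexed` flag are hypothetical, introduced only to show where reading would begin under each setting.

```python
import os

def starting_offset(path, follow_tail, previously_indexed):
    """Return the byte offset at which reading of `path` would begin.

    Hypothetical helper that mirrors the followTail rules described above.
    """
    # The option is ignored for files that have been indexed before. Where
    # reading resumes in that case is not stated here, so return None.
    if previously_indexed:
        return None
    if follow_tail == 1:
        # Start at the end of the file and wait for new input, like tail -f.
        return os.path.getsize(path)
    # Default (0): read the entire file from the beginning.
    return 0
```

For a new 5-byte file, `starting_offset(path, 1, False)` gives 5 and `starting_offset(path, 0, False)` gives 0.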
In addition, when tailing a file for input:
When tailing a directory for input:
Please note: If the specified file or directory does not exist, the Splunk server will not check whether it is created later. Splunk only checks for files and directories each time the Splunk server starts (or is restarted). If you don't want to restart the server, be sure to explicitly add new files as inputs when they become available. When tailing a file, the entire path (dir/filename) must not exceed 1024 characters.
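Both restrictions in the note above can be checked before adding a tail input. The following Python sketch is one way to do that outside Splunk. The helper name `check_tail_input` is hypothetical, and the sketch assumes the 1024-character limit is counted over the full path string.

```python
import os

MAX_TAIL_PATH = 1024  # limit stated above for the full dir/filename path

def check_tail_input(path):
    """Return a list of problems that would keep `path` from being tailed."""
    problems = []
    if len(path) > MAX_TAIL_PATH:
        problems.append("path exceeds %d characters" % MAX_TAIL_PATH)
    if not os.path.exists(path):
        # A missing path is only looked for again when the server restarts.
        problems.append("path does not exist yet")
    return problems
```

An empty list means neither restriction applies to the path.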
Splunk has a batch processing module. It watches any specified directory on the local Splunk server's file system and then processes the entirety of any new file that appears. You can also upload archived files directly into Splunk for analysis. If necessary, Splunk will unpack and uncompress a file before indexing. Keep in mind that Splunk will need adequate disk space to uncompress these files, and that this processing can take more time than processing a live or uncompressed file.
By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. You can set up your own watch directory as well.
Please note: This method does not keep watching files it has already seen, so it is not designed for live log files, only for rotated archive copies.
In addition, when batch uploading or watching, Splunk can:
A FIFO (AKA named pipe) is a queue of data maintained in a Unix host's memory. It can be accessed like a file and log messages can be written to it. When choosing the FIFO data input method consider the following:
UDP and TCP ports can feed data into the Splunk server. UDP and TCP behave differently, and these behaviors affect how data arrives for processing. When configuring network ports, please keep in mind that you cannot use ports lower than 1024 if you have not installed Splunk as root.
UDP is a best-effort protocol. This means that you might not get messages if the network is congested or has a hiccup. You also can't be absolutely sure the messages aren't spoofed or altered in transit. UDP should be reserved for logging implementations focused on day-to-day troubleshooting rather than compliance or security.
Splunk Enterprise can read directly from the network on any UDP port. This technique is most often used to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. However, it also can be used for any other UDP source of logging data, including SNMP.
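To test a UDP input, it can help to send a single event by hand. The Python sketch below sends one message as one datagram using only the standard library. The function name `send_udp_event` is hypothetical, and the host and port in the usage comment are placeholders for wherever your Splunk server is listening.

```python
import socket

def send_udp_event(message, host, port):
    """Send one log message as a single UDP datagram; return bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP is best effort: a successful send does not mean the
        # receiver actually got the message.
        return sock.sendto(message.encode("utf-8"), (host, port))
    finally:
        sock.close()

# Example (placeholder host; 514 is the syslog port mentioned above):
# send_udp_event("<13>myhost app: test message", "splunk.example.com", 514)
```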
Like all of the network streaming-based approaches, direct UDP input is higher performance than reading files from disk.
TCP is a reliable, high-performance choice for many situations, as this protocol includes checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and alternative syslog implementations that use TCP for security or reliability. This feature is the foundation of Splunk's distributed data access.
Please note: If the sending process buffers data such that events are broken into multiple pieces, Splunk may interpret the parts as multiple events. This is more likely if events are being generated intermittently, as there may be long pauses (several seconds or longer) between blocks of buffered data. If you notice truncated events, try forcing the process to send events atomically.
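One way to reduce the risk described in the note above is for the sender to build each event completely and hand it to the socket in a single call, rather than writing it in fragments. The Python sketch below illustrates this on the sending side. The function name `send_event_atomically` is hypothetical, and newline-terminated events are an assumption. TCP may still split the data in transit, so treat this as a mitigation, not a guarantee.

```python
import socket

def send_event_atomically(sock, event):
    """Write one complete, newline-terminated event with a single sendall()."""
    # Build the whole event first so the application never hands the
    # socket a partial event followed by a long pause.
    data = (event.rstrip("\n") + "\n").encode("utf-8")
    sock.sendall(data)
    return len(data)
```

Here `sock` is any connected TCP socket, for example one returned by `socket.create_connection((host, port))`.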
Splunk can be configured to run an arbitrary shell command on any schedule, and then pipe the output to Splunk for processing. Examples of shell scripts that process meaningful data for Splunk to digest include:
See Configure scripted inputs for details on how to set this up.
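As a rough illustration of what a scripted input might produce, the Python sketch below prints one timestamped line of key=value pairs describing disk usage. The script itself, the function name `disk_usage_event`, and the field names are hypothetical. The only assumption taken from the text above is that the script's standard output is what gets piped to Splunk.

```python
import shutil
import time

def disk_usage_event(path="/"):
    """Build one timestamped event line describing disk usage for `path`."""
    usage = shutil.disk_usage(path)
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s path=%s total_bytes=%d used_bytes=%d free_bytes=%d" % (
        timestamp, path, usage.total, usage.used, usage.free)

if __name__ == "__main__":
    # Each scheduled run writes a single event to standard output.
    print(disk_usage_event())
```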
A distinguishing characteristic of Splunk is that it can universally process any IT data, regardless of format. It automatically learns event boundaries, classifies events and sources, and finds timestamps. However, sometimes you may want to change or augment Splunk's default processing. You can do this by setting indexing properties in a props.conf file in the $SPLUNK_HOME/etc/bundles/local directory. (Read more about bundles.)
Some attributes within props.conf can be customized by defining new stanzas in other configuration files, most commonly transforms.conf, which defines regex-based rules for extracting fields, correlating events and performing other transformations. Segmenters.conf, outputs.conf and metaevents.conf can also define attribute values that can be referenced by props.conf.
Common use cases for custom indexing properties include: