Documentation: 3.3.4
This page last updated: 10/21/08 02:10pm

How input configuration works

Splunk consumes any data you point it at. Before Splunk can index data, you must add the data source as an input. The source, whether it is a file, directory, or network port, is then listed as one of Splunk's default fields.

Note: Splunk looks for the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents. To ensure that your input is immediately recognized and indexed, add the input via Splunk Web or by using the add command in the CLI.

Data input methods

Specify data inputs via Splunk Web, the Splunk CLI, or inputs.conf.

Most data sources can be specified via Splunk Web. For more extensive configuration options, use inputs.conf. Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf. Configure Windows inputs via inputs.conf as well.

Sources

Splunk accepts data inputs from a wide range of sources. Here's a basic overview of your options. Read on through the Data Inputs and Data Distribution sections of this manual for configuration specifics.

Files and directories

Data inputs include files and directories. Use monitor for continuous, non-destructive inputs from files and directories. Use batch input for one-time, destructive file loading: Splunk deletes the original files once it has finished indexing them. Keep this in mind when using batch input.

To configure files and directories, see this page.

Monitor

Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, as long as the Splunk server can see the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files. Splunk only checks for files and directories each time the Splunk server starts or restarts, so be sure to add new sources when they become available if you don't want to restart the server. You can also use crawl to discover new sources.
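
As a minimal sketch, a monitor input written directly in inputs.conf could look like the stanza below. The path /var/log/myapp is a hypothetical example and does not come from this manual; substitute any file or directory that the Splunk server can see.

```ini
# $SPLUNK_HOME/etc/system/local/inputs.conf
# Hypothetical example: monitor a directory and its subdirectories.
# The third slash is the start of the absolute path /var/log/myapp.
[monitor:///var/log/myapp]
```

No source type is set here, in line with the advice below to leave the source type automatic when a directory may contain files of different formats.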

When using monitor:

  • Files can be opened or closed for writing. Splunk consumes files even if they're still being written to by the operating system.
  • Files or directories can be included or excluded via whitelists and blacklists.
  • Upon restart, Splunk continues processing files where it left off.
  • Splunk detects log file rotation and does not process renamed files it has already indexed.
  • When monitoring a file, the entire path dir/filename must not exceed 1024 characters.
  • When monitoring a directory, set the sourcetype to Automatic. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
  • Removing an input does not stop its files from being indexed. Rather, it stops the files from being checked again, but all the initial content is still indexed. To stop all in-process data, you must restart the Splunk server.

Important: To avoid performance issues, Splunk recommends that you set followTail=1 in inputs.conf if you are deploying Splunk to systems containing significant quantities of historical data. Setting followTail=1 for a monitor input means that any new incoming data is indexed when it arrives, but anything already in files on the system when Splunk was first started will not be processed.
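
The followTail setting described above might be applied to a monitor stanza as in this sketch. The path is hypothetical; only the followTail attribute is taken from this page.

```ini
# Hypothetical path on a system that already holds a lot of historical data.
# followTail = 1: index new data as it arrives, and skip what was already
# in the files when Splunk was first started.
[monitor:///var/log/legacy-app]
followTail = 1
```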

Batch upload

Upload files directly through Splunk Web. If necessary, Splunk uncompresses files before indexing.

Use the batch processor at the CLI or in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and deletes it. For continuous, non-destructive loading of files, use monitor.
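
A batch input in inputs.conf might be sketched as follows. The directory is hypothetical, and the move_policy = sinkhole line is an assumption based on how batch stanzas are commonly written rather than something stated on this page; check the inputs.conf specification for your version before relying on it.

```ini
# Hypothetical drop directory.
# WARNING: files placed here are indexed once and then DELETED.
[batch:///opt/drop/one-time-logs]
# Assumed attribute: marks the input as destructive (load once, then delete).
move_policy = sinkhole
```

Copy files into such a directory rather than moving the only copy, because the originals do not survive indexing.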

FIFO queues

Caution: Due to their vulnerability, FIFOs are not recommended. Monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.

A FIFO (AKA named pipe) is a queue of data maintained in memory. File systems can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.

To configure FIFO queues, see this page.

Network ports

You can configure Splunk with an Enterprise license to listen on any network port. This is the best method to send data to your Splunk server from any machine (see data distribution for more information). When configuring network ports, keep in mind that you cannot use ports lower than 1024 if you have not installed Splunk as root.

To configure network ports, see this page.

UDP

UDP is a best effort protocol, so you might not get messages if the network is clogged or goes down. You also can't be absolutely sure the messages aren't spoofed or altered in transit. Use UDP for day-to-day troubleshooting rather than compliance or security.

Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send any other UDP source of logging data.
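
A syslog listener on UDP port 514 could be sketched in inputs.conf as shown below. The stanza form and the syslog source type are assumptions to verify against the inputs.conf specification for your version.

```ini
# Listen for remote syslog events on UDP port 514.
# Requires an Enterprise license; because 514 is below 1024,
# Splunk must also be installed as root.
[udp://514]
# Assumed source type name for syslog data.
sourcetype = syslog
```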

TCP

TCP is a reliable, high-performance choice for many situations, as TCP checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's data distribution architecture.
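
A TCP listener might look like the following sketch. The port number 9514 is hypothetical, chosen above 1024 so that Splunk does not need to be installed as root, and the stanza form is an assumption to check against the inputs.conf specification for your version.

```ini
# Hypothetical port: accept TCP data from any remote host on port 9514.
# Requires an Enterprise license.
[tcp://:9514]
# Assumed source type name, for example when syslog-ng forwards over TCP.
sourcetype = syslog
```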

Scripted inputs

Configure Splunk to run shell commands on a schedule, and then index the output.

For example:

  • vmstat, iostat, netstat, and any other network or system status commands.
  • SQL DBI.
  • HTTP and HTTPS requests.
  • SNMP.

See configure scripted inputs for details on setting this up.
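
For illustration, a scripted input stanza in inputs.conf might resemble the sketch below. The script path and the source type name are hypothetical, and the interval attribute is assumed to be expressed in seconds.

```ini
# Hypothetical wrapper script that runs vmstat and prints its output.
[script://$SPLUNK_HOME/bin/scripts/vmstat.sh]
# Assumed: run the script every 60 seconds and index whatever it prints.
interval = 60
# Hypothetical source type name for the indexed output.
sourcetype = vmstat
```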

Windows data sources

By default, Splunk for Windows indexes all Windows Application, System, and Security event logs. Splunk for Windows can also monitor and index changes to your registry and accept WMI data input. For more information on configuring Splunk for Windows, see this page.

Crawl

Discover new inputs automatically. Crawl uses rules you configure to traverse any given directory structure. Splunk adds new inputs you find via crawl to inputs.conf.

Data processing

Once Splunk consumes data, it sends it to the universal processing pipeline, where it further processes your data. Splunk automatically learns event boundaries, classifies events and sources, and finds timestamps. However, you may want to customize Splunk's default processing. Change processing settings and indexing properties via props.conf.

Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, correlating events and performing other transformations. The segmenters.conf and outputs.conf files can also define attribute values referenced by props.conf.
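
The relationship between props.conf and transforms.conf can be sketched as follows. The source type myapp_log, the stanza name myapp_user, and the regex are all hypothetical, and the REPORT-<class>, REGEX, and FORMAT attribute names are assumptions to confirm against the props.conf and transforms.conf specifications for your version.

```ini
# props.conf
# Hypothetical source type; the REPORT line points at a transforms.conf stanza by name.
[myapp_log]
REPORT-user = myapp_user

# transforms.conf
# Hypothetical stanza: extract a "user" field from text such as "user=alice".
[myapp_user]
REGEX = user=(\w+)
FORMAT = user::$1
```

The point of the split is that props.conf decides which data a rule applies to, while transforms.conf holds the rule itself, so one rule can be referenced from several props.conf stanzas.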

Common use cases for custom indexing properties include:
