Documentation: 3.4.1
This page last updated: 12/30/08 02:12pm

How input configuration works

Splunk consumes any data you point it at. Before indexing data, you must add your data source as an input. The source is then listed as one of Splunk's default fields (whether it's a file, directory or network port).

Note: Splunk looks for the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents. To ensure that your input is immediately recognized and indexed, add the input via Splunk Web or by using the add command in the CLI.

Data input methods

Specify data inputs via any of the following methods:

  • Splunk Web
  • The Splunk CLI
  • The inputs.conf configuration file

Most data sources can be specified via Splunk Web. For more extensive configuration options, use inputs.conf. Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf. Configure Windows inputs via inputs.conf as well.
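
To see which inputs are currently defined, you can read the local inputs.conf mentioned above. The following is a minimal sketch, not a Splunk tool: it assumes the file uses INI-style stanzas that Python's configparser can read, and the helper names and the /opt/splunk fallback are hypothetical.

```python
import configparser
import os


def list_inputs(conf_text):
    """Return {stanza_name: {setting: value}} for inputs.conf-style text.

    Assumption: the text is INI-like, with every setting under a [stanza] header.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # configparser lowercases setting names by default; keep names such as
    # followTail exactly as written.
    parser.optionxform = str
    parser.read_string(conf_text)
    return {name: dict(parser[name]) for name in parser.sections()}


def local_inputs_path(splunk_home=None):
    """Build $SPLUNK_HOME/etc/system/local/inputs.conf.

    The /opt/splunk fallback is a hypothetical default for illustration only.
    """
    home = splunk_home or os.environ.get("SPLUNK_HOME", "/opt/splunk")
    return os.path.join(home, "etc", "system", "local", "inputs.conf")
```

For example, list_inputs(open(local_inputs_path()).read()) returns one dictionary entry per stanza. The parser raises an error if a setting appears before the first stanza header.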

Sources

Splunk accepts data inputs from a wide range of sources. Here's a basic overview of your options. Read on through the Data Inputs and Data Distribution sections of this manual for configuration specifics.

Files and directories

Many data inputs come directly from files and directories. For the most part, you can use Splunk's monitor processor to index data in files and directories. If you have a large archive of historical data, you may want to use batch. Data sent via batch is loaded once and the original files are deleted when Splunk is done indexing them. Keep this in mind when using batch input.

You can also configure Splunk's file system change monitor to watch for changes in your file system. However, you cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.

To configure files and directories, see files and directories.

To configure file system change monitor, see the page on file system change monitor.

Monitor

Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, as long as the Splunk server can see the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files. Splunk only checks for new files and directories when the Splunk server starts or restarts, so add new sources as they become available if you don't want to restart the server. You can also use crawl to discover new sources.

When using monitor:

  • Files can be open or closed for writing. Splunk consumes files even if they're still being written to by the operating system.
  • Files or directories can be included or excluded via whitelists and blacklists. For more information, see "Whitelist and blacklist rules" in this manual.
  • Upon restart, Splunk continues processing files where it left off.
  • Splunk unpacks compressed archive files before it reads them. Splunk can handle the following common archive filetypes: tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z, and it processes compressed files according to their extension. Keep in mind that unpacking large amounts of compressed files can cause performance issues, so you may want to store old archive files where they are not monitored by Splunk.
  • Splunk detects log file rotation and does not process renamed files it has already indexed, with the exception of archive filetypes such as .tar and .gz, which it will not recognize as being the same as the uncompressed originals (you can exclude them with the blacklist functionality mentioned above). For more information see "Log file rotation" in this manual.
  • The entire path dir/filename for a monitored file must not exceed 1024 characters.
  • Set the sourcetype to Automatic when you monitor a directory. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
  • Removing an input does not stop Splunk from indexing files. Instead, it stops Splunk from checking those files again. Splunk continues to index all the initial content. To stop all in-process data, you must restart the Splunk server.
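
Before you point monitor at a large directory tree, it can help to check it against two of the points above: the 1024-character path limit and the presence of compressed archives. The following is a rough pre-flight sketch, not part of Splunk. The function names are hypothetical, and it assumes you pass the same absolute path you plan to monitor.

```python
import os

MAX_PATH_CHARS = 1024  # path length limit stated in the list above
# Archive types listed above, matched by file extension (case-insensitive here).
ARCHIVE_SUFFIXES = (".tar", ".gz", ".bz2", ".tar.gz", ".tgz",
                    ".tbz", ".tbz2", ".zip", ".z")


def is_archive(path):
    """Return True if the file name ends in one of the listed archive extensions."""
    return path.lower().endswith(ARCHIVE_SUFFIXES)


def preflight(root):
    """Walk root recursively; return (paths_too_long, archive_paths)."""
    too_long, archives = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if len(full) > MAX_PATH_CHARS:
                too_long.append(full)
            if is_archive(full):
                archives.append(full)
    return too_long, archives
```

Archives that turn up in the second list are candidates for a blacklist rule or for moving to a location that Splunk does not monitor.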

Important: To avoid performance issues, Splunk recommends that you set followTail=1 in inputs.conf if you are deploying Splunk to systems containing significant quantities of historical data. Setting followTail=1 for a monitor input means that any new incoming data is indexed when it arrives, but anything already in files on the system when Splunk was first started will not be processed.
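
As an illustration, the sketch below renders the text of a monitor stanza with followTail=1. It is a hypothetical helper, not a Splunk utility. The [monitor://<path>] stanza header and the sourcetype setting are assumptions here, so check them against the inputs.conf specification for your version before using the output.

```python
def monitor_stanza(path, follow_tail=False, sourcetype=None):
    """Render an inputs.conf monitor stanza as text.

    Assumption: monitor inputs use a [monitor://<path>] header. The source
    type is left unset by default so that Splunk assigns it automatically.
    """
    lines = ["[monitor://%s]" % path]
    if follow_tail:
        lines.append("followTail = 1")
    if sourcetype is not None:
        lines.append("sourcetype = %s" % sourcetype)
    return "\n".join(lines) + "\n"
```

Calling monitor_stanza("/var/log", follow_tail=True) produces a two-line stanza that you could append to your local inputs.conf.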

Upload files

Upload files directly through Splunk Web. If necessary, Splunk uncompresses files before indexing.

Use the batch processor at the CLI to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and deletes it. You should only use this for large archives of historical data. For most inputs, use monitor.
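
Because batch deletes what it indexes, you may prefer to hand it a copy rather than the original file. The sketch below shows one way to do that, assuming the default batch directory named above. The function name is hypothetical. It stages the copy beside the batch directory and then renames it into place, so the batch directory never holds a partially copied file.

```python
import os
import shutil
import tempfile


def spool_copy(src, splunk_home):
    """Place a copy of src in the batch directory; the original stays put.

    Assumes the default location $SPLUNK_HOME/var/spool/splunk already exists.
    """
    spool = os.path.join(splunk_home, "var", "spool", "splunk")
    # Stage beside the spool directory so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(dir=os.path.dirname(spool))
    try:
        staged = os.path.join(staging, os.path.basename(src))
        shutil.copy2(src, staged)
        dest = os.path.join(spool, os.path.basename(src))
        os.replace(staged, dest)  # the finished file appears in one step
    finally:
        shutil.rmtree(staging, ignore_errors=True)
    return dest
```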

FIFO queues

Caution: Due to their vulnerability, FIFOs are not recommended. Monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.

A FIFO (AKA named pipe) is a queue of data maintained in memory. File systems can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.

To configure FIFO queues, see this page.

Network ports

You can configure Splunk with an Enterprise license to listen on any network port. This is the best method to send data to your Splunk server from any machine (see data distribution for more information). When configuring network ports, keep in mind that you cannot use ports lower than 1024 if you have not installed Splunk as root.

To configure network ports, see this page.

UDP

UDP is a best effort protocol, so you might not get messages if the network is clogged or goes down. You also can't be absolutely sure the messages aren't spoofed or altered in transit. Use UDP for day-to-day troubleshooting rather than compliance or security.

Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send any other UDP source of logging data.
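
To make the best-effort behavior concrete, here is a minimal sketch of a sender that emits one syslog-style message over UDP. The function names are hypothetical and the message is simplified: a real syslog message normally also carries a timestamp and host name. The sender receives no confirmation, so it cannot tell whether the datagram arrived.

```python
import socket


def syslog_datagram(message, facility=1, severity=6):
    """Prefix message with a syslog priority value (facility * 8 + severity).

    The defaults, facility 1 (user) and severity 6 (informational), are
    illustrative choices.
    """
    return ("<%d>%s" % (facility * 8 + severity, message)).encode("utf-8")


def send_udp(message, host, port):
    """Send one datagram; nothing reports whether it was received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(syslog_datagram(message), (host, port))
```

For example, send_udp("disk full on /var", "splunk.example.com", 514) would target a listener on UDP port 514. The host name is a placeholder.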

TCP

TCP is a reliable, high-performance choice for many situations, as TCP checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's data distribution architecture.
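
For comparison with UDP, the sketch below sends newline-terminated events over a single TCP connection. The function name is hypothetical, and newline-delimited events are an assumption of this example rather than a Splunk requirement. Unlike UDP, a failed connection or a broken send raises an error that the sender can act on.

```python
import socket


def send_tcp(lines, host, port, timeout=5.0):
    """Send each line as one newline-terminated event; return bytes sent.

    Raises OSError if the connection cannot be made or breaks mid-send.
    """
    payload = "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
    return len(payload)
```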

Scripted inputs

Configure Splunk to run shell commands on a schedule, and then index the output.

For example:

  • vmstat, iostat, netstat, and any other network or system status commands.
  • SQL DBI.
  • HTTP and HTTPS requests.
  • SNMP.

See configure scripted inputs for details on setting this up.
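
As a sketch of what a scripted input might emit, the script below prints one timestamped line of disk usage figures to standard output. The field names and the UTC timestamp format are illustrative choices, not a format Splunk requires. It also assumes that your scripted input setup can run a Python script, either directly or through a shell wrapper.

```python
import shutil
import time


def format_event(epoch_seconds, total, used, free, path="/"):
    """Format one event line: a UTC timestamp followed by key=value pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch_seconds))
    return "%s path=%s total_bytes=%d used_bytes=%d free_bytes=%d" % (
        stamp, path, total, used, free)


if __name__ == "__main__":
    # Each scheduled run prints one line for Splunk to index.
    usage = shutil.disk_usage("/")
    print(format_event(time.time(), usage.total, usage.used, usage.free))
```

The timestamp comes first on the line so that it is easy to find when the output is indexed.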

Windows data sources

By default, Splunk for Windows indexes all Windows Application, System, and Security event logs. Splunk for Windows can also monitor and index changes to your registry and accept WMI data input. For more information on configuring Splunk for Windows, see this page.

Crawl

Discover new inputs automatically. Crawl uses rules you configure to traverse any given directory structure. Splunk adds new inputs you find via crawl to inputs.conf.

Data processing

Once Splunk consumes data, it sends it to the universal processing pipeline, where it further processes your data. Splunk automatically learns event boundaries, classifies events and sources, and finds timestamps. However, you may want to customize Splunk's default processing. Change processing settings and indexing properties via props.conf.

Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, correlating events, and performing other transformations. The segmenters.conf and outputs.conf files can also define attribute values referenced by props.conf.

Common use cases for custom indexing properties include:
