Splunk consumes any data you point it at. Before indexing data, you must add your data source as an input. The source is then listed as one of Splunk's default fields (whether it's a file, directory or network port).
Note: Splunk looks for the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents. To ensure that your input is immediately recognized and indexed, add the input via Splunk Web or by using the add command in the CLI.
Data input methods
Specify data inputs via the following methods:
Most data sources can be specified via Splunk Web. For more extensive configuration options, use inputs.conf. Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf. Configure Windows inputs via inputs.conf as well.
Sources
Splunk accepts data inputs from a wide range of sources. Here's a basic overview of your options. Read on through the Data Inputs and Data Distribution sections of this manual for configuration specifics.
Files and directories
Many data inputs come directly from files and directories. For the most part, you can use Splunk's monitor processor to index data in files and directories. If you have a large archive of historical data, you may want to use batch. Data sent via batch is loaded once and the original files are deleted when Splunk is done indexing them. Keep this in mind when using batch input.
You can also configure Splunk's file system change monitor to watch for changes in your file system. However, you cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.
To configure files and directories, see files and directories.
To configure file system change monitor, see the page on file system change monitor.
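As a rough illustration of the distinction above, the two input types might look like this in inputs.conf. The paths are hypothetical, and the stanzas deliberately point at different directories because monitor and file system change monitor cannot follow the same one. Check inputs.conf.spec for your version to confirm the stanza syntax.

```ini
# Hypothetical paths; adjust for your system.

# Index new events written to files under /var/log/myapp.
[monitor:///var/log/myapp]

# Watch /etc for file system changes. This is a different directory,
# because monitor and fschange cannot follow the same path.
[fschange:/etc]
```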
Monitor
Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, as long as the Splunk server can see the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files. Splunk only checks for files and directories each time the Splunk server starts or restarts, so be sure to add new sources when they become available if you don't want to restart the server. You can also use crawl to discover new sources.
When using monitor:
Important: To avoid performance issues, Splunk recommends that you set followTail=1 in inputs.conf if you are deploying Splunk to systems containing significant quantities of historical data. Setting followTail=1 for a monitor input means that any new incoming data is indexed when it arrives, but anything already in files on the system when Splunk was first started will not be processed.
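A minimal sketch of a monitor stanza with followTail set, assuming a hypothetical log directory:

```ini
# Hypothetical path. With followTail = 1, data already in the files
# when Splunk first starts is skipped; only data that arrives
# afterward is indexed.
[monitor:///var/log/legacy-app]
followTail = 1
```

Leave followTail unset for inputs where the existing contents of the files should be indexed as well.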
Upload files
Upload files directly through Splunk Web. If necessary, Splunk uncompresses files before indexing.
Use the batch processor at the CLI to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and deletes it. You should only use this for large archives of historical data. For most inputs, use monitor.
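For reference, a batch input defined in inputs.conf might look like the sketch below. The directory is hypothetical, and move_policy = sinkhole is, to the best of our knowledge, the attribute that marks the input as load-once-and-delete; confirm it against inputs.conf.spec for your version before relying on it.

```ini
# Hypothetical path. Files placed in this directory are indexed once
# and then deleted, so never point batch at your only copy of the data.
[batch:///data/historical-archive]
move_policy = sinkhole
```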
FIFO queues
Caution: Due to their vulnerability, FIFOs are not recommended. Monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.
A FIFO (AKA named pipe) is a queue of data maintained in memory. File systems can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.
To configure FIFO queues, see this page.
Network ports
You can configure Splunk with an Enterprise license to listen on any network port. This is the best method to send data to your Splunk server from any machine (see data distribution for more information). When configuring network ports, keep in mind that you cannot use ports lower than 1024 if you have not installed Splunk as root.
To configure network ports, see this page.
UDP
UDP is a best-effort protocol, so you might not get messages if the network is clogged or goes down. You also can't be absolutely sure the messages aren't spoofed or altered in transit. Use UDP for day-to-day troubleshooting rather than compliance or security.
Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send any other UDP source of logging data.
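The syslog case described above could be sketched in inputs.conf as follows. The sourcetype assignment is an assumption about how you want the events classified.

```ini
# Listen for remote syslog events on UDP port 514.
# Ports below 1024 require Splunk to be installed as root.
[udp://514]
sourcetype = syslog
```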
TCP
TCP is a reliable, high-performance choice for many situations, as TCP checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's data distribution architecture.
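A TCP input follows the same pattern as other network inputs in inputs.conf. In this sketch the port number is hypothetical, chosen above 1024 so that Splunk does not need to run as root.

```ini
# Hypothetical port. Receive syslog-ng (or other TCP syslog) traffic here.
[tcp://9514]
sourcetype = syslog
```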
Scripted inputs
Configure Splunk to run shell commands on a schedule, and then index the output.
For example:
See configure scripted inputs for details on setting this up.
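A scripted input stanza in inputs.conf might look like the sketch below. The script name and sourcetype are hypothetical; interval is given in seconds.

```ini
# Hypothetical script path. Splunk runs the script every 60 seconds
# and indexes whatever it writes to standard output.
[script://$SPLUNK_HOME/bin/scripts/disk_usage.sh]
interval = 60
sourcetype = disk_usage
```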
Windows data sources
By default, Splunk for Windows indexes all Windows Application, System, and Security event logs. Splunk for Windows can also monitor and index changes to your registry and accept WMI data input. For more information on configuring Splunk for Windows, see this page.
Crawl
Discover new inputs automatically. Crawl uses rules you configure to traverse any given directory structure. Splunk adds new inputs you find via crawl to inputs.conf.
Data processing
Once Splunk consumes data, it sends it to the universal processing pipeline, where it further processes your data. Splunk automatically learns event boundaries, classifies events and sources, and finds timestamps. However, you may want to customize Splunk's default processing. Change processing settings and indexing properties via props.conf.
Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, correlating events, and performing other transformations. The files segmenters.conf and outputs.conf can also define attribute values referenced by props.conf.
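To illustrate how props.conf can reference a stanza defined in transforms.conf, here is a minimal sketch of a field extraction. The source path, transform name, field name, and regex are all hypothetical; verify the attribute names against the props.conf and transforms.conf spec files for your version.

```ini
# props.conf (hypothetical source path and transform name)
[source::/var/log/myapp/app.log]
REPORT-session = myapp_session_id

# transforms.conf (the stanza name matches the reference in props.conf)
[myapp_session_id]
REGEX = session=(\w+)
FORMAT = session_id::$1
```

The two stanzas live in separate files; they are shown together here only to make the cross-reference visible.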
Common use cases for custom indexing properties include: