Documentation: 3.4.1
This page last updated: 01/07/09 03:01pm

About inputs

Splunk can access and process IT data in any format from many different sources. Data sources include files, FIFO queues, network ports, databases, and scripts. You can add most of these input types to your index using Splunk Web's Data Inputs page.

This topic discusses the different input types you can add to Splunk's index using Splunk Web. For information about using other methods to define inputs (such as using inputs.conf), refer to the Admin manual's topic on data inputs.

Files and directories

When adding a new file or directory to your data inputs, you can monitor a directory, upload a local file, or index a file on the Splunk server. Use monitor to add continuous and non-destructive inputs. Upload or index files to add one-time and destructive inputs.

Monitor

Splunk's monitor command is similar to the UNIX tail -f command for file monitoring. When you monitor a directory, Splunk detects subdirectories and recursively examines them for new files. As new files are added to the directory, Splunk detects the changes and indexes any new data.
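The following Python sketch illustrates the tail -f style behavior described above under simplifying assumptions. It polls by re-walking the directory tree with os.walk, which descends into subdirectories, and remembers a byte offset per file in a hypothetical `offsets` dictionary, so each call returns only the data added since the previous call. It is not Splunk's implementation, and Splunk does not necessarily detect changes this way.

```python
import os


def scan_new_data(root, offsets):
    """Return {path: new_bytes} for data added under `root` since the last call.

    `offsets` is a hypothetical bookkeeping dict mapping each file path to the
    number of bytes already read; it is updated in place.
    """
    new_data = {}
    # os.walk recurses into every subdirectory, so newly created
    # subdirectories and files are picked up on the next scan.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished between listing and stat
            start = offsets.get(path, 0)
            if size < start:
                # File shrank (for example, truncated): read again from the top.
                start = 0
                offsets[path] = 0
            if size > start:
                with open(path, "rb") as f:
                    f.seek(start)  # skip data already read, like tail -f
                    chunk = f.read(size - start)
                new_data[path] = chunk
                offsets[path] = start + len(chunk)
    return new_data
```

To approximate continuous monitoring, call `scan_new_data` in a loop with a short pause (for example `time.sleep(1)`) between calls, reusing the same `offsets` dictionary. Because the sketch only reads and never moves or deletes files, it is non-destructive in the same sense as monitor.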

When you configure inputs via Splunk Web, Splunk modifies your inputs.conf file in $SPLUNK_HOME/etc/system/local to include a stanza that defines your new input.

For example, if you monitor /var/log, Splunk adds the following stanza to your local inputs.conf:

[monitor:///var/log]
disabled = false
host = <hostname>
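The three slashes in the stanza header come from the `monitor://` prefix followed by the absolute path `/var/log`. The Python sketch below renders a stanza of that shape and reads the monitored paths back. The function names are hypothetical, and parsing with Python's configparser is an assumption that works for a simple stanza like this one; it is not a general inputs.conf parser.

```python
import configparser

MONITOR_PREFIX = "monitor://"


def monitor_stanza(path, host):
    """Render a stanza like the example above (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path such as /var/log")
    # "monitor://" + "/var/log" yields "monitor:///var/log".
    return "\n".join([
        f"[{MONITOR_PREFIX}{path}]",
        "disabled = false",
        f"host = {host}",
    ]) + "\n"


def monitored_paths(conf_text):
    """List the paths named in [monitor://...] stanzas of a simple config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [
        section[len(MONITOR_PREFIX):]
        for section in parser.sections()
        if section.startswith(MONITOR_PREFIX)
    ]
```

For instance, `monitor_stanza("/var/log", "myhost")` produces the same three lines shown above with `<hostname>` replaced by `myhost`.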

Also, you can view and edit the input properties of your monitored directory from Admin > Data Inputs: Files & Directories in Splunk Web.

Upload

Browse for a local file and add it directly to your inputs. If you have a previous version of the file as an input, uploading a new file overwrites the existing version. Unlike a monitored file, the uploaded file does not continuously update. Therefore, use the upload option for one-time and destructive inputs.

Uploading a local file does not modify inputs.conf. Instead, it copies the specified file into $SPLUNK_HOME/var/run/splunk/upload/ and then moves it into $SPLUNK_HOME/var/spool/splunk/ for indexing. After indexing, Splunk deletes the file; hence, the indexed file does not show up as a new data input in Admin > Data Inputs: Files & Directories.
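The upload lifecycle described above (copy into an upload directory, move into the spool directory, index, then delete) can be sketched in Python as follows. The directory arguments and the `index_fn` callback are hypothetical stand-ins for `$SPLUNK_HOME/var/run/splunk/upload/`, `$SPLUNK_HOME/var/spool/splunk/`, and Splunk's indexer; the sketch shows the order of operations, not Splunk's code.

```python
import os
import shutil


def upload_and_index(local_file, upload_dir, spool_dir, index_fn):
    """Copy -> move -> index -> delete, mirroring the upload flow.

    `index_fn` is a hypothetical callback that receives the spooled path.
    The original local file is left untouched; only the copy is consumed.
    """
    os.makedirs(upload_dir, exist_ok=True)
    os.makedirs(spool_dir, exist_ok=True)
    name = os.path.basename(local_file)
    # 1. Copy the local file into the upload directory.
    staged = shutil.copy2(local_file, os.path.join(upload_dir, name))
    # 2. Move the copy into the spool directory for indexing.
    spooled = shutil.move(staged, os.path.join(spool_dir, name))
    # 3. Index the spooled copy.
    index_fn(spooled)
    # 4. Delete the spooled copy after indexing; no config entry is written,
    #    so nothing remains to appear as a persistent data input.
    os.remove(spooled)
```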

Note: When you upload a local file, if necessary, Splunk uncompresses the file before processing it.
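One way to decide whether a file needs uncompressing is to inspect its leading bytes. The Python sketch below assumes gzip is the only compressed format of interest and checks for the gzip magic number `1f 8b`; which formats Splunk actually recognizes, and how it detects them, is not stated here, so treat this as an illustration only.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_maybe_gzipped(path):
    """Return the file's contents, uncompressing first if it is gzip data.

    Assumption: gzip is the only compressed format handled in this sketch.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        with gzip.open(path, "rb") as f:
            return f.read()
    with open(path, "rb") as f:
        return f.read()
```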

Index

Indexing a file on the Splunk server copies the file directly into $SPLUNK_HOME/var/spool/splunk, where it remains while Splunk processes the data. As with uploading a local file, this operation does not modify inputs.conf. After indexing, Splunk deletes the file; hence, the indexed file does not show up as a new data input in Admin > Data Inputs: Files & Directories.

FIFO queues

Caution: FIFOs are not recommended for application servers forwarding data to Splunk in a distributed setting. In general, Splunk does not recommend that you use FIFOs, because data in a FIFO can be lost if processing is disrupted; monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.

Splunk accesses the data in a FIFO (named pipe) queue as though it were a file. When defining a FIFO input in Splunk Web, provide the path to the named pipe. FIFO access is very fast, but FIFOs are vulnerable to processing disruptions, because any data held in memory may be lost.
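The Python sketch below (Unix only, since it uses os.mkfifo) shows what "as though it were a file" means: the reader opens the named pipe with an ordinary `open()` call and iterates over lines until the writer closes its end. The function names and the pipe location are hypothetical. Note that the pipe stores nothing on disk; data that has been written but not yet read exists only in memory, which is the weakness the caution above refers to.

```python
import os
import threading


def read_fifo_lines(fifo_path):
    """Read a named pipe exactly like a regular text file."""
    # open() blocks until some process opens the pipe for writing.
    with open(fifo_path, "r") as fifo:
        # Iteration ends (EOF) once every writer has closed its end.
        return [line.rstrip("\n") for line in fifo]


def fifo_demo(tmpdir):
    """Create a FIFO, write two events from a thread, and read them back."""
    path = os.path.join(tmpdir, "events.fifo")  # hypothetical location
    os.mkfifo(path)

    def writer():
        # Opening for write blocks until the reader has opened the pipe.
        with open(path, "w") as w:
            w.write("event 1\nevent 2\n")

    t = threading.Thread(target=writer)
    t.start()
    lines = read_fifo_lines(path)
    t.join()
    os.remove(path)
    return lines
```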

Network ports

Splunk supports UDP and TCP connections. When configuring network ports, keep in mind that you cannot use ports lower than 1024 if you are not running Splunk as root.
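To check in advance whether the account that runs Splunk can use a given port, you can try binding it. The Python sketch below is a hypothetical helper: on typical Unix systems, binding a port below 1024 as a non-root user fails with a permission error, which Python raises as PermissionError. Other failures, such as the port already being in use, are deliberately not caught and surface as OSError.

```python
import socket


def can_bind(port, sock_type=socket.SOCK_STREAM, host="0.0.0.0"):
    """Return False if binding `port` is refused for lack of privileges.

    Hypothetical helper; pass socket.SOCK_DGRAM to test a UDP port.
    Port 0 asks the OS for any free ephemeral port.
    """
    sock = socket.socket(socket.AF_INET, sock_type)
    try:
        sock.bind((host, port))
        return True
    except PermissionError:
        # Typical result for ports < 1024 when not running as root.
        return False
    finally:
        sock.close()
```

For example, `can_bind(514, socket.SOCK_DGRAM)` normally returns False for a non-root user on a Unix system, which tells you in advance that listening for syslog on its standard port requires running as root.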

UDP

UDP is a best-effort protocol; you might not get messages if the network is clogged or has a hiccup. You also can't be absolutely sure the messages aren't spoofed or altered in transit. UDP should be reserved for logging implementations focused on day-to-day troubleshooting rather than compliance or security.

Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send Splunk any other source of logging data over UDP.
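The Python sketch below shows the basic shape of a UDP receiver like the one described above: bind a datagram socket and read one message per `recvfrom` call. The function names are hypothetical, treating each datagram as one event is an assumption of the sketch, and binding port 514 requires root as noted earlier. Nothing here acknowledges receipt, which is why a dropped datagram is simply never seen.

```python
import socket


def open_udp_listener(port, host="0.0.0.0"):
    """Bind a UDP socket, e.g. port 514 for syslog (needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_datagrams(sock, count, timeout=5.0):
    """Read `count` datagrams, treating each one as a single event."""
    sock.settimeout(timeout)  # raise socket.timeout instead of waiting forever
    events = []
    while len(events) < count:
        data, _sender = sock.recvfrom(65535)  # max UDP payload size
        events.append(data.decode("utf-8", errors="replace"))
    return events
```

A classic BSD syslog message begins with a priority field in angle brackets, such as `<13>`, followed by a timestamp, host name, and message text.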

Like all network streaming approaches, direct UDP input performs better than reading files from disk.
Check the Splunk Wiki for best practices on using UDP to configure syslog input.

TCP

TCP is a reliable, high-performance choice for most situations, since this protocol includes checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's distributed data access.
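For comparison with UDP, the Python sketch below accepts a TCP connection and reads until the sender closes it. The function names are hypothetical. TCP delivers the bytes in order and without loss, but the receiver still sees an undivided stream of bytes rather than separate messages, which is relevant to the note that follows.

```python
import socket


def open_tcp_listener(port, host="0.0.0.0", backlog=5):
    """Bind and listen on a TCP port (ports below 1024 need root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def read_one_connection(listener, timeout=5.0):
    """Accept one sender and return every byte it sends until it closes."""
    listener.settimeout(timeout)
    conn, _addr = listener.accept()
    with conn:
        conn.settimeout(timeout)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the sender closed its end
                break
            chunks.append(data)
    return b"".join(chunks)
```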

Note: If the sending process buffers data such that events are broken into multiple pieces, Splunk may interpret the parts as multiple events. This is more likely if events are being generated intermittently, as there may be long pauses (several seconds or longer) between blocks of buffered data. If you notice truncated events, try forcing the process to send events atomically.
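The Python sketch below illustrates the problem in the note above and the suggested fix. It is not how Splunk breaks events; it assumes, for illustration only, that each event is a single newline-terminated line. `LineFramer` (a hypothetical helper) shows that one event can arrive split across two reads, so treating each read as an event would produce fragments. `send_event` (also hypothetical) shows the sender-side fix: build the complete event first and hand it to the socket in one call instead of writing it in pieces with pauses in between.

```python
import socket


class LineFramer:
    """Reassemble newline-terminated events from arbitrary chunks."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk and return any events completed by it."""
        self._buffer += chunk
        # Everything before the last newline is complete; the remainder
        # (possibly empty) waits for the next chunk.
        *complete, self._buffer = self._buffer.split(b"\n")
        return complete


def send_event(sock: socket.socket, event: str) -> None:
    """Send one whole event, newline-terminated, in a single call."""
    sock.sendall(event.rstrip("\n").encode("utf-8") + b"\n")
```

For example, feeding the chunks `b"ERROR disk "` and `b"full\n"` yields nothing after the first chunk and the single event `b"ERROR disk full"` after the second, whereas a receiver that treated each chunk as an event would record two truncated events.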
