Documentation: 3.3
This page last updated: 07/16/08 10:07am

About inputs

Splunk can access and process IT data in any format from a variety of sources. Data sources include log files, FIFO queues, network ports, databases, and scripts. You can add most of these input types to your index using Splunk Web's Data Inputs page.

This topic discusses the different input types you can add to Splunk's index using Splunk Web. For information about using other methods to define inputs (such as using inputs.conf), refer to the Admin manual's data inputs page.

Files and directories

When adding a new file or directory to your data inputs, you can monitor a directory, upload a local file, or index a file on the Splunk server. Use monitor to add continuous and non-destructive inputs. Upload or index files to add one-time and destructive inputs.

Monitor

Splunk's monitor command is similar to the UNIX tail -f command for file monitoring. When you monitor a directory, Splunk detects subdirectories and recursively examines them for new files. As new files are added to the directory, Splunk detects the changes and updates your indexes.
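As a rough illustration of the behavior described above, here is a minimal Python sketch of recursive new-file detection and tail -f style reading. It is not Splunk's implementation; the function names find_new_files and read_new_lines are hypothetical, and the sketch assumes UTF-8, newline-terminated log lines.

```python
import os


def find_new_files(root, seen):
    """Walk root recursively and return files not seen before.

    `seen` is a set of paths that is updated in place, so repeated
    calls only report files added since the previous call.
    """
    current = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            current.add(os.path.join(dirpath, name))
    added = current - seen
    seen |= added
    return added


def read_new_lines(path, offset):
    """Return complete lines appended after `offset`, plus the new offset.

    A trailing partial line (no newline yet) is left unread so that it
    can be picked up whole on a later call. The file is never modified.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling find_new_files in a loop, and read_new_lines for each known file with its last offset, approximates continuous, non-destructive monitoring.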

When you use Splunk Web to monitor a directory, Splunk modifies your inputs.conf file in /system/local to include a stanza that defines your new input. If you monitor /var/log, Splunk adds the following stanza to your local inputs.conf:

[monitor:///var/log]
disabled = false
host = <hostname>
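If you want to inspect such stanzas from a script, the following sketch reads them with Python's standard configparser module. This is an approximation only: Splunk uses its own parser for inputs.conf, the helper name parse_monitor_stanzas is hypothetical, and the host value myhost is a placeholder.

```python
import configparser

# Example text in the shape of the stanza above; "myhost" is a placeholder.
EXAMPLE = """\
[monitor:///var/log]
disabled = false
host = myhost
"""


def parse_monitor_stanzas(text):
    """Map each monitored path to its settings (all values are strings)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    prefix = "monitor://"
    inputs = {}
    for section in parser.sections():
        if section.startswith(prefix):
            # "monitor:///var/log" -> "/var/log"
            inputs[section[len(prefix):]] = dict(parser[section])
    return inputs


if __name__ == "__main__":
    print(parse_monitor_stanzas(EXAMPLE))
```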

You can also view and edit the input properties of your monitored directory from Admin > Data Inputs: Files & Directories in Splunk Web.

Upload

You can browse for a local file and add it directly to your inputs. If you have a previous version of the file as an input, uploading a new file will overwrite the existing version. Unlike a monitored file, the uploaded file does not continuously update. Therefore, use the upload option for one-time and destructive inputs.

Uploading a local file does not modify inputs.conf. Instead, it uploads the specified file into /var/run/splunk/upload and then moves it into /var/spool/splunk for indexing. After indexing, Splunk deletes the file; hence, the indexed file does not show up as a new data input in Admin > Data Inputs: Files & Directories.

Note: When you upload a local file, Splunk unpacks and decompresses it, if necessary, before processing it.

Index

Indexing a file on the Splunk server copies the file directly into /var/spool/splunk, where it exists while Splunk processes the data. Similar to uploading a local file, this operation does not modify inputs.conf. After indexing, Splunk deletes the file; hence, the indexed file does not show up as a new data input in Admin > Data Inputs: Files & Directories.

FIFO queues

Splunk accesses the data in a FIFO, or named pipe, queue as though it were a file. When defining a FIFO input in Splunk Web, provide the path that directs Splunk to the queue. When choosing the FIFO data input method, consider the following:

  • FIFO queues can be a high-performance way to get data into Splunk, because the system does not carry the I/O burden of writing both to a file on disk and to Splunk's index on disk, as it does with monitor.
  • FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.
  • You do not have to worry about log file rotation and archiving because the data goes straight from the logging application into Splunk via the queue. There is nothing on disk to manage except for Splunk's index.
  • Most syslog implementations can write to FIFO queues in addition to or instead of files.
  • Other applications can write to FIFO queues instead of files by just changing a logfile name parameter from a filename to a defined FIFO queue.
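The points above can be illustrated with a small Python sketch in which an application writes events to a named pipe instead of a logfile, and a reader consumes them. It assumes a POSIX system (os.mkfifo is not available on Windows); the path app.fifo and the function names are hypothetical, and the reader here merely stands in for whatever consumes the queue.

```python
import os
import tempfile
import threading


def write_events(fifo_path, events):
    """Act as the logging application: write events to the FIFO."""
    # Opening a FIFO for writing blocks until a reader has opened it.
    with open(fifo_path, "w") as fifo:
        for event in events:
            fifo.write(event + "\n")


def read_events(fifo_path):
    """Act as the consumer: read until all writers close the FIFO."""
    with open(fifo_path, "r") as fifo:
        return [line.rstrip("\n") for line in fifo]


def demo():
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "app.fifo")
    os.mkfifo(fifo_path)  # create the named pipe; no data is stored on disk
    events = ["service started", "user login ok"]
    writer = threading.Thread(target=write_events, args=(fifo_path, events))
    writer.start()
    received = read_events(fifo_path)
    writer.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received


if __name__ == "__main__":
    print(demo())
```

Because the data only ever lives in the pipe's in-memory buffer, anything written while no reader is attached, or lost during a crash, cannot be recovered from disk.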

Note: FIFOs are not recommended for application servers forwarding data to Splunk in a distributed setting. Monitor is a more reliable, stable method.

Network ports

Splunk supports UDP and TCP connections. When configuring network ports, keep in mind that Splunk can use ports lower than 1024 only if it is running as root.

UDP

UDP is a best-effort protocol: messages might not arrive if the network is congested or briefly interrupted. You also cannot be certain that messages have not been spoofed or altered in transit. Reserve UDP for logging implementations focused on day-to-day troubleshooting rather than compliance or security.

Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send Splunk any other UDP source of logging data, including SNMP.
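To show what a UDP sender looks like, here is a minimal Python sketch that sends one syslog-style datagram. The message layout is deliberately simplified (a priority prefix followed by text, without the timestamp and hostname fields a full syslog message carries), the function names are hypothetical, and the demo binds a local socket on an OS-assigned port in place of a real listener on port 514.

```python
import socket


def syslog_priority(facility, severity):
    """Return the syslog priority prefix, computed as facility * 8 + severity."""
    return "<{}>".format(facility * 8 + severity)


def send_udp(message, host, port):
    """Send one datagram; UDP gives no confirmation that it was delivered."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))


def demo():
    # Stand-in receiver on the loopback interface; port 0 lets the OS
    # choose a free unprivileged port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        port = receiver.getsockname()[1]
        # facility 1 (user-level) and severity 6 (informational)
        message = syslog_priority(1, 6) + "myapp: disk usage at 91%"
        send_udp(message, "127.0.0.1", port)
        data, _addr = receiver.recvfrom(65535)
    return data.decode("utf-8")


if __name__ == "__main__":
    print(demo())
```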

Like all network streaming approaches, direct UDP input is higher performance than reading files from disk.

TCP

TCP is a reliable, high-performance choice for most situations, since this protocol includes checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's distributed data access.

Note: If the sending process buffers data such that events are broken into multiple pieces, Splunk may interpret the parts as multiple events. This is more likely if events are being generated intermittently, as there may be long pauses (several seconds or longer) between blocks of buffered data. If you notice truncated events, try forcing the process to send events atomically.
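One way to approach atomic sending on the application side is to assemble each event completely before writing it to the socket. The sketch below is an assumption about how a sender might do this, not a Splunk API; the class name EventSender is hypothetical. Note that a single sendall call does not guarantee a single TCP segment, but it avoids long pauses between the pieces of one event.

```python
import socket


class EventSender:
    """Collect the pieces of one event, then send them in a single call."""

    def __init__(self, sock):
        self._sock = sock
        self._parts = []

    def add(self, piece):
        """Buffer part of the current event without touching the network."""
        self._parts.append(piece)

    def flush(self):
        """Send the whole event, newline-terminated, and return bytes sent."""
        if not self._parts:
            return 0
        payload = ("".join(self._parts) + "\n").encode("utf-8")
        self._parts.clear()
        self._sock.sendall(payload)
        return len(payload)


if __name__ == "__main__":
    # socketpair() gives two connected stream sockets for a local demo.
    left, right = socket.socketpair()
    sender = EventSender(left)
    sender.add("2008-07-16 10:07:00 ")
    sender.add("login failed for user=admin")
    sender.flush()
    print(right.recv(1024))
    left.close()
    right.close()
```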
