Point Splunk at a file or a directory. If you specify a directory, Splunk consumes everything in the directory. Splunk has two different file input processors: monitor and batch. For the most part, use monitor to input all your data sources from files and directories. The only time you should use batch is to load a large archive of historical files. Read on for more specifics.
Monitor
Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can see the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.
Splunk checks for files and directories only when the Splunk server starts or restarts. If you don't want to restart the server, add new inputs yourself as they become available. If you want Splunk to find potential new inputs automatically, use crawl.
When using monitor:
Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.
Batch
Use the batch processor at the CLI or in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and deletes it.
Note: Batch is most useful for loading in historical data, such as large archives of files. For best practices on loading file archives, see "How to index different sized archives".
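As a minimal sketch of the move-into-the-spool-directory workflow described above, the following Python helper moves a file into the batch directory. The function name spool_file and the splunk_home argument are hypothetical; only the var/spool/splunk path under $SPLUNK_HOME comes from the text. Splunk itself is not called here.

```python
import os
import shutil


def spool_file(src_path, splunk_home):
    """Move src_path into <splunk_home>/var/spool/splunk and return the new path.

    Hypothetical helper. The batch processor is expected to index and then
    delete whatever lands in this directory, so the original is not kept.
    """
    spool_dir = os.path.join(splunk_home, "var", "spool", "splunk")
    if not os.path.isdir(spool_dir):
        raise FileNotFoundError("spool directory not found: " + spool_dir)
    dest = os.path.join(spool_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # Avoid silently replacing a file that has not been consumed yet.
        raise FileExistsError("refusing to overwrite: " + dest)
    shutil.move(src_path, dest)
    return dest
```

Because the batch processor deletes files after indexing them, keep a copy elsewhere if you still need the original.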
Splunk Web
Add inputs from files and directories via Splunk Web.
1. Click Admin in the upper right-hand corner of Splunk Web.
2. Then click Data Inputs.
3. Pick files and directories.
4. Click New Input to add an input.
5. Under Data access, pick Monitor a directory.
You can also:
6. Specify the pathname to the file or directory. If you select Upload, use the Browse... button.
To monitor a shared network drive, enter the following: <myhost>/<mypath> (or \\<myhost>\<mypath> on Windows). Make sure your Splunk server can see the mounted drive.
7. Under the Host heading, select the host name. You have several choices if you are using Monitor or Batch methods. Learn more about setting host value.
Note: Host only sets the host field in Splunk. It does not direct Splunk to look on a specific host on your network.
8. Now set the Source Type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries. Learn more about source type.
9. After specifying the source, host, and source type, click Submit.
CLI
Monitor files and directories via Splunk's Command Line Interface (CLI). To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command from the UNIX or Windows command prompt. Or add Splunk to your path and use the splunk command.
If you get stuck, Splunk's CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type splunk help <command>.
The following commands are available for input configuration via the CLI:
| Command | Command syntax | Action |
| add | add monitor $SOURCE [-parameter value] ... | Add inputs from $SOURCE. |
| edit | edit monitor $SOURCE [-parameter value] ... | Edit a previously added input for $SOURCE. |
| remove | remove monitor $SOURCE | Remove a previously added $SOURCE. |
| list | list monitor | List the currently configured monitor inputs. |
| spool | spool source | Copy a file into Splunk via the sinkhole directory. |
Change the configuration of each data input type by setting additional parameters. Parameters are set via the syntax: -parameter value.
Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.
Required parameters
| source | Path to the file or directory to monitor for new input. |
Optional parameters
| sourcetype | Specify a sourcetype field value for events from the input source. |
| index | Specify the destination index for events from the input source. |
| hostname | Specify a host name to set as the host field value for events from the input source. |
| hostregex | Specify a regular expression on the source file path to set as the host field value for events from the input source. |
| hostsegmentnum | Set the number of segments of the source file path to set as the host field value for events from the input source. |
| follow-only | T or F (True or False). Default: False. When set to True, Splunk reads from the end of the source (like the Unix "tail -f" command). |
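To illustrate what reading from the end of the source means for follow-only, here is a minimal Python sketch of tail-style reading. It is not Splunk's implementation; the function names open_at_end and read_new_lines and the sample file are hypothetical. The file is opened, the read position is moved to the end, and only lines appended afterwards are returned.

```python
import os
import tempfile


def open_at_end(path):
    """Open path and skip its existing content, as `tail -f` does at startup."""
    handle = open(path, "r")
    handle.seek(0, os.SEEK_END)
    return handle


def read_new_lines(handle):
    """Return the lines appended since the previous read."""
    return [line.rstrip("\n") for line in handle.readlines()]


if __name__ == "__main__":
    fd, sample = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    with open(sample, "w") as writer:
        writer.write("event written before monitoring\n")

    follower = open_at_end(sample)  # existing content is skipped
    with open(sample, "a") as writer:
        writer.write("event written after monitoring\n")

    print(read_new_lines(follower))  # only the appended line is returned
    follower.close()
    os.remove(sample)
```

With the default (False), the existing content of the source would be read as well.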
The following example shows how to monitor files in /var/log/:
Add /var/log/ as a data input:
./splunk add monitor /var/log/
inputs.conf
To add an input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read how configuration files work before you begin.
You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).
Monitor
[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...
Note: To ensure new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes when it changes. Note that the entire file is indexed, which can result in duplicate events.
host = <string>
index = <string>
sourcetype = <string>
source = <string>
queue = <string> (parsingQueue, indexQueue, etc)
host_regex = <regular expression>
host_segment = <integer>
crcSalt = <string>
followTail = 0|1
_whitelist = <regular expression>
_blacklist = <regular expression>
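The host_regex and host_segment attributes listed above both derive the host value from the source path. The Python sketch below shows one plausible reading of them. It assumes that segments are counted from 1 over the non-empty "/"-separated parts of the path, and that the first capture group of the regular expression supplies the host. Both assumptions, the function names, and the sample path are illustrative and are not taken from this page.

```python
import re


def host_from_segment(path, segment):
    """Return the Nth path segment (assumed 1-based), or None if out of range."""
    parts = [part for part in path.split("/") if part]
    if segment < 1 or segment > len(parts):
        return None
    return parts[segment - 1]


def host_from_regex(path, pattern):
    """Return the first capture group of pattern matched against path, or None."""
    match = re.search(pattern, path)
    if match is None or match.lastindex is None:
        return None
    return match.group(1)


if __name__ == "__main__":
    sample = "/var/log/web01/access.log"  # hypothetical source path
    print(host_from_segment(sample, 3))
    print(host_from_regex(sample, r"/var/log/([^/]+)/"))
```

Under these assumptions both calls pick the directory name web01 out of the sample path. Check the behavior of your Splunk version before relying on the counting convention.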
You can use wildcards to specify your input path for monitored input. Use ... for paths and * for files.
Note: In Windows, you must use two backslashes \\ to escape wildcards. Regexes with backslashes in them are not currently supported for _whitelist and _blacklist in Windows.
Specifying wildcards results in an implicit _whitelist created for that stanza. The longest fully qualified path is used as the monitor stanza, and the wildcards are translated into regular expressions using the following map:
| wildcard | regex | meaning |
| * | [^/]* | anything but / |
| ... | .* | anything (greedy) |
| . | \. | literal . |
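The wildcard map above can be sketched in Python as follows. This is an illustration of the translation as described on this page, not Splunk's own code: the function names are hypothetical, and the rule used to find the longest fully qualified path (everything before the first segment that contains a wildcard) is an assumption.

```python
import re
from typing import Tuple

# Wildcard-to-regex map copied from the table above.
WILDCARD_MAP = {"...": ".*", "*": "[^/]*", ".": r"\."}
# Try "..." before "." so the ellipsis is not read as three literal dots.
_TOKEN = re.compile(r"\.\.\.|\*|\.")


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into a regular expression string."""
    return _TOKEN.sub(lambda m: WILDCARD_MAP[m.group(0)], pattern)


def split_monitor_path(path: str) -> Tuple[str, str]:
    """Split a path into (directory to monitor, implicit whitelist regex).

    Assumption: the monitored directory is everything before the first
    path segment that contains a wildcard.
    """
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if "*" in segment or "..." in segment:
            base = "/".join(segments[:i]) + "/" if i else ""
            return base, wildcard_to_regex("/".join(segments[i:]))
    return path, ""  # no wildcard, so no implicit whitelist


if __name__ == "__main__":
    base, whitelist = split_monitor_path("/foo/bar*.log")
    print(base, whitelist)  # prints: /foo/ bar[^/]*\.log
```

Only the three wildcards from the table are translated; other characters that are special in regular expressions pass through unchanged in this sketch.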
For example, if you specify
[monitor:///foo/bar*.log]
Splunk translates this into
[monitor:///foo/]
_whitelist = bar[^/]*\.log
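As a consequence, you can't have multiple stanzas with wildcards for files in the same directory.
For example, these two stanzas both resolve to the directory /foo/ and conflict:
[monitor:///foo/bar_baz*]
[monitor:///foo/bar_qux*]
To include both sets of files, specify one stanza with a _whitelist that uses regular expression alternation:
[monitor:///foo]
_whitelist = (bar_baz[^/]*|bar_qux[^/]*)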
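The combined _whitelist above can be built mechanically from the individual file patterns. The Python sketch below applies the wildcard map from this page to each pattern and joins the results with regular expression alternation. The function names are hypothetical, and the sketch only produces the stanza text; it does not write inputs.conf.

```python
import re

# Wildcard-to-regex map from the table on this page.
WILDCARD_MAP = {"...": ".*", "*": "[^/]*", ".": r"\."}
_TOKEN = re.compile(r"\.\.\.|\*|\.")


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a regular expression string."""
    return _TOKEN.sub(lambda m: WILDCARD_MAP[m.group(0)], pattern)


def merge_whitelists(file_patterns):
    """Join the translated patterns into one alternation group."""
    return "(" + "|".join(wildcard_to_regex(p) for p in file_patterns) + ")"


def render_stanza(directory, file_patterns):
    """Return the text of a single monitor stanza covering all patterns."""
    return "[monitor://{}]\n_whitelist = {}".format(
        directory, merge_whitelists(file_patterns)
    )


if __name__ == "__main__":
    print(render_stanza("/foo", ["bar_baz*", "bar_qux*"]))
```

Called with /foo and the two patterns from the example, render_stanza reproduces the single stanza shown above.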
To load anything in /apache/foo/logs or /apache/bar/logs, etc.:
[monitor:///apache/.../logs]
To load anything in /apache/ that ends in .log:
[monitor:///apache/*.log]
Batch
[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...
host = <string>
index = <string>
sourcetype = <string>
source = <string>
queue = <string> (parsingQueue, indexQueue, etc)
host_regex = <regular expression>
host_segment = <integer>
Note: source = <string> and <KEY> = <string> are not used by batch.
Example
This example batch loads all files from the directory /system/flight815/.
[batch:///system/flight815/*]
move_policy = sinkhole