This documentation does not apply to the most recent version of Splunk.
This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13
Point Splunk at a file or a directory. If you specify a directory, Splunk consumes everything in the directory. Splunk has two different file input processors: monitor and batch. For the most part, use monitor to input all your data sources from files and directories. The only time you should use batch is to load a large archive of historical files. Read on for more specifics.
Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.
Splunk checks for the file or directory specified in a monitor configuration on Splunk server start and restart. If the specified file or directory is not present at startup, Splunk checks for it again at 24-hour intervals from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the command line interface. If you want Splunk to find potential new inputs automatically, use crawl.
When using monitor:
Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.
Note: Monitor input stanzas may not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor input stanzas which watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
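The overlap rule above can be checked before you edit inputs.conf. The following is a minimal Python sketch, not part of Splunk: the helper name `overlapping_pairs` and the sample paths are assumptions for illustration, and it compares only plain POSIX-style paths (no wildcards, whitelists, or blacklists).

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlapping_pairs(paths):
    """Return pairs of paths where one is the same as, or nested inside, the other."""
    pure = [PurePosixPath(p) for p in paths]
    pairs = []
    for a, b in combinations(pure, 2):
        # Two monitor paths overlap if they are equal or one is an ancestor of the other.
        if a == b or a in b.parents or b in a.parents:
            pairs.append((str(a), str(b)))
    return pairs


if __name__ == "__main__":
    # Hypothetical list of monitored paths taken from the note above.
    print(overlapping_pairs(["/a/path", "/a/path/subdir", "/var/log"]))
    # prints [('/a/path', '/a/path/subdir')]
```

An empty result only means that no two literal paths nest; stanzas that use wildcards need a separate review.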
Use the batch processor at the CLI or in inputs.conf to load files once and destructively. By default, the batch processor watches $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.
Note: Batch is most useful for loading in historical data, such as large archives of files. For best practices on loading file archives, see "How to index different sized archives".
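Because the spool directory is consumed destructively, one cautious approach is to place a copy there and keep the original. The sketch below is an illustration in Python under stated assumptions: the helper name `spool_copy` is hypothetical, and the only Splunk-specific detail it relies on is the default spool location given above.

```python
import os
import shutil


def spool_copy(src_file, splunk_home):
    """Copy src_file into the default spool directory and return the new path.

    The original file is left untouched; only the copy is exposed to the
    destructive batch load.
    """
    spool_dir = os.path.join(splunk_home, "var", "spool", "splunk")
    if not os.path.isdir(spool_dir):
        raise FileNotFoundError(spool_dir)
    dest = os.path.join(spool_dir, os.path.basename(src_file))
    shutil.copy2(src_file, dest)  # copy contents and timestamps
    return dest
```

For example, `spool_copy("/archive/old.log", os.environ["SPLUNK_HOME"])` would copy a hypothetical archive file into the spool directory, assuming the SPLUNK_HOME environment variable is set.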
Add inputs from files and directories via Splunk Web.
1. Click Admin in the upper right-hand corner of Splunk Web.
2. Then click Data Inputs.
3. Pick files and directories.
4. Click New Input to add an input.
5. Under Data access, pick Monitor a directory.
You can also:
6. Specify the pathname to the file or directory. If you select Upload, use the Browse... button.
To monitor a shared network drive, enter the following: <myhost><mypath> (or \\<myhost>\<mypath> on Windows). Make sure your Splunk server has read access to the mounted drive as well as the files you wish to monitor.
7. Under the Host heading, select the host name. You have several choices if you are using Monitor or Batch methods. Learn more about setting host value.
Note: Host only sets the host field in Splunk. It does not direct Splunk to look on a specific host on your network.
8. Now set the Source Type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries. Learn more about source type.
9. After specifying the source, host, and source type, click Submit.
Monitor files and directories via Splunk's Command Line Interface (CLI). To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command from the UNIX or Windows command prompt. Or add Splunk to your path and use the splunk command.
If you get stuck, Splunk's CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type splunk help <command>.
The following commands are available for input configuration via the CLI:
| Command | Command syntax | Action |
|---|---|---|
| add | add monitor $SOURCE [-parameter value] ... | Add inputs from $SOURCE. |
| edit | edit monitor $SOURCE [-parameter value] ... | Edit a previously added input for $SOURCE. |
| remove | remove monitor $SOURCE | Remove a previously added $SOURCE. |
| list | list monitor | List the currently configured monitor. |
| spool | spool source | Copy a file into Splunk via the sinkhole directory. |
Change the configuration of each data input type by setting additional parameters. Parameters are set via the syntax: -parameter value.
Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.
| Parameter | Required? | Description |
|---|---|---|
| source | Required | Path to the file or directory to monitor for new input. |
| sourcetype | Optional | Specify a sourcetype field value for events from the input source. |
| index | Optional | Specify the destination index for events from the input source. |
| hostname | Optional | Specify a host name to set as the host field value for events from the input source. |
| hostregex | Optional | Specify a regular expression on the source file path to set as the host field value for events from the input source. |
| hostsegmentnum | Optional | Set the number of segments of the source file path to set as the host field value for events from the input source. |
| follow-only | Optional | (T/F) True or False. Default False. When set to True, Splunk will read from the end of the source (like the "tail -f" Unix command). |
The following example shows how to monitor files in /var/log/:
Add /var/log/ as a data input:
./splunk add monitor /var/log/
The following example shows how to monitor the Windows Update log (where Windows logs automatic updates):
Add C:\Windows\windowsupdate.log as a data input:
./splunk add monitor C:\Windows\windowsupdate.log
This example shows how to monitor the default location for Windows IIS logging:
Add C:\windows\system32\LogFiles\W3SVC as a data input:
./splunk add monitor c:\windows\system32\LogFiles\W3SVC
To add an input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read how configuration files work before you begin.
You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).
[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...
This type of input stanza (monitor) directs Splunk to watch all files in <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so a path that starts at the root has three slashes, as in [monitor:///var/log/]. You can use wildcards in the path. For more information, see the "Wildcards" subsection, below.
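To make the stanza layout concrete, here is a small Python sketch that renders one stanza as text. It is an illustration only: the function name `render_stanza` is hypothetical, and the sourcetype value `syslog` is an assumed example value, not a recommendation.

```python
def render_stanza(input_type, path, attributes=None):
    """Return inputs.conf text for a single input stanza.

    The header is the input type, "://", then the path, so an absolute
    path such as /var/log yields three slashes in a row.
    """
    lines = ["[{}://{}]".format(input_type, path)]
    for key, value in (attributes or {}).items():
        lines.append("{} = {}".format(key, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_stanza("monitor", "/var/log", {"sourcetype": "syslog", "index": "main"}))
```

The rendered text puts the stanza header on its own line, followed by one `attribute = value` pair per line.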
Note: To ensure new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes when it changes. Note that the entire file is indexed, which can result in duplicate events.
host = <string>
host= is automatically prepended to the value when this shortcut is used.
index = <string>
index= is automatically prepended to the value when this shortcut is used.
Defaults to main (or whatever you have set as your default index).
sourcetype = <string>
sourcetype= is automatically prepended to the value when this shortcut is used.
source = <string>
source= is automatically prepended to the value when this shortcut is used.
queue = <string> (parsingQueue, indexQueue, etc)
Defaults to parsingQueue.
host_regex = <regular expression>
Falls back to the host= attribute if the regex fails to match.
host_segment = <integer>
Falls back to the host:: attribute if the value is not an integer, or is less than 1.
crcSalt = <string>
If set to crcSalt = <SOURCE> (note: this setting is case sensitive), the full source path is added to the CRC.
followTail = 0|1
If set to 1, Splunk reads from the end of the file (like tail -f).
_whitelist = <regular expression>
_blacklist = <regular expression>
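The fallback behavior described for host_segment can be sketched in Python. This is a hedged illustration, not Splunk's implementation: the function name `host_from_segment` is hypothetical, and it assumes that segments are counted from 1 starting at the root of the source path and that a segment number beyond the end of the path also falls back to the default host. Neither assumption is stated in this section.

```python
def host_from_segment(source_path, host_segment, default_host):
    """Pick a host value from one '/'-separated segment of source_path.

    Assumption: segments are numbered from 1, ignoring the leading slash.
    """
    segments = [s for s in source_path.split("/") if s]
    # Documented fallback: value is not an integer, or is less than 1.
    if not isinstance(host_segment, int) or host_segment < 1:
        return default_host
    # Assumed fallback: the path has fewer segments than requested.
    if host_segment > len(segments):
        return default_host
    return segments[host_segment - 1]
```

Under these assumptions, `host_from_segment("/var/log/web01/access.log", 3, "splunkserver")` returns `web01`, while a segment value of 0 returns the default host.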
You can use wildcards to specify your input path for monitored input. Use ... for paths and * for files.
... recurses through directories until the match is met. This means that /foo/.../bar will match /foo/bar, /foo/1/bar, /foo/1/2/bar, etc. but only if bar is a file.
To recurse through a matched directory as well, append another .... For example /foo/.../bar/....
* matches anything in that specific path segment. It cannot be used inside of a directory path; it must be used in the last segment of the path. For example /foo/*.log matches /foo/bar.log but not /foo/bar.txt or /foo/bar/test.log.
Combine * and ... for more specific matches:
foo/.../bar/* matches any file in the bar directory within the specified path.
Note: In Windows, you must use two backslashes \\ to escape wildcards. Regexes with backslashes in them are not currently supported for _whitelist and _blacklist in Windows.
Specifying wildcards results in an implicit _whitelist created for that stanza. The longest fully qualified path is used as the monitor stanza, and the wildcards are translated into regular expressions using the following map:
| wildcard | regex | meaning |
|---|---|---|
| * | [^/]* | anything but / |
| ... | .* | anything (greedy) |
| . | \. | literal . |
Additionally, the converted expression is anchored to the right end of the file path, so that the entire path must be matched.
For example, if you specify
[monitor:///foo/bar*.log]
Splunk translates this into
[monitor:///foo/]
_whitelist = bar[^/]*\.log$
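The translation just described can be approximated in a few lines of Python. This is a sketch of the mapping table only, not Splunk's actual code: the function name `translate_monitor_path` is hypothetical, and it assumes the monitored base is everything before the first path segment that contains a wildcard.

```python
import re


def translate_monitor_path(path):
    """Split a wildcard monitor path into (base_directory, whitelist_regex).

    Returns (path, None) when the path contains no wildcards.
    """
    segments = path.split("/")
    first_wild = next(
        (i for i, s in enumerate(segments) if "*" in s or "..." in s), None
    )
    if first_wild is None:
        return path, None
    base = "/".join(segments[:first_wild]) + "/"
    tail = "/".join(segments[first_wild:])
    parts = []
    # Tokenize so that "..." is recognized before a single ".".
    for token in re.findall(r"\.\.\.|\*|\.|[^.*]+", tail):
        if token == "...":
            parts.append(".*")       # anything (greedy)
        elif token == "*":
            parts.append("[^/]*")    # anything but /
        elif token == ".":
            parts.append(r"\.")      # literal .
        else:
            parts.append(re.escape(token))
    # Anchor to the right end of the file path.
    return base, "".join(parts) + "$"


if __name__ == "__main__":
    base, whitelist = translate_monitor_path("/foo/bar*.log")
    print(base, whitelist)  # prints /foo/ bar[^/]*\.log$
```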
As a consequence, you can't have multiple stanzas with wildcards for files in the same directory.
Also, you cannot use a _whitelist declaration in conjunction with wildcards.
For example:
[monitor:///foo/bar_baz*]
[monitor:///foo/bar_qux*]
This results in overlapping stanzas indexing the directory /foo/. Splunk takes the first one, so only files starting with /foo/bar_baz will be indexed. To include both sources, manually specify a _whitelist using regular expression syntax for "or":
[monitor:///foo]
_whitelist = (bar_baz[^/]*|bar_qux[^/]*)$
Note: To set any additional attributes (such as sourcetype) for multiple whitelisted/blacklisted inputs that may have different attributes, use props.conf.
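The combined whitelist can be tried against sample paths before you commit it to inputs.conf. The Python sketch below is an approximation: it uses `re.search` as a stand-in for Splunk's matcher (an assumption, based on the pattern being anchored only at the right end), and the sample file names are hypothetical.

```python
import re

# The "or" whitelist from the stanza above.
WHITELIST = re.compile(r"(bar_baz[^/]*|bar_qux[^/]*)$")


def whitelisted(paths, pattern=WHITELIST):
    """Return the paths whose right end matches the whitelist pattern."""
    return [p for p in paths if pattern.search(p)]


if __name__ == "__main__":
    sample = ["/foo/bar_baz.log", "/foo/bar_qux.1", "/foo/bar_other.log"]
    print(whitelisted(sample))
    # prints ['/foo/bar_baz.log', '/foo/bar_qux.1']
```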
To load anything in /apache/foo/logs or /apache/bar/logs, etc.
[monitor:///apache/.../logs]
To load anything in /apache/ that ends in .log.
[monitor:///apache/*.log]
[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...
Use batch to set up a one time, destructive input of data from a source. For continuous, non-destructive inputs, use monitor.
Note: You must set move_policy = sinkhole. This loads the file destructively. Do not use this input type for files you do not want to consume destructively.
host = <string>
host= is automatically prepended to the value when this shortcut is used.
index = <string>
index= is automatically prepended to the value when this shortcut is used.
Defaults to main (or whatever you have set as your default index).
sourcetype = <string>
sourcetype= is automatically prepended to the value when this shortcut is used.
source = <string>
source= is automatically prepended to the value when this shortcut is used.
queue = <string> (parsingQueue, indexQueue, etc)
Defaults to parsingQueue.
host_regex = <regular expression>
Falls back to the host= attribute if the regex fails to match.
host_segment = <integer>
Falls back to the host:: attribute if the value is not an integer, or is less than 1.
Note: source = <string> and <KEY> = <string> are not used by batch.
This example batch loads all files from the directory /system/flight815/.
[batch:///system/flight815/*]
move_policy = sinkhole