Files and directories

This documentation does not apply to the most recent version of Splunk.

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13

Point Splunk at a file or a directory. If you specify a directory, Splunk consumes everything in the directory. Splunk has two different file input processors: monitor and batch. For the most part, use monitor to input all your data sources from files and directories. The only time you should use batch is to load a large archive of historical files. Read on for more specifics.

Monitor

Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.

Splunk checks for the file or directory specified in a monitor configuration on Splunk server start and restart. If the file or directory specified is not present on start, Splunk checks for it again at 24-hour intervals from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the command line interface. If you want Splunk to find potential new inputs automatically, use crawl.

When using monitor:

  • On most operating systems, files can be opened or closed for writing. With the exception of Windows, Splunk consumes files even if they're still being written to by the operating system.
  • Files or directories can be included or excluded via whitelists and blacklists.
  • Upon restart, Splunk continues processing files where it left off.
  • Splunk decompresses archive files before it indexes them. It can handle the following common archive file types: .tar, .gz, .bz2, .tar.bz2, and .zip.
  • Splunk detects log file rotation and does not process renamed files it has already indexed (with the exception of .tar and .gz archives; for more information see "Log file rotation" in this manual).
  • The entire dir/filename path must not exceed 1024 characters.
  • Set the sourcetype for directories to Automatic. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
  • Removing an input does not stop the input's files from being indexed. Rather, it stops files from being checked again, but all the initial content will be indexed. To stop all in-process data, you must restart the Splunk server.

Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.

Note: Monitor input stanzas may not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor input stanzas which watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
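The overlap rule above can be checked before you write your stanzas. The following Python sketch is only an illustration, not part of Splunk: the helper name overlapping_pairs is hypothetical, and it assumes UNIX-style paths without wildcards.

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlapping_pairs(paths):
    """Return pairs of monitor paths where one equals or contains the other."""
    parsed = [PurePosixPath(p) for p in paths]
    pairs = []
    for a, b in combinations(parsed, 2):
        # a overlaps b if they are the same path or one is an ancestor of the other
        if a == b or a in b.parents or b in a.parents:
            pairs.append((str(a), str(b)))
    return pairs


print(overlapping_pairs(["/a/path", "/a/path/subdir", "/var/log"]))
# prints [('/a/path', '/a/path/subdir')]
```

Any pair this reports would need to be merged into a single stanza, for example by monitoring only the parent directory.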

Batch

Use the batch processor at the CLI or in inputs.conf to load files once and destructively. By default, Splunk's batch processor watches the directory $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.
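As a rough illustration of handing a file to the batch directory, the Python sketch below moves a file into var/spool/splunk under a given Splunk home. The helper names spool_dir and hand_to_batch are hypothetical, and the demonstration uses temporary directories in place of a real $SPLUNK_HOME. Because Splunk deletes the file after indexing it, move only a copy of anything you want to keep.

```python
import os
import shutil
import tempfile
from pathlib import Path


def spool_dir(splunk_home):
    """Return the default batch directory under the given Splunk home."""
    return Path(splunk_home) / "var" / "spool" / "splunk"


def hand_to_batch(src, splunk_home):
    """Move src into the batch directory and return its new location."""
    dest = spool_dir(splunk_home) / Path(src).name
    shutil.move(str(src), str(dest))
    return dest


if __name__ == "__main__":
    # Stand-in for $SPLUNK_HOME so the sketch runs without a Splunk install.
    fake_home = tempfile.mkdtemp()
    os.makedirs(spool_dir(fake_home))
    archive = Path(tempfile.mkdtemp()) / "old_events.log"
    archive.write_text("historical event\n")
    print(hand_to_batch(archive, fake_home))
```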

Note: Batch is most useful for loading in historical data, such as large archives of files. For best practices on loading file archives, see "How to index different sized archives".


Splunk Web

Add inputs from files and directories via Splunk Web.

1. Click Admin in the upper right-hand corner of Splunk Web.

2. Then click Data Inputs.

3. Pick files and directories.

4. Click New Input to add an input.

5. Under Data access, pick Monitor a directory.

You can also:

  • Upload a local file from your local machine into Splunk.
  • Index a file on the Splunk server, which copies a file on the server into Splunk via the batch directory.

6. Specify the pathname to the file or directory. If you select Upload, use the Browse... button.

To monitor a shared network drive, enter the following: <myhost>/<mypath> (or \\<myhost>\<mypath> on Windows). Make sure your Splunk server has read access to the mounted drive as well as the files you wish to monitor.

7. Under the Host heading, select the host name. You have several choices if you are using Monitor or Batch methods. Learn more about setting host value.

Note: Host only sets the host field in Splunk. It does not direct Splunk to look on a specific host on your network.

8. Now set the Source Type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries. Learn more about source type.

9. After specifying the source, host, and source type, click Submit.

CLI

Monitor files and directories via Splunk's Command Line Interface (CLI). To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command from the UNIX or Windows command prompt. Or add Splunk to your path and use the splunk command.

If you get stuck, Splunk's CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type splunk help <command>.

The following commands are available for input configuration via the CLI:

Command | Command syntax | Action
--- | --- | ---
add | add monitor $SOURCE [-parameter value] ... | Add inputs from $SOURCE.
edit | edit monitor $SOURCE [-parameter value] ... | Edit a previously added input for $SOURCE.
remove | remove monitor $SOURCE | Remove a previously added $SOURCE.
list | list monitor | List the currently configured monitor inputs.
spool | spool source | Copy a file into Splunk via the sinkhole directory.

Change the configuration of each data input type by setting additional parameters. Parameters are set via the syntax: -parameter value.

Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.

Parameter | Required? | Description
--- | --- | ---
source | Required | Path to the file or directory to monitor for new input.
sourcetype | Optional | Specify a sourcetype field value for events from the input source.
index | Optional | Specify the destination index for events from the input source.
hostname | Optional | Specify a host name to set as the host field value for events from the input source.
hostregex | Optional | Specify a regular expression on the source file path to set as the host field value for events from the input source.
hostsegmentnum | Optional | Set the number of segments of the source file path to set as the host field value for events from the input source.
follow-only | Optional | True or False (T/F). Default False. When set to True, Splunk reads from the end of the source (like the "tail -f" Unix command).
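If you script the CLI, the one-host-option rule and the -parameter value syntax can be enforced before the command runs. The Python sketch below only builds the argument list. The helper name add_monitor_args is hypothetical, and the sourcetype value syslog is an illustrative assumption.

```python
# The three host parameters are mutually exclusive per command.
HOST_OPTIONS = ("hostname", "hostregex", "hostsegmentnum")


def add_monitor_args(source, params=None):
    """Build the argument list for './splunk add monitor' from a dict of parameters."""
    params = params or {}
    chosen = [name for name in HOST_OPTIONS if name in params]
    if len(chosen) > 1:
        raise ValueError("set only one of: " + ", ".join(chosen))
    args = ["./splunk", "add", "monitor", source]
    for name, value in params.items():
        # Each parameter is passed as: -parameter value
        args += ["-" + name, str(value)]
    return args


print(add_monitor_args("/var/log/", {"sourcetype": "syslog", "hostsegmentnum": 3}))
# prints ['./splunk', 'add', 'monitor', '/var/log/', '-sourcetype', 'syslog', '-hostsegmentnum', '3']
```

The resulting list could be passed to subprocess.run from the $SPLUNK_HOME/bin/ directory. That step is left out here so the sketch runs without a Splunk installation.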

Example: use the CLI to monitor /var/log/

The following example shows how to monitor files in /var/log/:

Add /var/log/ as a data input:

./splunk add monitor /var/log/

Example: use the CLI to monitor windowsupdate.log

The following example shows how to monitor the Windows Update log (where Windows logs automatic updates):

Add C:\Windows\windowsupdate.log as a data input:

./splunk add monitor C:\Windows\windowsupdate.log

Example: use the CLI to monitor IIS logging

This example shows how to monitor the default location for Windows IIS logging:

Add C:\windows\system32\LogFiles\W3SVC as a data input:

./splunk add monitor c:\windows\system32\LogFiles\W3SVC

Inputs.conf

To add an input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read how configuration files work before you begin.

You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).

Monitor

[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...

This type of input stanza (monitor) directs Splunk to watch all files in the <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path. For more information, see the "Wildcards" subsection, below.

Note: To ensure new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes when it changes. Note that the entire file is indexed, which can result in duplicate events.

host = <string>

  • Set the host value of your input to a static value.
  • host= is automatically prepended to the value when this shortcut is used.
  • Defaults to the IP address or fully qualified domain name of the host where the data originated.
  • For more information about the host field, see "How host works," in this manual.

index = <string>

  • Set the index where events from this input will be stored.
  • index= is automatically prepended to the value when this shortcut is used.
  • Defaults to main (or whatever you have set as your default index).
  • For more information about the index field, see "Splunk data management," in this manual.

sourcetype = <string>

  • Set the sourcetype name of events from this input.
  • sourcetype= is automatically prepended to the value when this shortcut is used.
  • Splunk automatically picks a source type based on various aspects of your data. There is no hard-coded default.
  • For more information about the sourcetype field, see "How source types work," in this manual.

source = <string>

  • Set the source name of events from this input.
  • Defaults to the file path.
  • source= is automatically prepended to the value when this shortcut is used.

queue = <string> (parsingQueue, indexQueue, etc)

  • Specify where the input processor should deposit the events that it reads.
  • Can be any valid, existing queue in the pipeline.
  • Defaults to parsingQueue.

host_regex = <regular expression>

  • If specified, the regex extracts host from the filename of each input.
  • Specifically, the first group of the regex is used as the host.
  • Defaults to the default host= attribute if the regex fails to match.
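The first-group rule for host_regex can be illustrated with a short Python sketch. It uses Python's re module as a stand-in for Splunk's regex engine, which may differ in syntax details. The function name, the sample paths, and the pattern are all hypothetical.

```python
import re


def host_from_regex(source_path, host_regex, default_host):
    """Return the first capture group of host_regex, or default_host if there is no match."""
    match = re.search(host_regex, source_path)
    if match is None or match.re.groups < 1 or match.group(1) is None:
        return default_host
    return match.group(1)


pattern = r"/var/log/([^/]+)/"
print(host_from_regex("/var/log/web01/access.log", pattern, "localhost"))
# prints web01
print(host_from_regex("/tmp/app.log", pattern, "localhost"))
# prints localhost
```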

host_segment = <integer>

  • If specified, the '/' separated segment of the path is set as host.
  • Defaults to the default host:: attribute if the value is not an integer, or is less than 1.
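A minimal Python sketch of the host_segment lookup follows. It assumes segments are counted from 1 starting at the root of the path, and that a segment number past the end of the path also falls back to the default host. The second point is an assumption not stated above, and the function name is hypothetical.

```python
def host_from_segment(source_path, host_segment, default_host):
    """Return the '/'-separated path segment numbered host_segment (counting from 1)."""
    if not isinstance(host_segment, int) or host_segment < 1:
        return default_host
    segments = [s for s in source_path.split("/") if s]
    if host_segment > len(segments):
        # Assumed behaviour: out-of-range values fall back to the default.
        return default_host
    return segments[host_segment - 1]


print(host_from_segment("/var/log/web01/access.log", 3, "localhost"))
# prints web01
print(host_from_segment("/var/log/web01/access.log", 0, "localhost"))
# prints localhost
```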

crcSalt = <string>

  • If set, this string is added to the CRC.
  • Use this setting to force Splunk to consume files that have matching CRCs.
  • If set to crcSalt = <SOURCE> (note: This setting is case sensitive), then the full source path is added to the CRC.
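To see why a salt forces files with matching CRCs to be treated as different, consider the Python sketch below. It is a conceptual illustration only: how Splunk computes its CRC and mixes in the salt is not described here, so zlib.crc32 over salt plus content is an assumption, as are the function name and the sample paths.

```python
import zlib


def file_crc(content, salt=b""):
    """Checksum the content with an optional salt mixed in first."""
    return zlib.crc32(salt + content)


data = b"same header line\n"

# Two files with identical content produce identical checksums.
print(file_crc(data) == file_crc(data))
# prints True

# Salting with each file's source path makes the checksums differ.
print(file_crc(data, salt=b"/logs/a.log") == file_crc(data, salt=b"/logs/b.log"))
# prints False
```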

followTail = 0|1

  • If set to 1, monitoring begins at the end of the file (like tail -f).
  • This only applies to files the first time they are picked up.
  • After that, Splunk's internal file position records keep track of the file.
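The followTail behaviour can be sketched in Python as a reader that keeps its own position records. This is not how Splunk is implemented. The positions dictionary and the function name are hypothetical stand-ins for Splunk's internal file position records.

```python
import os
import tempfile


def read_new_data(path, positions, follow_tail=False):
    """Return bytes not yet read from path, updating the saved position."""
    if path not in positions:
        # First pickup: start at the end of the file if follow_tail is set, else at the start.
        positions[path] = os.path.getsize(path) if follow_tail else 0
    with open(path, "rb") as f:
        f.seek(positions[path])
        data = f.read()
        positions[path] = f.tell()
    return data


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "app.log")
    with open(log, "wb") as f:
        f.write(b"old event\n")
    seen = {}
    print(read_new_data(log, seen, follow_tail=True))  # existing content is skipped
    with open(log, "ab") as f:
        f.write(b"new event\n")
    print(read_new_data(log, seen, follow_tail=True))  # only the appended data
```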

_whitelist = <regular expression>

  • If set, files from this path are monitored only if they match the specified regex.

_blacklist = <regular expression>

  • If set, files from this path are NOT monitored if they match the specified regex.
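The effect of _whitelist and _blacklist can be previewed with a small Python filter. This sketch assumes the regex may match anywhere in the path unless anchored, and that a blacklist match excludes a file even when the whitelist also matches. Both points are assumptions, and Python's re module may differ from Splunk's regex engine. The function name and sample patterns are hypothetical.

```python
import re


def is_monitored(path, whitelist=None, blacklist=None):
    """Return True if path passes the whitelist (if any) and avoids the blacklist (if any)."""
    if whitelist is not None and not re.search(whitelist, path):
        return False
    if blacklist is not None and re.search(blacklist, path):
        return False
    return True


print(is_monitored("/var/log/app.log", whitelist=r"\.log$"))
# prints True
print(is_monitored("/var/log/app.txt", whitelist=r"\.log$"))
# prints False
print(is_monitored("/var/log/debug.log", whitelist=r"\.log$", blacklist=r"debug"))
# prints False
```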

Wildcards

You can use wildcards to specify your input path for monitored input. Use ... for paths and * for files.

  • ... recurses through directories until the match is met. This means that /foo/.../bar will match /foo/bar, /foo/1/bar, /foo/1/2/bar, etc., but only if bar is a file.
    • To recurse through a subdirectory, use another ... wildcard. For example: /foo/.../bar/...
  • * matches anything in that specific path segment. It cannot be used inside of a directory path; it must be used in the last segment of the path. For example /foo/*.log matches /foo/bar.log but not /foo/bar.txt or /foo/bar/test.log.
  • Combine * and ... for more specific matches:
    • foo/.../bar/* matches any file in the bar directory within the specified path.

Note: In Windows, you must use two backslashes \\ to escape wildcards. Regexes with backslashes in them are not currently supported for _whitelist and _blacklist in Windows.

Specifying wildcards results in an implicit _whitelist created for that stanza. The longest fully qualified path is used as the monitor stanza, and the wildcards are translated into regular expressions using the following map:


wildcard | regex | meaning
--- | --- | ---
* | [^/]* | anything but /
... | .* | anything (greedy)
. | \. | literal .

Additionally, the converted expression is anchored to the right end of the file path, so that the entire path must be matched.

For example, if you specify

[monitor:///foo/bar*.log]

Splunk translates this into

[monitor:///foo/]
_whitelist = bar[^/]*\.log$
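The translation above can be approximated in Python. The sketch below applies the wildcard map from the table and splits the path at the first segment that contains a wildcard. It is an illustration of the documented mapping, not Splunk's own code. The function name is hypothetical, and other regex metacharacters in the path are left unescaped for simplicity.

```python
import re


def wildcard_to_whitelist(path):
    """Split a wildcard monitor path into (base path, whitelist regex)."""
    segments = path.split("/")
    first_wild = next(
        (i for i, s in enumerate(segments) if "*" in s or "..." in s), None
    )
    if first_wild is None:
        return path, None  # no wildcard, so no implicit whitelist
    # The longest wildcard-free prefix becomes the monitored path.
    base = "/".join(segments[:first_wild]) + "/"
    rest = "/".join(segments[first_wild:])
    out = []
    i = 0
    while i < len(rest):
        if rest.startswith("...", i):
            out.append(".*")      # ... matches anything (greedy)
            i += 3
        elif rest[i] == "*":
            out.append("[^/]*")   # * matches anything but /
            i += 1
        elif rest[i] == ".":
            out.append(r"\.")     # . becomes a literal dot
            i += 1
        else:
            out.append(rest[i])
            i += 1
    # Anchor to the right end of the file path.
    return base, "".join(out) + "$"


base, regex = wildcard_to_whitelist("/foo/bar*.log")
print(base)
# prints /foo/
print(regex)
# prints bar[^/]*\.log$
print(bool(re.search(regex, "/foo/bar1.log")))
# prints True
```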

As a consequence, you can't have multiple stanzas with wildcards for files in the same directory.

Also, you cannot use a _whitelist declaration in conjunction with wildcards.

For example:

[monitor:///foo/bar_baz*]
[monitor:///foo/bar_qux*]

This results in overlapping stanzas indexing the directory /foo/. Splunk takes the first one, so only files starting with /foo/bar_baz will be indexed. To include both sources, manually specify a _whitelist using regular expression syntax for "or":

[monitor:///foo]
_whitelist = (bar_baz[^/]*|bar_qux[^/]*)$

Note: To set any additional attributes (such as sourcetype) for multiple whitelisted/blacklisted inputs that may have different attributes, use props.conf.

Examples

To load anything in /apache/foo/logs or /apache/bar/logs, etc.

[monitor:///apache/.../logs]

To load anything in /apache/ that ends in .log.

[monitor:///apache/*.log]

Batch

[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...

Use batch to set up a one time, destructive input of data from a source. For continuous, non-destructive inputs, use monitor.

Note: You must set move_policy = sinkhole. This loads the file destructively. Do not use this input type for files you do not want to consume destructively.

host = <string>

  • Set the host value of your input to a static value.
  • host= is automatically prepended to the value when this shortcut is used.
  • Defaults to the IP address or fully qualified domain name of the host where the data originated.
  • For more information about the host field, see the host section.

index = <string>

  • Set the index where events from this input will be stored.
  • index= is automatically prepended to the value when this shortcut is used.
  • Defaults to main (or whatever you have set as your default index).
  • For more information about the index field, see the data management section.

sourcetype = <string>

  • Set the sourcetype name of events from this input.
  • sourcetype= is automatically prepended to the value when this shortcut is used.
  • Splunk automatically picks a source type based on various aspects of your data. There is no hard-coded default.
  • For more information about the sourcetype field, see the source type section.

source = <string>

  • Set the source name of events from this input.
  • Defaults to the file path.
  • source= is automatically prepended to the value when this shortcut is used.

queue = <string> (parsingQueue, indexQueue, etc)

  • Specify where the input processor should deposit the events that it reads.
  • Can be any valid, existing queue in the pipeline.
  • Defaults to parsingQueue.

host_regex = <regular expression>

  • If specified, the regex extracts host from the filename of each input.
  • Specifically, the first group of the regex is used as the host.
  • Defaults to the default host= attribute if the regex fails to match.

host_segment = <integer>

  • If specified, the '/' separated segment of the path is set as host.
  • Defaults to the default host:: attribute if the value is not an integer, or is less than 1.

Note: source = <string> and <KEY> = <string> are not used by batch.

Example

This example batch loads all files from the directory /system/flight815/.

[batch:///system/flight815/*]
move_policy = sinkhole
Revision: 207. Community content licensed under Creative Commons.