This document last updated: 07/03/08 06:07pm

Admin Manual

How Splunk Works

Overview of Splunk

Splunk is search software for any type of data. Learn more about how Splunk works by reading through this introductory page. You'll find many links here for installing, configuring and customizing your Splunk installation.

Configuration options

Splunk has several options for configuration: a web interface (known as Splunk Web), a CLI and configuration files. Most of Splunk's configurations can be reached through the Admin section of Splunk Web. Some advanced settings are only available through configuration files.

Installation and upgrade

Installing Splunk is easy and fast. Here are instructions for installing, upgrading, or backing up an existing copy.

Data inputs

Splunk can receive data in a variety of ways. Configure each input via Splunk Web, Splunk's CLI, or inputs.conf.

Read on for a brief description of each input type.

Note: For a more in-depth description of inputs, read how input configuration works.

Windows

Splunk for Windows comes with its own set of configuration files for setting up Windows-specific inputs, including Windows registry and WMI. Read more about configuring Windows inputs.

Distributed data

Configure distributed inputs and outputs across your network. Send data between one Splunk instance and another, or between Splunk and third-party software. For an overview of all the available configuration options, see How data distribution works.

Indexing

Splunk takes all data from inputs and sends it to an indexing pipeline. Data is then broken up into separate events via segmentation rules. Most data is segmented and timestamped correctly. However, you may wish to configure Splunk to index your data in particular ways. Learn more about how indexing works.

Here are some things you might want to consider:

Configuration for indexing is set mostly through props.conf and transforms.conf.

Fields

Fields are a useful aspect of Splunk's search interface. You can use Splunk's built-in fields that are enabled by default. Here's a list of Splunk's default fields, including links to more in-depth documentation:

You can also create your own fields. Custom fields are useful for:

To learn more about creating custom fields, see how fields work.

Search

Splunk's search interface is useful for tracking down different aspects of your data. Here are a few things you can do with your searches:

For a more detailed overview of search, see how search works.

Distributed search

In a distributed setup, you may want to search across multiple instances of Splunk. Enable distributed search to federate searches across your entire Splunk deployment. Read more about how distributed search works.

Security

Secure your Splunk server with the following security configuration options. Here's a brief overview of the available features. For a more detailed overview, see security options.

Authentication

Splunk includes several authentication options, including:

Audit

Use the following options to enable separate auditing configurations:

Data management

Splunk servers often index large amounts of data each day. You may want to enable advanced settings to handle the following data management scenarios.

Note: Many data management settings are enabled on a per-index basis, using indexes.conf. To learn more about indexes, see how indexes work.

Deployment server

In a distributed setup, enable one or more Splunk instances as deployment servers. A deployment server pushes out configuration changes to other Splunk instances.

For a complete overview of all deployment options, read the Deployment manual. For instructions on configuring and enabling the deployment server and clients, read the Admin manual section on the deployment server.

Performance tuning

The following options help you tune Splunk's performance for your environment. Depending on your system and requirements, you may want to change one or more of the following settings:

A more in-depth overview of performance tuning options is available here.

Configuration files

Many of Splunk's advanced configurations and customizations are available only through configuration files. Create configurations by copying files into a custom application directory. Learn more about application directories and configuring application directories.

Applications

Applications are directories of configuration files with specific purposes, for example Splunk-2-Netcool. Configure your own applications by following these instructions.

You can also share your configuration file directories as applications with the Splunk community on SplunkBase.

Customization

Pimp your Splunk! Everybody's data is a little bit different. Maybe you want to set custom configurations for the system you're running Splunk on. Here are options for personalizing your Splunk instance.

Splunk Web appearance

Change various aspects of Splunk Web's appearance:

Extend Splunk

Splunk includes a REST API. Read the Developer's Guide to learn more about the REST API. To configure additional REST endpoints, use restmap.conf.

Troubleshooting

If there's something you need help with, even after reading the documentation, contact Splunk support.

If there's a feature you don't see here that you want included, file an enhancement request with Splunk support.

We're always interested in your feedback.

Getting Started

Start Splunk

This topic serves only as a brief introduction to starting Splunk. If you are new to Splunk, we recommend reviewing the User Manual first.

Before you start

Before starting Splunk, install the software. Refer to the Installation Manual for system requirements and step-by-step instructions. Make sure you install the correct version of Splunk and that you are installing on a supported filesystem.

Start Splunk on non-Windows platforms

Splunk's command line interface is located in $SPLUNK_HOME/bin/. $SPLUNK_HOME refers to the path you installed under. Navigate to this location and run the following command:

# ./splunk start

You must accept Splunk's EULA the first time you start Splunk after a new installation. To bypass this step, start Splunk and accept the license in one step:

# ./splunk start --accept-license

NOTE: There are two dashes before the accept-license option.

Start Splunk on Windows

On Windows, Splunk is installed by default into \Program Files\Splunk.

Start and stop the splunkd and Splunk Web processes via the Windows Services Manager.

You can also start, stop, and restart both processes at once by going to \Program Files\Splunk\bin and typing:

# splunk.exe [start|stop|restart]

Load Splunk Web in your browser

Navigate to:

http://mysplunkhost:8000

Use whatever host and port you chose during installation.

The first time you log in to Splunk with an Enterprise license, use username admin and password changeme. Splunk with a free license does not have access controls.

Administration basics

The $SPLUNK_HOME variable refers to the top level directory of your installation. By default, this is /opt/splunk/.

Add Splunk to your shell path

To save a lot of typing, set a SPLUNK_HOME environment variable and add $SPLUNK_HOME/bin to your shell's path. The example below works for bash users who accepted the default installation location. Use the correct syntax and path for your own installation.

# export SPLUNK_HOME=/opt/splunk
# export PATH=$SPLUNK_HOME/bin:$PATH

Splunk's CLI

Splunk's command line interface is located in $SPLUNK_HOME/bin/. If you have exported the path and environment variables (as explained above), you can use the splunk command as follows:

# splunk [action] [object] [-parameter value] ....

If you haven't set an environment variable, navigate to $SPLUNK_HOME/bin/ and run commands as follows:

# ./splunk [action] [object] [-parameter value] ....

For general help, type:

# splunk help

For a list of commands and options, type:

# splunk help commands

For Splunk with an Enterprise license, administration commands must be authenticated with a username and password. To authenticate for an entire session, type:

# splunk login

This command prompts you for a Splunk username and password. Use the same username and password for the CLI and Splunk Web. By default, the login is set to admin and the password is changeme.

Log out at any time by typing:

# splunk logout

To authenticate a single command, use the -auth parameter:

# splunk search foo -auth username:password

Note: the -auth string must be the last term in the CLI command.

Start/stop Splunk, check status

Ensure that you have added Splunk to your server host's path (as explained above, in "Add Splunk to your shell path"). Otherwise you must navigate to $SPLUNK_HOME/bin/ and use the ./splunk command.

Start the Server

From a shell prompt on the Splunk server host, run this command:

# splunk start

Alternatively, start either splunkd (to load back-end configuration) or Splunk Web (to load web configuration):

# splunk start splunkd

# splunk start splunkweb

Or restart Splunk (splunkd or Splunk Web) by running:

# splunk restart

# splunk restart splunkd

# splunk restart splunkweb

Stop the Server

To shut down Splunk, run this command:

# splunk stop

Also available for splunkd and Splunk Web:

# splunk stop splunkd

# splunk stop splunkweb

Check if Splunk is running

To check if Splunk is running, type this command at the shell prompt on the server host:

# splunk status

You should see this output:

splunkd is running (PID: 3162).
splunk helpers are running (PIDs: 3164).
splunkweb is running (PID: 3216).

Or you can use ps to check for running Splunk processes:

# ps aux | grep splunk | grep -v grep

Solaris users, type -ef instead of aux:

# ps -ef | grep splunk | grep -v grep

Help

Help is available in several forms.

Help Options

Change defaults

Changing the admin default password

Splunk with an Enterprise license has a default administration account and password. It is highly recommended that you change the default. You can do this via Splunk's CLI or Splunk Web.

Note: CLI commands assume you have set a Splunk environment variable. If you have not, navigate to $SPLUNK_HOME/bin and run the ./splunk command.

via Splunk Web

http://www.splunk.com/assets/doc-images/30_admin1_changedefaults/adminbutton.jpg

http://www.splunk.com/assets/doc-images/30_admin1_changedefaults/users.jpg

via Splunk CLI

The Splunk CLI command is:

# splunk edit user

Note: You must authenticate with the existing password before it can be changed. Log into Splunk via the CLI or use the -auth parameter.

For example:

# splunk edit user admin -password foo -auth admin:changeme

This command changes the admin password from changeme to foo.

Changing network ports

Splunk uses two ports. By default, Splunk Web uses port 8000 and splunkd uses port 8089.

via Splunk Web

http://www.splunk.com/assets/doc-images/30_admin1_changedefaults/adminbutton.jpg

http://www.splunk.com/assets/doc-images/3_2admin1_changedefaults/ports.jpg

via Splunk CLI

To change the port settings via the Splunk CLI, use the CLI command set.

# splunk set web-port 9000

This command sets the Splunk Web port to 9000.

# splunk set splunkd-port 9089

This command sets the splunkd port to 9089.

Changing the default Splunk server name

The Splunk server name setting controls both the name displayed within Splunk Web and the name sent to other Splunk Servers in a distributed setting.

The default name is taken from either the DNS name or the IP address of the Splunk Server host.

via Splunk Web

http://www.splunk.com/assets/doc-images/30_admin1_changedefaults/adminbutton.jpg

http://www.splunk.com/assets/doc-images/3_2admin1_changedefaults/ports.jpg

via Splunk CLI

To change the server name via the CLI, type the following:

# splunk set servername foo

This command sets the servername to foo.

Changing the datastore location

The datastore is the top-level directory where the Splunk Server stores all indexed data, user accounts, and working files.

Note: If you change this directory, the server does not migrate old datastore files. Instead, it starts over again at the new location.

To migrate your data to another directory follow the instructions in Move an index.

via Splunk Web

http://www.splunk.com/assets/doc-images/30_admin1_changedefaults/adminbutton.jpg

http://www.splunk.com/assets/doc-images/3_2admin1_changedefaults/datastore.jpg

via Splunk CLI

To change the datastore location via the CLI, type the following:

# splunk set datastore-dir /var/splunk/

This command sets the datastore directory to /var/splunk/.

Set minimum free disk space

The minimum free disk space setting controls how low disk space in the datastore location can fall before Splunk stops indexing.

Splunk resumes indexing when more space becomes available. For detailed information on how to manage Splunk server disk usage, see Disk usage.

via Splunk Web

http://www.splunk.com/assets/doc-images/30_admin1_changedefaults/adminbutton.jpg

http://www.splunk.com/assets/doc-images/3_2admin1_changedefaults/datastore.jpg

via Splunk CLI

To change the minimum free disk space via the CLI, type the following:

# splunk set minfreemb 2000

This command sets the minimum free space to 2000 MB.

Find and index data

There are several methods to get your data into Splunk. Add data via Splunk Web, Splunk's CLI, a configuration file, scripts, or third-party software.

Here's a brief intro on getting data into Splunk. For more detailed instructions, follow any of the links above.

Add Data

When you first log in to Splunk Web, you're provided a link to begin monitoring /var/log locally.

There are many other ways to specify data inputs in Splunk. This section is a high-level description of these techniques. For more detailed methods, see the data inputs section.

Monitor a file

When you specify a file to monitor, Splunk processes the entire file and then watches the file and processes additions to it. When you give a directory name to process, Splunk recursively searches all subdirectories looking for files resembling log files. You can explicitly include or exclude files with whitelisting and blacklisting.

Monitor files via Splunk Web

Manage your indexed files and add new files to your index from the Admin > Data Inputs: Files & Directories page.

1. To access the Admin page, click the Admin link in the upper right-hand corner.
The Admin page opens to the Server settings page.

2. From the navigation links on the left, click Data Inputs.
The Admin > Data Inputs: All page opens.

3. From the navigation links on the left or the table of input types, click Files & Directories.
The Admin > Data Inputs: Files & Directories page opens.

4. Click New Input.
The Admin > Data Inputs: Files & Directories: New Input opens.

Monitor files via the CLI

Use the splunk add command. These commands assume you have set a Splunk environment variable. If you have not, you must navigate to $SPLUNK_HOME/bin and run the ./splunk command.

For example:

# splunk add monitor /var/log/

This command monitors all files in /var/log/.

Crawl for inputs

Splunk 3.3 introduces the new crawl feature. Crawl your file system for potential logs and data to index. Read more about crawl in the data inputs sections.

Add more users

There are three default user roles and three different authentication methods to choose from when you set up Splunk with an Enterprise license. Users authenticate with Splunk's built-in system (described below), LDAP, or scripted authentication (for third-party auth systems). Any of these methods works with Splunk's roles system.

You must be logged in as a Splunk administrator to add or edit user accounts. The default Admin account password is changeme.

Note: Splunk with a Free license does not contain access control features. To access this page, you must run Splunk with an Enterprise license. For more information, read About Splunk licenses.

Lost admin password

If you lose the password to your admin account, contact Splunk Support for assistance.

Splunk local users

A Splunk Admin can create new users either via Splunk Web or Splunk's CLI. Users can be mapped to Splunk's default roles or to any custom roles defined in authorize.conf.

via Splunk Web

via Splunk CLI

From the CLI, use the following commands to add, edit, remove or list users.

add user username [-parameter value] ...
edit user username [-parameter value]  ...
remove user username [-parameter value]  ...
list user username [-parameter value]  ... 

Required (Default) Parameter:

username -- the name of the Splunk user account to manage.
full-name -- real name of user in quotes, for example "Nikola Tesla" - required when adding a new user.

Optional Parameters:

full-name -- real name of user in quotes, for example "Nikola Tesla"
password -- the password to set for the account
role -- either user, power or admin

Example

This example assumes you have set a Splunk environment variable. If you have not, you must navigate to $SPLUNK_HOME/bin and run the ./splunk command.

# splunk edit user newbie -password 'f8h2.$R' -auth admin:d3cidr

This example authenticates as user "admin" to change the password for user "newbie."

Note: You must be logged in as an Admin to make any changes regarding users. Log in either via the splunk login command, or use -auth, as shown above.

Start searching

Now you're ready to start using Splunk's search capabilities. Here are a few pages to help you start searching:

  1. Search reference.
  2. Search syntax.
  3. Search tutorial.

Data Inputs

How input configuration works

Specify data inputs via Splunk's CLI or Splunk Web. You may also use inputs.conf (read more about how to configure inputs via inputs.conf). Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf. Configure Windows inputs via inputs.conf as well.

Read on for a description of Splunk's data input types, including their purpose and behavior.

Files and directories

Data inputs can come from files and directories. Use monitor for continuous, non-destructive inputs from files and directories. Use batch input for one-time, destructive file loading.

Monitor

Splunk's monitor behaves like the UNIX tail command. Specify a path to a file or directory and Splunk's monitor processor consumes any new input. If subdirectories exist within the specified directory, Splunk recursively examines them for log files. Splunk automatically adds any new files into the index.

In addition, when monitoring a file:

Note: If you are monitoring large files or archives, removing the input does not stop files that are already being processed from being indexed. Removing the input stops Splunk from checking those files again, but all of their initial content is still indexed. To stop all in-process data, you must restart the Splunk server.

When monitoring a directory:

Note: If the specified file or directory does not exist, the Splunk server will not check to see if it is created later. Splunk only checks for files and directories each time the Splunk server starts (or is restarted). So be sure to explicitly add new files as inputs when they become available if you don't want to restart the server. When monitoring a file, the entire path dir/filename must not exceed 1024 characters.

Batch upload

Upload files directly through Splunk Web. If necessary, Splunk unpacks and uncompresses files before indexing.

Use the batch processor at the CLI or in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. For continuous, non-destructive loading of files, use monitor.
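
As a rough illustration of a one-time, destructive load, a batch stanza in inputs.conf might look like the sketch below. The directory path and the sourcetype value are hypothetical; move_policy = sinkhole is the setting listed for batch inputs in inputs.conf.

```ini
# Sketch only: /var/spool/app_exports and app_export are made-up names.
# Files placed in this directory are loaded once, destructively.
[batch:///var/spool/app_exports]
move_policy = sinkhole
sourcetype = app_export
```

Because batch loading is destructive, point it only at copies of files that you do not need to keep in place.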

FIFO queues

A FIFO (AKA named pipe) is a queue of data maintained in memory. File systems can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. When choosing the FIFO data input method, consider the following:

Note: FIFOs are not recommended for application servers forwarding data to Splunk in a distributed setting. Monitor is a more reliable, stable method.

Network ports

UDP and TCP ports can feed data into the Splunk Server. UDP and TCP behave differently, and these behaviors affect how data arrives for processing. When configuring network ports, keep in mind that you cannot use ports lower than 1024 if you have not installed Splunk as root.

UDP

UDP is a best effort protocol. This means that you might not get messages if the network is clogged, or has a hiccup. You also can't be absolutely sure the messages aren't spoofed or altered in transit. UDP should be reserved for logging implementations focused on day-to-day troubleshooting rather than compliance or security.

Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send any other UDP source of logging data, including SNMP.

Like all network streaming approaches, direct UDP input is higher performance than reading files from disk.

TCP

TCP is a reliable, high-performance choice for many situations, as this protocol includes checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's distributed data access.

Note: If the sending process buffers data such that events are broken into multiple pieces, Splunk may interpret the parts as multiple events. This is more likely if events are being generated intermittently, as there may be long pauses (several seconds or longer) between blocks of buffered data. If you notice truncated events, try forcing the process to send events atomically.

Scripted inputs

Configure Splunk to run shell commands on a schedule, and then index the output. For example:

See Configure scripted inputs for details on how to set this up.
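
A scripted input stanza in inputs.conf might look like the following sketch. The script name and sourcetype are hypothetical, and the interval is assumed to be expressed in seconds.

```ini
# Sketch only: disk_usage.sh is a made-up script name.
# Run the script on a schedule and index whatever it prints.
[script://$SPLUNK_HOME/bin/scripts/disk_usage.sh]
# Assumed to be seconds between runs.
interval = 60
sourcetype = disk_usage
```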

Indexing properties

Splunk can process any data, regardless of format. It automatically learns event boundaries, classifies events and sources, and finds timestamps. However, sometimes you may want to customize Splunk's default processing. Change processing settings and indexing properties in props.conf.

Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, correlating events and performing other transformations. segmenters.conf and outputs.conf can also define attribute values referenced by props.conf.

Common use cases for custom indexing properties include:

Configure inputs via Splunk Web

Follow these instructions to configure data inputs via Splunk Web. You can also configure data inputs via Splunk's CLI or a configuration file.

Configuration

Files and directories

FIFO queues

Network ports

With a Splunk Enterprise license, you can set input from any TCP or UDP port.

Configure inputs via the CLI

In addition to using Splunk Web or editing inputs.conf, you can also configure data inputs at Splunk's Command Line Interface (CLI).

To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command. You can also add Splunk to your path and use the splunk command.

Note: If you get stuck, Splunk's CLI has built-in help. Access the main CLI help page by typing splunk help. Individual commands, objects, and parameters have their own help pages as well -- type splunk help [command | object | parameter name].

Data input commands

Use Splunk CLI data commands to perform actions on data sources. Commands and data sources take various parameters depending on the combination you use. There are five different commands to configure data inputs in the CLI:

Command -- Command syntax -- Action
add -- add [monitor | fifo | tcp | udp] source [-parameter value] ... -- Add a specified data input to Splunk.
edit -- edit [monitor | fifo | tcp | udp] source [-parameter value] ... -- Edit a data input that was previously added.
remove -- remove [monitor | fifo | tcp | udp] source -- Remove a previously added data input.
list -- list [monitor | fifo | tcp | udp] -- List the currently configured data inputs of a specified type.
spool -- spool source -- Copy a file into Splunk via the sinkhole directory.

Data input types

Specify a data input type to use with a data input command.

Data input type -- Definition
monitor -- Continuously monitor a file or directory for new input.
fifo -- A FIFO or named pipe to index from.
tcp -- A TCP socket (network input) to monitor.
udp -- A UDP socket (network input) to monitor.

Input type parameters

Change the configuration of each data input type by defining the parameters below. Optional parameters have the syntax: -parameter value. Use only one -hostname, -hostregex or -hostsegmentnum per command.

monitor

Required parameters

source -- Path to the file or directory to monitor for new input.

Optional parameters

sourcetype -- Specify a sourcetype field value for events from the input source.
index -- Specify the destination index for events from the input source.
hostname -- Specify a host name to set as the host field value for events from the input source.
hostregex -- Specify a regular expression on the source file path to set as the host field value for events from the input source.
hostsegmentnum -- Set the number of segments of the source file path to set as the host field value for events from the input source.
active-only (T | F) -- True or False. Set to True to tell Splunk to keep indexing only files that have write permissions enabled.
follow-only (T | F) -- True or False. Default False. When set to True, Splunk reads from the end of the source (like the Unix "tail -f" command).

Example

Monitor only writable files in /var/log/:

./splunk add monitor /var/log/
./splunk edit monitor /var/log -active-only true

FIFO

Required parameters

source -- Path to a FIFO or named pipe to index.

Optional parameters

sourcetype -- Specify a sourcetype field value for events from the input source.
index -- Specify the destination index for events from the input source.
hostname -- Specify a host name to set as the host field value for events from the input source.
hostregex -- Specify a regular expression on the source file path to set as the host field value for events from the input source.
hostsegmentnum -- Set the number of segments of the source file path to set as the host field value for events from the input source.

Example

Configure a FIFO input and set the host and sourcetype field values for each event that's indexed.

./splunk add fifo /var/run/syslogfifo -sourcetype linux_messages_syslog
./splunk edit fifo /var/run/syslogfifo -hostname web01

TCP/UDP

Required parameters

source -- Port number to listen on for data to index.

Optional parameters

sourcetype -- Specify a sourcetype field value for events from the input source.
index -- Specify the destination index for events from the input source.
hostname -- Specify a host name to set as the host field value for events from the input source.
remotehost -- Specify an IP address to exclusively accept data from.
resolvehost (T | F) -- True or False. Default is False. Set to True to use DNS to set the host field value for events from the input source.

Example

Configure a network input and set the sourcetype field value for each event that's indexed.

./splunk add udp 514 -sourcetype syslog
./splunk edit udp 514 -resolvehost true -auth gwb:d3c1dr

Configure inputs via inputs.conf

Data inputs added via inputs.conf can be more detailed than inputs enabled via Splunk Web or the CLI.

Note: To set dynamic indexing properties for inputs, use props.conf.

Configuration

Add your stanza to $SPLUNK_HOME/etc/system/local/inputs.conf. Specify an input type and any number of attribute/value pairs.

[<inputtype>://<path>]
attribute1 = val1
attribute2 = val2
...

Global settings

The following attribute/value pairs are valid for ALL input types:

host = <string>

index = <string>

source = <string>

sourcetype = <string>

queue = <string> (parsingQueue, indexQueue, etc)
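
As a minimal sketch, several of the global attributes can be combined in a single stanza as shown here. The path and all values are illustrative assumptions, not defaults.

```ini
# Sketch only: the path, host name and sourcetype are made-up values.
[monitor:///var/log/httpd]
# Value to store in the host field for every event from this input.
host = web01
# Destination index for these events.
index = main
# Value to store in the sourcetype field.
sourcetype = httpd_access
```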

Input types

The following attribute/value pairs are valid for the specified input types only.

monitor

[monitor://<path>]

This directs Splunk to watch all files in the <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path; see below.

Note: To ensure new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes when it changes. Note that the entire file is indexed, which can result in duplicate events.

wildcards

You can use wildcards to specify your input path for monitored input. Use ... for paths and * for files.

Note: In Windows, you must use two backslashes \\ to escape wildcards. Regexes with backslashes in them are not currently supported for _whitelist and _blacklist in Windows.

Specifying wildcards results in an implicit _whitelist created for that stanza. The longest fully qualified path is used as the monitor stanza, and the wildcards are translated into regular expressions using the following map:

wildcard    regex    meaning
*           [^/]*    anything but /
...         .*       anything (greedy)
.           \.       literal .

For example, if you specify

[monitor:///foo/bar*.log]

Splunk translates this into:

[monitor:///foo/]
_whitelist = bar[^/]*\.log

As a consequence, you can't have multiple stanzas with wildcards for files in the same directory.

For example:

[monitor:///foo/bar_baz*]
[monitor:///foo/bar_qux*]

This results in overlapping stanzas indexing the directory /foo/. Splunk takes the first one, so only files starting with /foo/bar_baz will be indexed. To encompass both sources, manually specify a _whitelist using regular expression syntax for "or":

[monitor:///foo]
_whitelist = (bar_baz[^/]*|bar_qux[^/]*)

Note: To set any additional attributes (such as sourcetype) for multiple whitelisted/blacklisted inputs that may have different attributes, use props.conf

additional attributes

host_regex = <regular expression>

host_segment = <integer>

crcSalt = <string>

followTail = 0|1

_whitelist = <regular expression>

_blacklist = <regular expression>
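
To give a feel for how these attributes combine, here is a hedged sketch of a monitor stanza. It assumes a hypothetical directory layout of /var/log/hosts/<hostname>/..., that host_segment counts path segments from the root starting at 1, and that _blacklist is matched against the file path.

```ini
# Sketch only: the /var/log/hosts/<hostname>/ layout is an assumption.
[monitor:///var/log/hosts]
# Take the host field from the 4th path segment (<hostname>),
# assuming segments are counted as var=1, log=2, hosts=3, <hostname>=4.
host_segment = 4
# Skip compressed archives; assumes the regex is applied to the file path.
_blacklist = \.(gz|bz2)$
sourcetype = syslog
```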

Batch

[batch://<path>]
move_policy = sinkhole

Additional attributes

host_regex (see monitor)
host_segment (see monitor)

Note: source = <string> and <KEY> = <string> are not used by batch.

TCP

[tcp://<remote server>:<port>]

Additional attributes

connection_host = [ip | dns]

UDP

[udp://:<port>]

Additional attributes

_rcvbuf = <int>

no_priority_stripping = <true/false>

FIFO

[fifo://<path>]

Scripted Input

[script://<cmd>]

interval = <integer>

passAuth = <username>

Examples

Monitor

[monitor:///apache/.../logs]

This loads anything in /apache/foo/logs or /apache/bar/logs, etc.

[monitor:///apache/*.log]

This loads anything in /apache/ that ends in .log.

Batch

[batch:///system/flight815/*]
move_policy = passive_symlink

This example batch loads all files from the directory /system/flight815/.

TCP

[tcp://<remote server>:<port>]

This configures Splunk to listen on the specified port. If a connection is made from <remote server>, this stanza is used to configure the input.
If <remote server> is blank, this stanza matches all connections on the specified port.
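
As a concrete sketch of the blank <remote server> case, the stanza below accepts TCP data from any sender. The port number and sourcetype are assumptions chosen for illustration; the port is above 1024 so that Splunk does not need to run as root.

```ini
# Sketch only: port 9995 and the sourcetype are made-up values.
# Remote server is left blank, so connections from any sender match.
[tcp://:9995]
# Set the host field from the sender's DNS name rather than its IP address.
connection_host = dns
sourcetype = syslog
```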

UDP

[udp://<remote-server>:<port>]

Similar to TCP, except that Splunk listens on a UDP port.
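
For instance, a stanza along the following lines would let Splunk receive remote syslog events over UDP. This is a sketch rather than a recommended configuration; it mirrors the CLI example that adds UDP port 514 with the syslog sourcetype, and because 514 is below 1024 it assumes Splunk is running as root.

```ini
# Sketch only: listen for syslog traffic on UDP port 514 from any sender.
[udp://:514]
sourcetype = syslog
```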

FIFO

[fifo://<path>]

This directs Splunk to read from the FIFO at the specified path.
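
A filled-in FIFO stanza might look like the sketch below. It reuses the path, host and sourcetype from the CLI FIFO example; the mapping of the CLI -hostname parameter to the host attribute is an assumption.

```ini
# Sketch only: assumed inputs.conf equivalent of the CLI FIFO example.
[fifo:///var/run/syslogfifo]
host = web01
sourcetype = linux_messages_syslog
```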

Configure inputs for Windows

You can configure the Windows version of Splunk to index your Windows Application, System, and Security event logs, as well as monitor and index changes to your registry and set up WMI data input. This functionality is not yet exposed in Splunk Web or the CLI.

When you run the Splunk Windows installer, you are given the option to set up indexing and/or monitoring for the event logs, the registry, and for WMI. If you choose to do this, the default values for these settings are assumed. Once you have completed the installation, you can then make changes to the default values set by the installation process.

If you want to make changes to the default values, edit a copy of inputs.conf in $SPLUNK_HOME\etc\system\local\. You only have to provide values for the parameters you want to change within the stanza. For more information about how to work with Splunk configuration files, refer to How do configuration files work?

Configure indexing for Windows event logs

The settings for which event logs to index are in the following stanza in inputs.conf:

# Windows platform specific input processor.
[WinEventLog:Application] 
[WinEventLog:Security]
[WinEventLog:System]

To disable indexing for an event log, use # to comment it out in this stanza in $SPLUNK_HOME\etc\system\local\inputs.conf.

Configure Windows registry monitoring input

The global settings for Windows registry monitoring are in the following stanza in inputs.conf:

[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.py]
interval = 60
sourcetype = WinRegistry
source = WinRegistry
disabled = 0

Note: The Splunk registry input monitoring script (splunk-regmon.py) is configured as a scripted input. Do not change this value.

The Windows registry monitoring functionality uses two additional configuration files that are described in Windows registry input. You may wish to review this information before proceeding.

Note: You must use two backslashes \\ to escape wildcards in stanza names in inputs.conf. Regexes with backslashes in them are not currently supported when specifying paths to files.

WMI input

Splunk supports WMI (Windows Management Instrumentation) data input for agentless access to Windows performance data and event logs. This means you can pull event logs from all the Windows servers and desktops in your environment without having to install anything on those machines.

The Splunk WMI data input can connect to multiple WMI providers and pull data from them. The WMI data input runs as a separate process (splunk-wmi.exe) on the Splunk server. It is configured as a scripted input in etc/system/default/inputs.conf.

Note: This feature is NOT enabled by default.

Security and remote access considerations

Splunk requires privileged access to index many Windows data sources, including WMI, Event Log, and the registry. This includes both the ability to connect to the box, as well as permissions to read the appropriate data once connected.

There are several things to consider:

Test access to WMI

To access WMI data, Splunk must run as a user with permissions to perform remote WMI connections. This user name must be a member of an Active Directory domain and must have appropriate privileges to query WMI. Both the Splunk server making the query and the target systems being queried must be part of this Active Directory domain.
Note: If you installed Splunk as the LOCAL SYSTEM user, WMI remote authentication will not work; this user has null credentials and Windows servers normally disallow such connections.

The following steps explain how to test the configuration of the Splunk server and the remote machine:

1. Log into the machine Splunk runs on as the user Splunk runs as.
2. Click Start -> Run and type wbemtest. The wbemtest application starts.
3. Click Connect and type \\<server>\root\cimv2, replacing <server> with the name of the remote server. Click Connect. If you are unable to connect, there is a problem with the authentication between the machines.
4. If you are able to connect, click Query and type select * from win32_service. Click Apply. After a short wait, you should see a list of running services. If this does not work, then the authentication works, but the user Splunk is running as does not have enough privileges to run that operation.

Configure WMI input

Look in $SPLUNK_HOME/etc/system/default/wmi.conf to see the default values for the WMI input. If you want to make changes to the default values, edit a copy of wmi.conf in $SPLUNK_HOME/etc/system/local/. You only have to provide values for the parameters you want to change for a given type of data input.

Refer to How configuration files work for more information about how Splunk uses configuration files, but be sure to use the new directory structure for the correct directory paths.

[settings]
initial_backoff = 5
max_backoff = 20
max_retries_at_max_backoff = 2
result_queue_size = 1000
checkpoint_sync_interval = 2
heartbeat_interval = 500

[WMI:AppAndSys]
server = foo, bar
interval = 10
event_log_file = Application, System, Directory Service
disabled = 0

[WMI:LocalSplunkWmiProcess]
interval = 5
wql = select * from Win32_PerfFormattedData_PerfProc_Process where Name = "splunk-wmi"
disabled = 0

The [settings] stanza specifies runtime parameters. The entire stanza and every parameter within it are optional. If the stanza is missing, Splunk assumes system defaults.

You can specify two types of data input: event log and raw WQL (WMI query language). The event log input stanza contains the event_log_file parameter, and the WQL input stanza contains wql.

The common parameters for both types are:

WQL-specific parameters:

Event log-specific parameter:
event_log_file: a comma-separated list of log files to poll. File names that include spaces are supported, as shown in the example.

Source and source type for WMI data

All events are indexed in Splunk with a source of wmi.

The host is identified automatically from the data received.

Windows registry input

Splunk supports the capture of Windows registry settings and lets you monitor changes to the registry. You can know when registry entries are added, updated, and deleted.
When a registry entry is changed, Splunk captures the name of the process that made the change and the key path from the hive to the entry being changed.

The Windows registry input monitor application runs as a process called splunk-regmon.exe.

Warning: Do not stop or kill the splunk-regmon.exe process manually; this could result in system instability. To stop the process, stop the Splunk server process from the Windows Task Manager or from within Splunk Web.

How it works

Because it's possible for Windows registries to be extremely dynamic (thereby generating a great many events), Splunk provides a two-tiered configuration for fine-tuning the filters that are applied to the registry event data coming into Splunk.

Splunk Windows registry monitoring uses two configuration files to determine what to monitor on your system, sysmon.conf and regmon-filters.conf, both located in $SPLUNK_HOME/etc/system/local/. These configuration files work as a hierarchy:

sysmon.conf contains only one stanza, where you specify:

Each stanza in regmon-filters.conf represents a particular filter whose definition includes:

Get a baseline snapshot

When you install Splunk, you're given the option of recording a baseline snapshot of your registry hives the next time Splunk starts. By default, the snapshot covers the entirety of the user keys and machine keys hives. It also establishes a timeline for when to retake the snapshot; by default, if Splunk has been down for more than 24 hours since the last checkpoint, it will retake the baseline snapshot. You can customize this value for each of the filters in regmon-filters.conf by setting the value of baseline interval.

Note: Executing a splunk clean all -f deletes the current baseline snapshot.

What to consider

When you install Splunk on a Windows machine and enable registry monitoring, you specify which major hive paths to monitor: key users (HKEY_USERS) and/or key local machine (HKEY_LOCAL_MACHINE, or HKLM). Depending on how dynamic you expect the registry to be on this machine, checking both could result in a great deal of data for Splunk to monitor. If you're expecting a lot of registry events, you may want to specify some filters in regmon-filters.conf to narrow the scope of your monitoring. Do this immediately after you install Splunk and enable registry event monitoring, but before you start Splunk.

Similarly, you have the option of capturing a baseline snapshot of the current state of your Windows registry when you first start Splunk, and again every time a specified amount of time has passed. The baselining process can be somewhat processor-intensive, and may take several minutes. You can postpone taking a baseline snapshot until you've edited regmon-filters.conf and narrowed the scope of the registry entries to those you specifically want Splunk to monitor.

Configure Windows registry input

Look in $SPLUNK_HOME/etc/system/default/inputs.conf to see the default values for Windows registry input. They are also shown below.
If you want to make changes to the default values, edit a copy of inputs.conf in $SPLUNK_HOME/etc/system/local/. You only have to provide values for the parameters you want to change within the stanza. For more information about how to work with Splunk configuration files, refer to How do configuration files work?

[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.py]
interval = 60
sourcetype = WinRegistry
source = WinRegistry
disabled = 0

Configure crawl

Use crawl to search your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define the type of data sources to include in or exclude from your results.

Configuration

Edit crawl.conf to configure one or more crawlers that browse your data sources when you run the crawl command. Define each crawler by specifying values for each of the crawl options. Enable the crawler by adding it to crawlers_list.

Crawl logging

The crawl command produces a log of crawl activity that's stored in $SPLUNK_HOME/var/log/splunk/crawl.log. Set the logging level with the logging key in the [default] stanza.

Example:
Set the logging level of crawl to warn.

[default]
logging=warn

Enable crawlers

Enable a crawler by listing the crawler specification stanza name in the crawlers_list key of the [crawlers] stanza.

Use a comma-separated list to specify multiple crawlers.

Example:
Enable crawlers that are defined in the stanzas: [file_crawler], [port_crawler], and [db_crawler].

[crawlers]
crawlers_list= file_crawler, port_crawler, db_crawler

Define crawlers

Define a crawler by adding a definition stanza to crawl.conf. To define additional crawlers, add one stanza for each.

Example crawler stanzas in crawl.conf:

[Example_crawler_name]
....

[Another_crawler_name]
....

Add key/value pairs to crawler definition stanzas to set a crawler's behavior. The following keys are available for defining a file_crawler:

bad_directories_list= Specify directories to exclude.
bad_extensions_list= Specify file extensions to exclude.
bad_file_matches_list= Specify a string, or a comma-separated list of strings that filenames must contain to be excluded. You can use wildcards (examples: foo*.*,foo*bar, *baz*).
packed_extensions_list= Specify extensions of compressed files to exclude.
collapse_threshold= Specify the minimum number of files a source must have to be considered a directory.
days_sizek_pairs_list= Specify a comma-separated list of age (days) and size (kb) pairs to constrain what files are crawled. For example: days_sizek_pairs_list = 7-0, 30-1000 tells Splunk to crawl only files last modified within 7 days and at least 0kb in size, or modified within the last 30 days and at least 1000kb in size.
big_dir_filecount= Set the maximum number of files a directory can have in order to be crawled. crawl excludes directories that contain more than the maximum number you specify.
index= main Specify the name of the index to add crawled file and directory contents to.
max_badfiles_per_dir= Specify how far to crawl into a directory for files. If Splunk crawls a directory and doesn't find valid files within the specified max_badfiles_per_dir, then Splunk excludes the directory.

Example

A simple file_crawler may look like:

[simple_file_crawler]
bad_directories_list= bin, sbin, boot, mnt, proc, tmp, temp, home, mail, .thumbnails, cache, old
bad_extensions_list= mp3, mpg, jpeg, jpg, m4, mcp, mid
bad_file_matches_list= *example*, *makefile, core.*
packed_extensions_list= gz, tgz, tar, zip
collapse_threshold= 10
days_sizek_pairs_list= 3-0,7-1000, 30-10000
big_dir_filecount= 100
index=main
max_badfiles_per_dir=100
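
The days_sizek_pairs_list rule can be easier to follow as code. The Python sketch below is a reading of the description above, not Splunk's own logic: it assumes a file qualifies when it satisfies at least one pair, that is, it was modified within the given number of days and is at least the given size in kb. The function names parse_pairs and is_crawled are hypothetical.

```python
def parse_pairs(value):
    """Parse 'days-sizekb' pairs such as '3-0,7-1000, 30-10000'."""
    pairs = []
    for item in value.split(","):
        days, size_kb = item.strip().split("-")
        pairs.append((int(days), int(size_kb)))
    return pairs

def is_crawled(age_days, size_kb, pairs):
    """True if the file satisfies at least one (max age, min size) pair."""
    return any(age_days <= days and size_kb >= min_kb
               for days, min_kb in pairs)

# Values taken from the simple_file_crawler example above.
pairs = parse_pairs("3-0,7-1000, 30-10000")
print(is_crawled(2, 1, pairs))       # recent file of any size
print(is_crawled(5, 500, pairs))     # 5 days old but under 1000kb
print(is_crawled(20, 20000, pairs))  # older file that is large enough
```

With the example values, a 5-day-old file of 500kb is skipped because it is too old for the first pair and too small for the other two.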

Scripted inputs

Configure inputs.conf to have Splunk accept events from scripts. Scripted input is useful for command-line tools, such as vmstat, iostat, netstat, top, etc.

Note: Currently, scripted inputs do not get bundled in the deployment server. In the future, Splunk will support this behavior. For now, use your preferred configuration automation tool to push your script directory to your server classes.

Configuration

Note: Your script must be in the bin/ directory underneath your scripts/ directory.

[script://$SCRIPT] 
interval = X 
index = {main, $YOUR_INDEX}
sourcetype = {iostat, vmstat, etc}  OPTIONAL
source = {iostat, vmstat, etc} OPTIONAL
disabled = false

Variables

Example

This example shows the use of the UNIX top command as a data input source.

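Create the script directory, including its bin/ subdirectory:

$ mkdir -p $SPLUNK_HOME/etc/apps/scripts/bin

Save the following as $SPLUNK_HOME/etc/apps/scripts/bin/top.sh:

#!/bin/sh
top -bn 1  # Linux only; other OSes use different parameters

Make the script executable, then run it once to test it:

$ chmod +x $SPLUNK_HOME/etc/apps/scripts/bin/top.sh
$ $SPLUNK_HOME/etc/apps/scripts/bin/top.sh

Add the script to inputs.conf:

[script:///opt/splunk/etc/apps/scripts/bin/top.sh]
interval = 5                # run every 5 seconds
sourcetype = top        # set sourcetype to top
source = script://./bin/top.sh   # set source to name of script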

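Note: To keep each run of top together as a single event, add a stanza like the following to props.conf. It breaks events only before a string that never appears in the output, and timestamps each event with the current time: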

[top]
BREAK_ONLY_BEFORE = GobblyGook
DATETIME_CONFIG = CURRENT

Configure whitelist and blacklist rules

When specifying inputs to monitor in inputs.conf, you can use whitelist and blacklist rules to explicitly tell Splunk to consume ONLY certain files or consume everything EXCEPT certain files. When you define a whitelist, Splunk indexes ONLY the files in that list. Alternately, when you define a blacklist, Splunk ignores the files in that list and consumes everything else. These settings are independent of each other.

Whitelist and blacklist rules use regular expression syntax to define the match on the file name. Also, your rules must be contained within a configuration stanza (for example, [monitor://<path>]); those outside a stanza (global entries) are ignored.

Important: Define whitelist and blacklist entries with exact regex syntax; the "..." wildcard is not supported.

Whitelist (allow) files

To define the files you want Splunk to exclusively index, add the following line to your monitor stanza in $SPLUNK_HOME/etc/system/local/inputs.conf:

_whitelist = $YOUR_CUSTOM_REGEX

For example, if you want Splunk to monitor only files with the .log extension:

[monitor:///mnt/logs]
    _whitelist = .*\.log

Blacklist (ignore) files

To define the files you want Splunk to exclude from indexing, add the following line to your monitor stanza in $SPLUNK_HOME/etc/system/local/inputs.conf:

_blacklist = $YOUR_CUSTOM_REGEX

For example, if you want Splunk to ignore files with the .txt extension:

[monitor:///mnt/logs]
    _blacklist = .*\.txt

If you want Splunk to ignore all files with either the .txt extension or the .gz extension:

[monitor:///mnt/logs]
    _blacklist = \.(txt|gz)$
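
You can try a rule against a few sample file names before putting it in inputs.conf. The Python sketch below is only an approximation: it assumes the regex is applied as an unanchored search against the full path, which may differ from Splunk's actual matching. The function names apply_whitelist and apply_blacklist and the sample paths are hypothetical.

```python
import re

def apply_whitelist(paths, regex):
    """Keep only the paths that match the whitelist regex."""
    return [p for p in paths if re.search(regex, p)]

def apply_blacklist(paths, regex):
    """Drop the paths that match the blacklist regex."""
    return [p for p in paths if not re.search(regex, p)]

# Hypothetical contents of /mnt/logs.
paths = [
    "/mnt/logs/app.log",
    "/mnt/logs/app.log.1",
    "/mnt/logs/notes.txt",
    "/mnt/logs/old.gz",
]

print(apply_whitelist(paths, r".*\.log"))
print(apply_whitelist(paths, r"\.log$"))
print(apply_blacklist(paths, r"\.(txt|gz)$"))
```

Under the unanchored-search assumption, .*\.log also selects app.log.1. Ending the expression with $ restricts the match to names that actually end in .log.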

Verify your lists

To verify that your whitelist and blacklist rules are configured properly, run the listtails utility found in your $SPLUNK_HOME/bin directory. listtails reads in the configuration of inputs.conf in all application directories, scans the directories and shows you the exact list of files that Splunk will monitor when you restart.

Note: The listtails utility requires you to first run the command source setSplunkEnv.

Log file rotation

Splunk recognizes when a file that it is monitoring (such as /var/log/messages) has been rolled (/var/log/messages1) and will not read the rolled file a second time.

Note: Splunk does not recognize tar or gzip files produced by logrotate. You can explicitly set blacklist rules for .tar or .gz to prevent Splunk from reading these files as new logfiles, or you can configure logrotate to move these files into a directory you have not told Splunk to read.

How log rotation works

The monitoring processor picks up new files and reads the first and last 256 bytes of the file. This data is hashed into a begin and end cyclic redundancy check (CRC). Splunk checks new CRCs against a database that contains all the CRCs of files Splunk has seen before. The location Splunk last read in the file is also stored.

There are three possible outcomes of a CRC check:

1. There is no begin and end CRC matching this file in the database. This is a new file, so Splunk picks it up and consumes it from the start. Splunk updates the database with the new CRCs and seekPtr as the file is being consumed.

2. The begin CRC and the end CRC are both present, but the size of the file is larger than the seekPtr Splunk stored. This means that, while Splunk has seen the file before, information has been added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there, so it picks up only the new data and nothing it has read before.

3. The begin CRC is present but the end CRC does not match. This means the file has changed since Splunk last read it, and some of the portions it has already read are different. In this case Splunk has no choice but to read the whole file again.
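
The three outcomes can be sketched in Python. This is a simplified model, not Splunk's implementation. It works on in-memory bytes instead of files, and it assumes the end CRC covers the 256 bytes just before the stored seek pointer, a detail the description above does not spell out. The names remember and classify are hypothetical, as is the extra "unchanged" result for a file with nothing new, such as a rolled copy.

```python
import zlib

WINDOW = 256  # bytes hashed at each end of the file, per the description

def crc(data):
    return zlib.crc32(data) & 0xFFFFFFFF

def remember(db, data):
    """Record that data has been read completely (seek pointer at the end)."""
    seek_ptr = len(data)
    db[crc(data[:WINDOW])] = {
        # Assumption: the end CRC covers the WINDOW bytes before the seek pointer.
        "end_crc": crc(data[max(0, seek_ptr - WINDOW):seek_ptr]),
        "seek_ptr": seek_ptr,
    }

def classify(db, data):
    """Return (outcome, offset to start reading from)."""
    record = db.get(crc(data[:WINDOW]))
    if record is None:
        return ("new", 0)                    # outcome 1: never seen before
    seek_ptr = record["seek_ptr"]
    if len(data) >= seek_ptr:
        tail = data[max(0, seek_ptr - WINDOW):seek_ptr]
        if crc(tail) == record["end_crc"]:
            if len(data) > seek_ptr:
                return ("resume", seek_ptr)  # outcome 2: data was appended
            return ("unchanged", seek_ptr)   # same content, nothing new
    return ("reread", 0)                     # outcome 3: earlier data changed

db = {}
original = b"".join(b"event %04d\n" % i for i in range(50))
remember(db, original)

print(classify(db, original + b"event 0050\n"))               # appended
print(classify(db, original[:400] + b"X" + original[401:]))   # modified
```

Files shorter than 256 bytes are a weak spot of this sketch: the begin CRC changes as soon as such a file grows, so it is treated as new.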

Strip syslog headers before processing

Remove syslog headers from non-syslog events that have been passed through syslog to Splunk, such as log4j events from a log4j-to-syslog appender. Splunk ships with a regex to do this for you in $SPLUNK_HOME/etc/system/default/transforms.conf. Overwrite or change any of the default attributes and values by creating a transforms.conf in $SPLUNK_HOME/etc/system/local/ or your own custom bundle directory. For more information on configuration files in general, see how configuration files work.

Configuration

transforms.conf

In $SPLUNK_HOME/etc/system/default/transforms.conf:

# This will strip out date stamp, host, process with pid and just get the
# actual message
[syslog-header-stripper-ts-host-proc]
REGEX         = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s.*?:\s(.*)$
FORMAT        = $1
DEST_KEY      = _raw
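
To see what this regex keeps, you can run it outside Splunk. The Python sketch below applies the same expression to a sample syslog line and returns the first capture group, which mirrors FORMAT = $1 written back to _raw. The function name strip_syslog_header is hypothetical, and Python's regex engine is assumed to behave like Splunk's for this pattern.

```python
import re

# Same expression as REGEX in the stanza above.
HEADER = re.compile(r"^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s.*?:\s(.*)$")

def strip_syslog_header(raw):
    """Return the message part of a syslog line, or the line unchanged."""
    match = HEADER.match(raw)
    return match.group(1) if match else raw

line = ("Mar 30 07:37:15 doom1.idkfa.kom sshd[7728]: "
        "Connection closed by ::ffff:192.168.1.101")
print(strip_syslog_header(line))
```

The date stamp, host, and process with pid are dropped. A line that does not start with a syslog-style header is returned unchanged.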

Additional strippers found in this file include:

props.conf

In $SPLUNK_HOME/etc/system/local/props.conf:

[syslog]
TRANSFORMS= syslog-header-stripper-ts-host-proc

This example turns on the built-in regex for remote syslog inputs.

[syslog]
TRANSFORMS-strip-syslog= syslog-header-stripper-ts-host-proc

You can append a name of your choice to a TRANSFORMS declaration, as in the second stanza. There are no special keywords. TRANSFORMS-the-cake-is-a-lie works just as well.

Example

If you have a central syslog server (syslog1.idkfa.kom) receiving events from multiple servers, you can forward the events to a Splunk Server and index them based on the original host (doom1.idkfa.kom) and original timestamp (07:37:15). For this example the events come to Splunk via UDP port 514 and look like this:

Mar 30 14:29:35 syslog1.idkfa.kom Mar 30 07:37:15 doom1.idkfa.kom sshd[7728]: Connection closed by ::ffff:192.168.1.101

Create this configuration stanza in props.conf:

[syslog]
TIME_PREFIX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s
TRANSFORMS-strip-syslog= syslog-header-stripper-ts-host
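
TIME_PREFIX tells Splunk what text precedes the timestamp it should use. The Python sketch below shows where the expression stops on the sample event, so you can confirm that the original timestamp and host come next. The function name text_after_prefix is hypothetical, and Python's regex engine is assumed to behave like Splunk's for this pattern.

```python
import re

# Same expression as TIME_PREFIX in the stanza above.
TIME_PREFIX = re.compile(r"^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s")

def text_after_prefix(raw):
    """Return the part of the event that follows the matched prefix."""
    match = TIME_PREFIX.match(raw)
    return raw[match.end():] if match else raw

event = ("Mar 30 14:29:35 syslog1.idkfa.kom Mar 30 07:37:15 "
         "doom1.idkfa.kom sshd[7728]: Connection closed by ::ffff:192.168.1.101")
print(text_after_prefix(event))
```

The prefix consumes the timestamp and host name added by the central syslog server, so the remaining text begins with the original timestamp (07:37:15) and the original host (doom1.idkfa.kom).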

Determine what files Splunk is monitoring

When you configure inputs, you may want to know what specific files Splunk is monitoring prior to starting Splunk for indexing. This is especially true when configuring whitelisting/blacklisting rules. Splunk includes a listtails utility which reads in the configuration of inputs.conf in all applications, scans your directories and shows you the exact list of files that Splunk will monitor when you restart. This allows you to make changes to inputs.conf and verify that the blacklist/whitelist filtering is correct.

Run listtails

To use the listtails utility:
1. Navigate to $SPLUNK_HOME/bin/.
2. Run the command ./splunk cmd listtails.

Index SNMP events with Splunk

The most effective way to index SNMP events is to use snmptrapd to write them to a FIFO.

First, configure snmptrapd to write to a FIFO rather than to a file on disk.

# mkfifo /var/run/snmp-fifo
# snmptrapd -o /var/run/snmp-fifo

Then, configure the Splunk Server to add the FIFO as a data input.

log4j

The best way to index log4j files is to set up a standard log4j-syslog appender on your log4j host. Then configure the Splunk server's properties to strip the syslog header prior to other processing, so Splunk doesn't think the logs are single-line syslog entries.

See the entry on stripping syslog headers for instructions on stripping the syslog headers.

Data Distribution

How data distribution works

Splunk servers running on any supported OS platform can forward data to one another (as well as to other systems) in real time. This setup allows data inputs gathered on one Splunk server in a specific environment to be sent to another Splunk server for indexing and search. Also, Splunk servers can forward data to groups of other Splunk servers, to enable horizontal scaling via clustered indexing. Splunk servers can also clone data to multiple groups of other Splunk servers to provide for data redundancy in high availability environments.

Data distribution covers all configurations in which one Splunk server (the forwarder) is sending data to one or more Splunk servers (the receivers) prior to being indexed. The forwarder can also index data locally.

Note: All Splunk instances in a distributed cluster must be running the same version of Splunk, although they can be running on any variety of supported OSes. Each receiving Splunk server must have a unique, valid Splunk Enterprise license.

Forwarding

Forwarding is the simplest data distribution setup: one server sends data to another server for indexing.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/dataforward.jpg

Learn how to enable forwarding and receiving.

Routing

With routing enabled, the forwarder matches conditions based on patterns in the events themselves to selectively send some events to one other server and other events to another server.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/datarouting.jpg

Learn how to enable data routing.

Cloning

Cloning refers specifically to a forwarder sending every event to two or more other Splunk servers to provide for data redundancy.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/datacloning.jpg

Learn how to enable cloning.

Data balancing

Data balancing refers to data that is sent in a balanced fashion to groups of servers. This setup supports large volumes of data. All of the forwarders send data to some number of receivers, and the receivers index data in a round-robin fashion.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/balance.jpg

Data balanced target groups are made up of multiple servers. Learn how to set up data balancing.

Buffering during data balancing

If a server becomes inaccessible during data balancing, Splunk continues to send events to all accessible servers.

Eventually, Splunk stops trying to send to an unresponsive server, and notes that the server has gone offline. If all servers are inaccessible, Splunk writes to a buffer on the forwarder's side.

Data buffering values are set in outputs.conf on the forwarding side.
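
The balancing and buffering behavior described above can be modeled with a short Python sketch. It is a toy model under stated assumptions, not Splunk's forwarding code: the forwarder picks receivers in round-robin order, skips any receiver marked as down, and buffers an event only when every receiver is down. The class name Forwarder, its attributes, and the server names are all hypothetical.

```python
class Forwarder:
    """Toy model of round-robin data balancing with forwarder-side buffering."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()        # receivers currently considered inaccessible
        self.next_index = 0      # position of the next receiver to try
        self.buffer = []         # events held while no receiver is reachable
        self.sent = {server: [] for server in self.servers}

    def send(self, event):
        """Send to the next accessible receiver; buffer if there is none."""
        for _ in range(len(self.servers)):
            server = self.servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.servers)
            if server not in self.down:
                self.sent[server].append(event)
                return server
        self.buffer.append(event)
        return None

fwd = Forwarder(["idx1", "idx2", "idx3"])
first_round = [fwd.send(e) for e in ("a", "b", "c")]
fwd.down.add("idx2")                  # one receiver becomes inaccessible
after_failure = [fwd.send(e) for e in ("d", "e")]
fwd.down.update(fwd.servers)          # now every receiver is inaccessible
buffered = fwd.send("f")
print(first_round, after_failure, buffered, fwd.buffer)
```

A real forwarder would also retry downed receivers and drain the buffer once one returns. Those steps are left out to keep the sketch short.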

Target groups

Rather than output data to one receiver, forwarders can send to target groups. Target groups are made of one or more receiving servers:

[target group 1] 
server 1, server 2

[target group 2]
server 3

[target group 3]
server 4, server 5, server 6

Cloning sends every event to all target groups.

Routing sends specific events to one target group and different events to other target groups.

You can also set up default groups, which receive all the data not sent to target groups. If more than one group is specified, Splunk clones events to all listed default groups.

defaultGroup=<groupname1>,<groupname2>...

Learn more about target group configuration.

Security

Any Splunk server can route some or all of its incoming data in real time to other Splunk servers and to other systems via TCP, either in clear text or via SSL. Learn how to set up SSL.

Send to 3rd party systems

By default, data is routed between Splunk servers as cooked data -- meaning events have been parsed and tagged. However, Splunk can be configured to either receive or send raw data in order to interact with third party systems.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/thirdparty.jpg

Learn how to configure Splunk to send to or receive from third party software.

Distributed search

Splunk servers can distribute search requests to other Splunk servers and merge the results back to the user. Distributed search combines with balanced indexing to provide horizontal scaling, allowing you to search and index hundreds of gigabytes or terabytes per day. Additionally, distributed search allows select users to correlate data across different data silos.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/dsearch.jpg

Learn more about distributed search.

Configuration files for data distribution

Enable forwarding and receiving

Set up forwarding and receiving via Splunk Web or Splunk's CLI. To set up more sophisticated forwarding configurations, see this page on configuring outputs.conf.

You can set up two types of forwarders: standard and lightweight. If you configure a standard forwarder, it indexes the data before forwarding it to the receiving Splunk host. When you configure a lightweight forwarder, it sends un-indexed data to the receiving Splunk host. If you are using both types of forwarders, you must specify a different port for each type.

You must set up receiving before setting up forwarding. This way, the Splunk receiving host is prepared for the forwarded data.

Once you have enabled a Splunk instance to forward or receive data, you can configure additional settings, such as routing, cloning, filtering or data balancing. Configuration changes are done on the forwarder side, on the host that is reading the data input.

Note: An Enterprise license is required on each receiver node. Splunk instances that are forwarding can continue to use the free license. Customers with a valid support agreement who require authentication for all Splunk instances should contact support and request a forwarder license. This special forwarder license can be reused on all forwarding instances.

Receiving

via Splunk Web

via the CLI

Enable receiving from Splunk's CLI. To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command. Also, add Splunk to your path and use