Documentation: 3.3.4
This page last updated: 09/24/08 01:09pm

Crawl

Use the crawl command to search your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define which data sources to include in or exclude from your results.

Configuration

Edit $SPLUNK_HOME/etc/system/local/crawl.conf to configure one or more crawlers that browse your data sources when you run the crawl command. Define each crawler by setting values for its crawl attributes, then enable it by adding its stanza name to crawlers_list.

Crawl logging

The crawl command produces a log of crawl activity that's stored in $SPLUNK_HOME/var/log/splunk/crawl.log. Set the logging level with the logging key in the [default] stanza of crawl.conf:

[default]
logging = <warn | error | info | debug>

Enable crawlers

Enable a crawler by listing the crawler specification stanza name in the crawlers_list key of the [crawlers] stanza.

Use a comma-separated list to specify multiple crawlers.

For example, to enable the crawlers defined in the stanzas [file_crawler], [port_crawler], and [db_crawler]:

[crawlers]
crawlers_list = file_crawler, port_crawler, db_crawler
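Splunk's own parser for crawl.conf is not described on this page. As an illustration only, the following sketch uses Python's standard configparser to read a crawl.conf-style file. It assumes the file is INI-compatible. It lists the enabled crawlers in order and flags any name in crawlers_list that has no matching definition stanza. The file contents and the function names enabled_crawlers and undefined_crawlers are hypothetical.

```python
import configparser

# Hypothetical crawl.conf contents modeled on the [crawlers] example.
# Only [file_crawler] has a definition stanza here.
CRAWL_CONF = """
[default]
logging = warn

[crawlers]
crawlers_list = file_crawler, port_crawler, db_crawler

[file_crawler]
index = main
"""


def enabled_crawlers(conf_text: str) -> list[str]:
    """Return the stanza names listed in crawlers_list, in order."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("crawlers", "crawlers_list", fallback="")
    # Split the comma-separated list and drop surrounding whitespace.
    return [name.strip() for name in raw.split(",") if name.strip()]


def undefined_crawlers(conf_text: str) -> list[str]:
    """Return enabled crawler names that have no definition stanza."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return [name for name in enabled_crawlers(conf_text)
            if not parser.has_section(name)]


print(enabled_crawlers(CRAWL_CONF))
print(undefined_crawlers(CRAWL_CONF))
```

In this hypothetical file, port_crawler and db_crawler are enabled but never defined. A check like this is a cheap way to catch typos in crawlers_list.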

Define crawlers

Define a crawler by adding a definition stanza to crawl.conf. To define more crawlers, add one stanza per crawler.

Example crawler stanzas in crawl.conf:

[Example_crawler_name]
....

[Another_crawler_name]
....

Add key/value pairs to crawler definition stanzas to set a crawler's behavior. The following keys are available for defining a file_crawler:

bad_directories_list Specify directories to exclude.
bad_extensions_list Specify file extensions to exclude.
bad_file_matches_list Specify a string, or a comma-separated list of strings, that a filename must match to be excluded. You can use wildcards (examples: foo*.*, foo*bar, *baz*).
packed_extensions_list Specify extensions of compressed files to include. Leave this empty if you don't want to add any compressed files.
collapse_threshold Specify the minimum number of files a source must have to be considered a directory.
days_sizek_pairs_list Specify a comma-separated list of age (days) and size (kb) pairs to constrain what files are crawled. For example: days_sizek_pairs_list = 7-0, 30-1000 tells Splunk to crawl only files last modified within 7 days and at least 0kb in size, or modified within the last 30 days and at least 1000kb in size.
big_dir_filecount Set the maximum number of files a directory can contain and still be crawled. The crawl command excludes directories that contain more files than this number.
index Specify the name of the index to add crawled file and directory contents to.
max_badfiles_per_dir Specify how far into a directory to crawl looking for valid files. If Splunk crawls a directory and doesn't find a valid file within the number of files set by max_badfiles_per_dir, Splunk excludes the directory.
root Specify directories for a crawler to crawl through.
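To make the days_sizek_pairs_list rule concrete, here is a minimal Python sketch of the documented semantics. A file is crawled if it satisfies at least one age/size pair. This is not Splunk's implementation. The function names are hypothetical, and two details are assumptions: "kb" is taken as 1024 bytes, and "within N days" as an age of at most N days.

```python
SECONDS_PER_DAY = 86400


def parse_days_sizek_pairs(value: str) -> list[tuple[int, int]]:
    """Parse a value such as '7-0, 30-1000' into [(7, 0), (30, 1000)]."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        days, sizek = item.split("-")
        pairs.append((int(days), int(sizek)))
    return pairs


def passes_age_size(pairs, mtime: float, size_bytes: int, now: float) -> bool:
    """True if the file meets at least one (max age in days, min size in KB) pair.

    Assumes 1 KB = 1024 bytes and that 'within N days' means age <= N days.
    """
    age_days = (now - mtime) / SECONDS_PER_DAY
    size_kb = size_bytes / 1024
    return any(age_days <= days and size_kb >= sizek for days, sizek in pairs)


pairs = parse_days_sizek_pairs("7-0, 30-1000")
now = 1_000_000_000.0  # fixed "current time" so the example is deterministic

# Modified 2 days ago, 10 bytes: satisfies 7-0.
print(passes_age_size(pairs, now - 2 * SECONDS_PER_DAY, 10, now))
# Modified 20 days ago, 500 KB: too old for 7-0, too small for 30-1000.
print(passes_age_size(pairs, now - 20 * SECONDS_PER_DAY, 500 * 1024, now))
# Modified 20 days ago, 2000 KB: satisfies 30-1000.
print(passes_age_size(pairs, now - 20 * SECONDS_PER_DAY, 2000 * 1024, now))
```

The pairs are combined with a logical OR, matching the "or" in the days_sizek_pairs_list description. Adding a pair can only widen the set of crawled files.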

Example

Here's what an example crawler called simple_file_crawler might look like:

[simple_file_crawler]
bad_directories_list = bin, sbin, boot, mnt, proc, tmp, temp, home, mail, .thumbnails, cache, old
bad_extensions_list = mp3, mpg, jpeg, jpg, m4, mcp, mid
bad_file_matches_list = *example*, *makefile, core.*
packed_extensions_list = gz, tgz, tar, zip
collapse_threshold = 10
days_sizek_pairs_list = 3-0, 7-1000, 30-10000
big_dir_filecount = 100
index = main
max_badfiles_per_dir = 100
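The sketch below reads the exclusion lists from the [simple_file_crawler] stanza with Python's configparser and applies them to a few sample paths, as a rough picture of how those lists might filter files. Splunk's actual matching rules are not documented here, so several choices are assumptions: bad_directories_list is compared against each directory name in the path, extensions are the text after the last dot, and all matching is case-sensitive. The function names and sample paths are hypothetical.

```python
import configparser
import fnmatch
import posixpath

# The exclusion-list keys from the [simple_file_crawler] example stanza.
EXAMPLE = """
[simple_file_crawler]
bad_directories_list = bin, sbin, boot, mnt, proc, tmp, temp, home, mail, .thumbnails, cache, old
bad_extensions_list = mp3, mpg, jpeg, jpg, m4, mcp, mid
bad_file_matches_list = *example*, *makefile, core.*
"""


def load_list(section, key):
    """Split a comma-separated crawl.conf value into a list of strings."""
    return [v.strip() for v in section.get(key, "").split(",") if v.strip()]


def is_excluded(path: str, section) -> bool:
    """Hypothetical check: does any exclusion list match this file path?"""
    directory, filename = posixpath.split(path)
    # Assumption: a bad directory name anywhere in the path excludes the file.
    bad_dirs = set(load_list(section, "bad_directories_list"))
    if any(part in bad_dirs for part in directory.split("/")):
        return True
    # Assumption: the extension is the text after the last dot, matched exactly.
    ext = posixpath.splitext(filename)[1].lstrip(".")
    if ext in load_list(section, "bad_extensions_list"):
        return True
    # Wildcard patterns are matched against the filename (case-sensitive here).
    return any(fnmatch.fnmatchcase(filename, pattern)
               for pattern in load_list(section, "bad_file_matches_list"))


parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)
crawler = parser["simple_file_crawler"]

for p in ["/var/log/messages", "/home/alice/notes.txt", "/music/song.mp3",
          "/src/makefile", "/var/crash/core.1234"]:
    print(p, is_excluded(p, crawler))
```

Only /var/log/messages survives all three lists. The other sample paths are excluded by the directory list (home), the extension list (mp3), and the wildcard patterns (*makefile, core.*) respectively.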
