Use crawl to search your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define the type of data sources to include in or exclude from your results.
Configuration
Edit $SPLUNK_HOME/etc/system/local/crawl.conf to configure one or more crawlers that browse your data sources when you run the crawl command. Define each crawler by specifying values for each of the crawl attributes. Enable the crawler by adding it to crawlers_list.
Crawl logging
The crawl command produces a log of crawl activity that's stored in $SPLUNK_HOME/var/log/splunk/crawl.log. Set the logging level with the logging key in the [default] stanza of crawl.conf:
[default]
logging = <warn | error | info | debug>
Enable a crawler by listing the crawler specification stanza name in the crawlers_list key of the [crawlers] stanza.
Use a comma-separated list to specify multiple crawlers.
For example, to enable the crawlers that are defined in the stanzas [file_crawler], [port_crawler], and [db_crawler]:
[crawlers]
crawlers_list = file_crawler, port_crawler, db_crawler
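Because every name in crawlers_list has to match a definition stanza, it can help to check the file before running crawl. The following is a minimal sketch of such a check, not a Splunk tool: it assumes crawl.conf can be read as a plain INI-style file with Python's configparser, and the sample contents and the undefined_crawlers helper are hypothetical.

```python
import configparser

# Hypothetical crawl.conf contents, used only for illustration.
SAMPLE_CONF = """
[crawlers]
crawlers_list = file_crawler, port_crawler, db_crawler

[file_crawler]
index = main

[port_crawler]
"""


def undefined_crawlers(conf_text):
    """Return names listed in crawlers_list that have no definition stanza."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("crawlers", "crawlers_list", fallback="")
    # Split the comma-separated list and drop surrounding whitespace.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [name for name in names if not parser.has_section(name)]


print(undefined_crawlers(SAMPLE_CONF))  # prints ['db_crawler']
```

An empty result means each enabled crawler has a stanza; it says nothing about whether the keys inside the stanza are valid.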
Define a crawler by adding a definition stanza in crawl.conf. To define more crawlers, add more stanzas.
Example crawler stanzas in crawl.conf:
[Example_crawler_name]
....

[Another_crawler_name]
....
Add key/value pairs to crawler definition stanzas to set a crawler's behavior. The following keys are available for defining a file_crawler:
| Key | Description |
| --- | --- |
| bad_directories_list | Specify directories to exclude. |
| bad_extensions_list | Specify file extensions to exclude. |
| bad_file_matches_list | Specify a string, or a comma-separated list of strings. Files whose names match any of the strings are excluded. You can use wildcards (examples: foo*.*, foo*bar, *baz*). |
| packed_extensions_list | Specify extensions of common archive filetypes to include. Splunk unpacks compressed files before it reads them. It can handle tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z files. Leave this empty if you don't want to add any archive filetypes. |
| collapse_threshold | Specify the minimum number of files a source must have to be considered a directory. |
| days_sizek_pairs_list | Specify a comma-separated list of age (days) and size (kb) pairs to constrain what files are crawled. For example: days_sizek_pairs_list = 7-0, 30-1000 tells Splunk to crawl only files last modified within 7 days and at least 0kb in size, or modified within the last 30 days and at least 1000kb in size. |
| big_dir_filecount | Set the maximum number of files a directory can have in order to be crawled. crawl excludes directories that contain more than the maximum number you specify. |
| index | Specify the name of the index to which you want to add crawled file and directory contents. |
| max_badfiles_per_dir | Specify how far to crawl into a directory for files. If Splunk crawls a directory and doesn't find valid files within the specified max_badfiles_per_dir, then Splunk excludes the directory. |
| root | Specify directories for a crawler to crawl through. |
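The days_sizek_pairs_list rule in the table is easier to follow as code. The sketch below only illustrates the rule as the description states it; it is not Splunk's implementation. The function names are hypothetical, and treating "within N days" as an inclusive bound is an assumption.

```python
def parse_pairs(value):
    """Parse a string such as '7-0, 30-1000' into [(7, 0), (30, 1000)]."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        days, size_kb = item.split("-", 1)
        pairs.append((int(days), int(size_kb)))
    return pairs


def passes_filter(age_days, size_kb, pairs):
    """Return True if the file satisfies at least one (max age, min size) pair.

    Assumption: the age bound is inclusive, so a file exactly 7 days old
    satisfies a 7-day limit.
    """
    return any(
        age_days <= max_days and size_kb >= min_kb
        for max_days, min_kb in pairs
    )


pairs = parse_pairs("7-0, 30-1000")
print(passes_filter(3, 1, pairs))      # recent file, any size
print(passes_filter(20, 500, pairs))   # older file, below the 1000 KB minimum
print(passes_filter(20, 2000, pairs))  # older file, large enough
```

With the pairs 7-0 and 30-1000, the first and third files pass and the second does not, because a file older than 7 days must be at least 1000 KB.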
Here's what an example crawler called simple_file_crawler might look like:
[simple_file_crawler]
bad_directories_list = bin, sbin, boot, mnt, proc, tmp, temp, home, mail, .thumbnails, cache, old
bad_extensions_list = mp3, mpg, jpeg, jpg, m4, mcp, mid
bad_file_matches_list = *example*, *makefile, core.*
packed_extensions_list = gz, tgz, tar, zip
collapse_threshold = 10
days_sizek_pairs_list = 3-0,7-1000, 30-10000
big_dir_filecount = 100
index = main
max_badfiles_per_dir = 100
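To get a feel for which files the exclusion lists in this example would skip, the following sketch applies the bad_extensions_list and bad_file_matches_list values to a few filenames. It is an approximation under stated assumptions: Python's fnmatch stands in for Splunk's wildcard matching, matching is done case-insensitively on the filename only, and is_excluded is a hypothetical helper.

```python
import os
from fnmatch import fnmatchcase

# Values copied from the simple_file_crawler example.
BAD_EXTENSIONS = ["mp3", "mpg", "jpeg", "jpg", "m4", "mcp", "mid"]
BAD_FILE_MATCHES = ["*example*", "*makefile", "core.*"]


def is_excluded(path):
    """Return True if the filename hits either exclusion list.

    Assumption: comparisons ignore case and look only at the final
    path component, not at the directories leading to it.
    """
    name = os.path.basename(path).lower()
    extension = os.path.splitext(name)[1].lstrip(".")
    if extension in BAD_EXTENSIONS:
        return True
    return any(fnmatchcase(name, pattern) for pattern in BAD_FILE_MATCHES)


for candidate in ["/var/log/messages", "song.mp3", "core.1234", "GNUmakefile"]:
    print(candidate, is_excluded(candidate))
```

Under these assumptions only /var/log/messages is kept. Directory exclusions (bad_directories_list) are not modeled here.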