Documentation: 3.3.4
This page last updated: 09/30/08 01:09pm

Use crawl

crawl searches your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define which data sources to include in or exclude from your results. You can then save a crawl search and schedule it to run regularly to keep your indexes up to date.

This topic explains how to use the crawl command and how to save and schedule a crawl search. Refer to the Admin manual for instructions on configuring crawl. You can also watch a Splunk developer video about crawl.

Note: Splunk currently supports one type of crawler, labeled file_crawler. As yet, you cannot define a custom crawler.

Run a crawl

In Splunk Web, you can access and run the crawl command from the Splunk search bar and the Admin > Data Inputs: Crawls page.

The Splunk search bar
You can run the crawl command directly from the search bar:

| crawl

If you run a crawl without arguments, Splunk searches your filesystem with the settings defined in crawl.conf. To override these default settings, specify crawl options at search time.

For example, you can tell Splunk to crawl specific directories when you include the root argument:

| crawl root=/private/var/log;/private/var/db
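The override rule above (crawl.conf supplies the defaults, and options given at search time replace them) can be sketched as a small parser. This is a hypothetical illustration, not Splunk's implementation: the default value for `root` is an assumption standing in for whatever crawl.conf actually contains, and `root` is the only option taken from this topic. As in the example, several directories are separated by semicolons.

```python
# Hypothetical defaults standing in for values read from crawl.conf.
# Only the "root" option appears in this topic; its default here is assumed.
CRAWL_CONF_DEFAULTS = {"root": ["/"]}


def parse_crawl_search(search: str) -> dict:
    """Return crawl settings: crawl.conf defaults overridden by search-time options."""
    settings = dict(CRAWL_CONF_DEFAULTS)
    # Keep only the crawl command itself, ignoring any later "| ..." stages.
    command = search.strip().lstrip("|").split("|")[0]
    words = command.split()
    if not words or words[0] != "crawl":
        raise ValueError("not a crawl search: " + search)
    for word in words[1:]:
        key, sep, value = word.partition("=")
        if not sep:
            raise ValueError("expected key=value, got " + word)
        # Multiple directories are separated by semicolons, as in root=/a;/b.
        settings[key] = value.split(";") if key == "root" else value
    return settings
```

For example, `parse_crawl_search("| crawl")` keeps the assumed default, while `parse_crawl_search("| crawl root=/private/var/log;/private/var/db")` replaces `root` with the two listed directories.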

The Admin page
You can manage all your saved crawls from the Admin > Data Inputs: Crawls page. From this page, you can also run the default crawl search by clicking New Crawl:

| crawl | search NOT *personal*

After the crawl completes, you can add or remove options to narrow your search.
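In the default Admin search, `search NOT *personal*` drops every crawl result that matches the wildcard term `*personal*`. The sketch below approximates that filtering step. It assumes each result can be treated as a single string, such as a path, and it matches case-insensitively, like a Splunk keyword search. The function name `exclude_matching` is hypothetical.

```python
from fnmatch import fnmatchcase


def exclude_matching(items, pattern):
    """Keep only items that do NOT match the wildcard pattern (case-insensitive)."""
    pattern = pattern.lower()
    return [item for item in items if not fnmatchcase(item.lower(), pattern)]
```

With this approximation, a result such as `/Users/bob/Personal/tax.pdf` is excluded along with any other path containing "personal".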

Results of a crawl

For each item listed in your crawl results, Splunk displays whether or not it is a file, a timestamp indicating when it was last modified, its size, and its status (whether or not it has already been added to your inputs). You can perform two actions on each data source: Add input and Preview file/directory.
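The four columns described above can be pictured as one record per filesystem entry. The following is a minimal sketch under stated assumptions, not the file_crawler's real logic: it lists only the immediate entries of one directory, and it treats a path as "added" when it appears in a caller-supplied set of monitored paths. The names `CrawlResult` and `list_candidates` are hypothetical.

```python
import os
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CrawlResult:
    path: str
    is_file: bool        # file or directory
    modified: datetime   # last-modified timestamp
    size: int            # size in bytes
    added: bool          # already present in your inputs?


def list_candidates(root, monitored):
    """Describe each entry directly under root, sorted by name."""
    results = []
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            info = entry.stat()
            results.append(CrawlResult(
                path=entry.path,
                is_file=entry.is_file(),
                modified=datetime.fromtimestamp(info.st_mtime),
                size=info.st_size,
                added=entry.path in monitored,
            ))
    return results
```

Calling `list_candidates` again on a directory entry's path mirrors the drill-down that Preview directory offers.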

Preview file or directory

To review the contents of the data source before adding it as an input, click Preview file or Preview directory.

A new window opens:

  • If you click Preview file on a file, Splunk returns events from the file.
  • If you click Preview directory on a directory, Splunk displays a list of the files in the directory and lets you drill-down further and preview each file.

Add input

To add the selected data source as an input, click Add input.

Now, when you go to the Admin page and select the Data Inputs tab, your selected data source is listed.

Note: Adding data inputs with crawl modifies your inputs.conf file to include a stanza describing the new source. For example, if crawl discovers /var/log, clicking Add input adds the following stanza to inputs.conf:

[monitor:///var/log]
disabled = false
index = main
class = crawl
generator = ui
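When you click Add input, Splunk itself edits inputs.conf. The sketch below only reproduces the shape of the stanza shown above so the format is easy to see. The helper names `monitor_stanza` and `add_monitor_input` are hypothetical. So is the choice to skip a path whose `[monitor://...]` header already exists in the file, which is an assumption of this sketch and not documented Splunk behavior.

```python
def monitor_stanza(path, index="main"):
    """Build a monitor stanza in the same form as the crawl example."""
    lines = [
        f"[monitor://{path}]",
        "disabled = false",
        f"index = {index}",
        "class = crawl",
        "generator = ui",
    ]
    return "\n".join(lines) + "\n"


def add_monitor_input(conf_path, path, index="main"):
    """Append a stanza for path unless its header is already present."""
    stanza = monitor_stanza(path, index)
    header = stanza.splitlines()[0]
    try:
        with open(conf_path, encoding="utf-8") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = ""
    if header in existing.splitlines():
        return False  # already configured; leave the file unchanged
    with open(conf_path, "a", encoding="utf-8") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")
        # Separate the new stanza from earlier content with a blank line.
        f.write("\n" + stanza if existing else stanza)
    return True
```

Note that a path beginning with `/` produces the three slashes seen in `[monitor:///var/log]`.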

Save a crawl

After you run a crawl search, save it by clicking the Save this Crawl... link above your search results. This opens the Admin > Data Inputs: Crawls: Create Crawl page, which prompts you to:

  • Name your crawl search.
  • If necessary, edit your search.
  • If desired, elect to run your crawl on a schedule.
  • Click Cancel to return to the Admin > Data Inputs page.
  • Click Save to save your crawl search.

Note: Your crawl is not saved unless you provide a name.

Manage saved crawls

Manage your saved crawl searches from the Admin > Data Inputs: Crawls page. You can run a new crawl or select one or more saved crawls to:

  • Run Now and update your indexes.
  • Enable or Disable so that you can start or stop updating particular indexes.
  • Delete to remove the search from your list.

Edit the search and schedule properties of an individual crawl by clicking on its Name.

Note: You can't change the name of your saved crawl.

Schedule saved crawls

When scheduling your saved crawls, you can define the type of schedule and how frequently it runs. You can also set alert options and define fields to include in summary indexes. These options are the same as those provided for saving regular (non-crawl) searches.

