crawl searches your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define the type of data sources to include in or exclude from your results. Save this crawl search and schedule it to run regularly to update your indexes.
This topic explains how to use the crawl command and how to save and schedule a crawl search. Refer to the Admin manual for instructions to configure crawl. You can also watch this Splunk developer video about crawl.
Note: Splunk currently supports one type of crawler, labeled file_crawler. As yet, you cannot define a custom crawler.
Run a crawl
In Splunk Web, you can access and run the crawl command from the Splunk search bar and the Admin > Data Inputs: Crawls page.
The Splunk search bar
You can run the crawl command directly from the search bar:
For example, you can tell Splunk to crawl specific directories when you include the root argument:
The Admin page
You can manage all your saved crawls from the Admin > Data Inputs: Crawls page. From this page, you can also run the default crawl search by clicking New Crawl.
For each item listed in your crawl results, Splunk displays whether or not it is a file, a timestamp indicating when it was last modified, its size, and its status (whether it is added or not added to your inputs). You can perform two actions on each data source: Add input and Preview file/directory.
Preview file or directory
To review the contents of the data source before adding it as an input, click Preview file or Preview directory.
A new window opens with a preview of the data source.
To add the selected data source as an input, click Add input.
Now, when you go to the Admin page and select the Data Inputs tab, your selected data source is listed.
Note: Adding data inputs with crawl modifies your inputs.conf file to include a stanza describing the new source. For example, if crawl discovers /var/log, clicking Add input adds the following stanza to inputs.conf:
[monitor:///var/log]
disabled = false
index = main
class = crawl
generator = ui
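To check which inputs crawl has added, one option is to read inputs.conf and list the monitor stanzas marked class = crawl. The following is a minimal sketch in Python, assuming the stanza format shown above. The function name crawl_added_inputs and the sample text (including the /opt/app/logs stanza) are hypothetical and used only for illustration. A real inputs.conf may contain constructs that Python's configparser does not handle the same way Splunk does.

```python
import configparser


def crawl_added_inputs(conf_text):
    """Return monitored paths from stanzas that carry class = crawl.

    Hypothetical helper: it assumes each input is a [monitor://<path>]
    stanza with simple "key = value" lines, as in the example stanza.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(conf_text)

    prefix = "monitor://"
    paths = []
    for section in parser.sections():
        is_monitor = section.startswith(prefix)
        is_crawl = parser.get(section, "class", fallback="") == "crawl"
        if is_monitor and is_crawl:
            paths.append(section[len(prefix):])
    return paths


# Sample text for illustration only; the second stanza is invented
# to show an input that was not added by crawl.
SAMPLE = """\
[monitor:///var/log]
disabled = false
index = main
class = crawl
generator = ui

[monitor:///opt/app/logs]
disabled = false
index = main
"""

print(crawl_added_inputs(SAMPLE))
```

To run this against a real file, read its contents into a string and pass it to the function in place of SAMPLE.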
After you run a crawl search, save the search by clicking the Save this Crawl... link located above your search results. This action opens the Admin > Data Inputs: Crawls: Create Crawl page which prompts you to:
Note: Your crawl is not saved unless you provide a name.
Manage saved crawls
Manage your saved crawl searches from the Admin > Data Inputs: Crawls page. You can run a new crawl or select one or more saved crawls to:
Edit the search and schedule properties of an individual crawl by clicking on its Name.
Note: You can't change the name of your saved crawl.
Schedule saved crawls
When scheduling your saved crawls, you can define the type of schedule and how frequently the crawl runs. You can also set alert options and define fields to include in summary indexes. These options are the same as the options provided for saving regular (non-crawl) searches.