
This document last updated: 06/30/08 04:06pm
Splunk Preview is where you can get an early look at new features before the next general availability (GA) release.
Important: Preview is not meant for production. You can't upgrade a prior GA Splunk release to a Splunk Preview, and you won't be able to upgrade Preview to the next GA release. The features of Splunk Preview are still under construction, and some older features may not work.
Not ready for the cutting edge? Download the latest current release instead.
Installing Splunk Preview
For all platforms other than Windows, follow the installation instructions for the latest GA release. For Windows, use the installation instructions for this Preview release.
Preview license information
Splunk Preview releases include licenses that expire on a preset date. Regular 3.x Enterprise licenses (either trial or production) do not work with Preview releases. If you've overwritten the included license, you can find a copy of it in splunk-free.license in your Preview installation's /etc directory.
Important changes in Splunk Preview
Splunk Preview introduces a new directory structure for configuration files to support application download and installation. If you intend to configure anything via configuration files, read more about the new configuration file directory structure.
About Splunk Preview documentation
The Splunk Preview documentation is also under construction. Here, you will find in-progress documentation covering the new features in Splunk. It's possible that during development, a feature's implementation might change from its original specification, so don't be surprised if you notice that this documentation changes from one visit to the next.
Note: Splunk Preview documentation covers only the new features in this Preview release. For details on pre-existing features and installation instructions, refer to the documentation posted for the latest GA release.
Developers involved
During the course of Preview's development, the developers working on new features will be blogging about their work. These blogs supplement and link to the Preview documentation. Check dev.splunk.com for tips, tricks, and additional information.
Tell us what you think
Preview is all about getting your feedback - if you try it, let us know what worked, what didn't, and what you'd do differently. Contact splunkpreview@splunk.com, or post to the developer forums with your questions and observations about Splunk Preview.
If you have questions about the documentation, you can comment directly on each page as well.
Known Issues in this Preview
This section contains known issues and workarounds for this Preview release of Splunk.
Caution: Preview is not meant for production. You can't upgrade a prior GA Splunk release to a Splunk Preview, and you won't be able to upgrade Preview to the next GA release. The features of Splunk Preview are still under construction, and some older features may not work.
General issues
Splunk Preview introduces changes to Splunk Web, including new dashboard elements and a new Admin layout.
Dashboard elements
There are many upcoming changes to the getting started dashboard. With Splunk Preview, you can now watch feature videos from Splunk Web and add files and more data to your indexes directly from the getting started dashboard.
Watch Splunk feature videos
When you first log into Splunk Web, the Learn about Splunk window displays over the getting started dashboard. This window provides demo videos of Splunk features. If you don't want to see this window the next time you start Splunk, check the "Do not show this window on startup" box. When you're ready to begin using Splunk, click "Close this window and start using Splunk".
Note: After closing the Learn about Splunk window, you can open it again from the link under Get help on the getting started dashboard.
Index some data
The Getting Started dashboard provides new options for indexing data. Unlike previous versions, the links for indexing data take you to the index manager.
Read more about the index manager and adding inputs.
Admin navigation
As before, when you click the Admin link at the top right of the page, the Server settings page opens. Instead of navigating a tabbed menu layout, you now access the Admin pages from a list located on the left side of the page. Click the top-level section names to view the pages included in each section. You have access to the same pages as before (Server, Data Inputs, Distributed, Users, Saved Searches, and License & Usage) with the addition of Indexes and Applications.
Application Management
With Splunk Preview, you can use the Admin page to browse, manage, and install applications that are available on SplunkBase.
Read more about the application manager.
Index Management
In previous Splunk releases, you used the command line interface (CLI) to manage your indexes. Now, you can view your indexes, edit your index properties, and add new indexes from the Admin page.
Read more about managing your indexes within Splunk Web.
Splunk Preview introduces a new search feature, crawl, that searches your filesystem for new data sources to add to your index. Configure one or more types of crawlers in crawl.conf to define the type of data sources to include in or exclude from your results. Save this crawl search and schedule it to run regularly to update your indexes.
This topic explains how to use the crawl command, save and schedule a crawl search, and configure different crawlers.
Note: Splunk Preview currently supports one type of crawler, labeled file_crawler. As yet, you cannot define a custom crawler.
Use crawl
In Splunk Web, you can access and run the crawl command from the Splunk search bar and the Admin > Data Inputs: Crawls page.
The Splunk search bar
You can run the crawl command directly from the search bar:
The Admin page
You can manage all your saved crawls from the Admin > Data Inputs: Crawls page. From this page, you can also run the default crawl search by clicking New Crawl:
For each item listed in your crawl results, Splunk displays whether or not it is a file, a timestamp indicating when it was last modified, its size, and its status (whether it is added or not added to your inputs). You can perform two actions on each data source: Add input and Preview file/directory.
Preview file or directory
To review the contents of the data source before adding it as an input, click Preview file or Preview directory.
A new window opens:
To add the selected data source as an input, click Add input.
Now, when you go to the Admin page and select the Data Inputs tab, your selected data source is listed.
Note: Adding data inputs with crawl modifies your inputs.conf file to include a stanza describing the new source. For example, if crawl discovers /var/log, clicking Add input adds the following stanza to inputs.conf:
[tail:///var/log]
disabled = false
index = main
_class = crawl
_generator = ui
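As a rough illustration of the stanza above, the following Python sketch builds the same text for an arbitrary path. The helper name `crawl_input_stanza` is hypothetical and not part of Splunk; the keys and values are copied from the /var/log example, and how Splunk actually writes the file is not described here.

```python
def crawl_input_stanza(path, index="main"):
    """Return inputs.conf text shaped like the stanza crawl adds for `path`.

    Hypothetical helper for illustration only; not a Splunk API.
    """
    lines = [
        f"[tail://{path}]",   # an absolute path yields three slashes
        "disabled = false",
        f"index = {index}",
        "_class = crawl",     # marks the input as added by crawl
        "_generator = ui",    # marks the input as added from Splunk Web
    ]
    return "\n".join(lines)


print(crawl_input_stanza("/var/log"))
```

Printing the result for /var/log should reproduce the five lines of the example stanza.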
After you run a crawl search, save the search by clicking the Save this Crawl... link located above your search results. This action opens the Admin > Data Inputs: Crawls: Create Crawl page which prompts you to:
Note: Your crawl won't save if you don't provide a name.
Manage saved crawls
Manage your saved crawl searches from the Admin > Data Inputs: Crawls page. You can run a new crawl or select one or more saved crawls to:
Edit the search and schedule properties of an individual crawl by clicking on its Name.
Note: You cannot change the name of your saved crawl.
Schedule saved crawls
When scheduling your saved crawls, you can define the type of schedule and how frequently to run it. You can also set alert options and define fields to include in summary indexes. These options are exactly the same as options provided for saving regular (non-crawl) searches.
Configure crawl
Configure crawl in two ways:
Edit crawl.conf to define and enable one or more crawlers that browse your data sources when you run the crawl command. You define each crawler by specifying values for each of the crawl options. You enable the crawler by adding it to crawlers_list.
Crawl logging
The crawl command produces a log of crawl activity that's stored in /splunkpreview/var/log/splunk/crawl.log. Set the logging level with the logging key in the [default] stanza.
Example:
Set the logging level of crawl to warn.
[default]
logging=warn
Enable a crawler by listing the crawler specification stanza name in the crawlers_list key of the [crawlers] stanza.
Use a comma-separated list to specify multiple crawlers.
Example:
Enable crawlers that are defined in the stanzas: [file_crawler], [port_crawler], and [db_crawler].
[crawlers]
crawlers_list= file_crawler, port_crawler, db_crawler
Define a crawler by adding a definition stanza in crawl.conf. You can add additional crawler definitions by adding additional stanzas.
Example:
[Example_crawler_name]
....
[Another_crawler_name]
....
Add key/value pairs to crawler definition stanzas to set a crawler's behavior. The following keys are available for defining a file_crawler:
| bad_directories_list= | Specify directories to exclude. |
| bad_extensions_list= | Specify file extensions to exclude. |
| bad_file_matches_list= | Specify a string, or a comma-separated list of strings that filenames must contain to be excluded. You can use wildcards (examples: foo*.*,foo*bar, *baz*). |
| packed_extensions_list= | Specify extensions of compressed files to exclude. |
| collapse_threshold= | Specify the minimum number of files a source must have to be considered a directory. |
| days_sizek_pairs_list= | Specify a comma-separated list of age (days) and size (kb) pairs to constrain what files are crawled. For example: days_sizek_pairs_list = 7-0, 30-1000 tells Splunk to crawl only files last modified within 7 days and at least 0kb in size, or modified within the last 30 days and at least 1000kb in size. |
| big_dir_filecount= | Set the maximum number of files a directory can have in order to be crawled. crawl excludes directories that contain more than the maximum number you specify. |
| index= main | Specify the name of the index to add crawled file and directory contents to. |
| max_badfiles_per_dir= | Specify how far to crawl into a directory for files. If Splunk crawls a directory and doesn't find valid files within the specified max_badfiles_per_dir, then Splunk excludes the directory. |
Example:
A simple file_crawler.
[simple_file_crawler]
bad_directories_list= bin, sbin, boot, mnt, proc, tmp, temp, home, mail, .thumbnails, cache, old
bad_extensions_list= mp3, mpg, jpeg, jpg, m4, mcp, mid
bad_file_matches_list= *example*, *makefile, core.*
packed_extensions_list= gz, tgz, tar, zip
collapse_threshold= 10
days_sizek_pairs_list= 3-0, 7-1000, 30-10000
big_dir_filecount= 100
index=main
max_badfiles_per_dir=100
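The age and size pairs are the least obvious of these keys, so here is a minimal Python sketch of the rule as the table describes it: a file qualifies if it satisfies at least one pair (modified within the given number of days and at least the given size in kb). The function names are hypothetical, and this is an illustration of the described behavior rather than Splunk's own code.

```python
def parse_days_sizek_pairs(value):
    """Parse a value such as '7-0, 30-1000' into [(7, 0), (30, 1000)]."""
    pairs = []
    for item in value.split(","):
        days, size_kb = item.strip().split("-")
        pairs.append((int(days), int(size_kb)))
    return pairs


def passes_age_size_filter(age_days, size_kb, pairs):
    """True if the file satisfies at least one (max age, min size) pair."""
    return any(age_days <= max_days and size_kb >= min_kb
               for max_days, min_kb in pairs)


pairs = parse_days_sizek_pairs("7-0, 30-1000")
print(passes_age_size_filter(3, 1, pairs))      # recent file, any size
print(passes_age_size_filter(20, 5000, pairs))  # older file, large enough
print(passes_age_size_filter(20, 10, pairs))    # older file, too small
```

With the pairs 7-0 and 30-1000, the first two files qualify and the third does not, because it is older than 7 days and smaller than 1000kb.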
| crawl [crawl option]...
Note: If any other command precedes crawl in a search pipeline, Splunk automatically discards the data generated ahead of crawl and outputs only the data generated by crawl. For example, if a search command precedes a crawl command in your search, Splunk discards the search results and outputs the crawl results.
Arguments
Note: The default values for crawl options are found in crawl.conf.spec.
file_crawler crawl options
| crawl option | bad_directories_list | bad_extensions_list | bad_file_matches_list | packed_extensions_list | collapse_threshold | days_sizek_pairs | big_dir_filecount | index | max_badfiles_per_dir | Specify values to override key values in crawl.conf. |
| bad_directories_list | bad_directories_list=string, string, ... | Specify directories to exclude. |
| bad_extensions_list | bad_extensions_list=string,string,... | Specify file extensions to exclude. |
| bad_file_matches_list | bad_file_matches=(string | string* | *string | *string* | *string*string | string*string*), ... | Specify a string, or a comma-separated list of strings that filenames must contain to be excluded. You can use wildcards (examples: foo*.*,foo*bar, *baz*). |
| packed_extensions_list | packed_extensions_list=string, string, ... | Specify extensions of compressed files to exclude. |
| collapse_threshold | collapse_threshold=integer (default=3) | Specify the minimum number of files a source must have to be considered a directory. |
| days_sizek_pairs | days_sizek_pairs=integer(days)-integer(kb), ... (default= 7-0, 30-1000) | Specify a comma-separated list of age (days) and size (kb) pairs to constrain what files are crawled. For example: days_sizek_pairs_list = 7-0, 30-1000 tells Splunk to crawl only files last modified within 7 days and at least 0kb in size, or modified within the last 30 days and at least 1000kb in size. |
| big_dir_filecount | big_dir_filecount=integer (default=10000) | Set the maximum number of files a directory can have in order to be crawled. crawl excludes directories that contain more than the maximum number you specify. |
| index | index=string (default=main) | Specify the name of the index to add crawled file and directory contents to. |
| max_badfiles_per_dir | max_badfiles_per_dir=integer (default=100) | Specify how far to crawl into a directory for files. If Splunk crawls a directory and doesn't find valid files within the specified max_badfiles_per_dir, then Splunk excludes the directory. |
The following command tells Splunk to browse for:
This is the default crawl.conf file that ships with Splunk.
# Copyright (C) 2005-2008 Splunk Inc. All Rights Reserved. Version 3.0
#
# Crawl Configuration
#
# Set of attribute-values used by crawl.
#
# If attribute, ends in _list, the form is:
#
# attr = val, val, val, etc.
#
# The space after the comma is necessary, so that "," can be used, as in BAD_FILE_PATTERNS's use of "*,v"
# Whitespace is stripped away and comments, such as this, are on lines that start with "#"
#
[default]
logging = warn
[crawlers]
crawlers_list = file_crawler
[file_crawler]
# SEMICOLON SEPARATED LIST OF DIRECTORY LOCATIONS TO START FROM
root = /;/Library/Logs
# DIRECTORIES TO SKIP ALL TOGETHER. Consider "root" and "home"
bad_directories_list = bin, sbin, boot, mnt, proc, tmp, temp, dev, initrd, help, driver, drivers, share, bak, old, lib, include, doc, docs, man, html, images, tests, js, dtd, org, com, net, class, java, resource, locale, static, testing, src, sys, icons, css, dist, cache, users, system, resources, examples, gdm, manual, spool, lock, kerberos, .thumbnails, libs, old, manuals, splunk, mail, resources, documentation, applications, library, network, automount, mount, cores, lost\+found, fonts, extensions, components, printers, caches, findlogs, music, volumes, libexec,
# EXTENSIONS TO SKIP
bad_extensions_list = 0t, a, adb, ads, ali, am, asa, asm, asp, au, bak, bas, bat, bmp, c, cache, cc, cg, cgi, class, clp, com, conf, config, cpp, cs, css, csv, cxx, dat, doc, dot, dvi, dylib, ec, elc, eps, exe, f, f77, f90, for, ftn, gif, h, hh, hlp, hpp, hqx, hs, htm, html, hxx, icns, ico, ics, in, inc, jar, java, jin, jpeg, jpg, js, jsp, kml, la, lai, lhs, lib, license, lo, m, m4, mcp, mid, mp3, mpg, msf, nib, nsmap, o, obj, odt, ogg, old, ook, opt, os, os2, pal, pbm, pdf, pdf, pem, pgm, php, php3, php4, pl, plex, plist, plo, plx, pm, png, po, pod, ppd, ppm, ppt, prc, presets, ps, psd, psym, py, pyc, pyd, pyw, rast, rb, rc, rde, rdf, rdr, res, rgb, ro, rsrc, s, sgml, sh, shtml, so, soap, sql, ss, stg, strings, tcl, tdt, template, tif, tiff, tk, uue, v, vhd, wsdl, xbm, xlb, xls, xlw, xml, xsd, xsl, xslt, jame, d, ac, properties, pid, del, lock, md5, rpm, pp, deb, iso, vim, lng, list
# IMPLIED "$" (END OF FILENAME) AFTER EACH PATTERN HERE
bad_file_matches_list = *~, *#, *,v, *readme*, *install, (/|^).*, *passwd*, *example*, *makefile, core.*
packed_extensions_list = bz, bz2, tbz, tbz2, Z, gz, tgz, tar, zip
# ADD A DIRECTORY, RATHER THAN INDIVIDUAL FILES, IF IT HAS 1000 OR MORE FILES
collapse_threshold = 1000
# PAIRS OF MAXIMUM AGE AND MINIMUM SIZE.
# default is to accept text/archived files modified in he last 7 days
# with 0k, or modified in the last 30 days if it has at least 1000k
days_sizek_pairs_list = 7-0, 30-1000
# SKIP DIRECTORIES WITH TOO MANY FILES
big_dir_filecount = 10000
# DEFAULT INDEX TO ADD FILES
index = main
# SKIP DIRECTORIES AFTER INVESTIGATING N FILES WITHOUT FINDING SOMETHING WORTHWHILE
max_badfiles_per_dir = 100
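The comments in this file note that _list values are separated by a comma followed by a space, so that a pattern such as *,v can contain a comma. The Python sketch below illustrates that splitting rule and a simple wildcard check. It assumes shell-style matching via Python's fnmatch module as a stand-in; Splunk's actual matching rules (for example, the (/|^).* entry) are not covered, and both function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_list(value):
    """Split a *_list value on comma-plus-space so that '*,v' stays intact."""
    return [item.strip() for item in value.split(", ") if item.strip()]


def matches_any(filename, patterns):
    """True if the filename matches at least one shell-style pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# A subset of the default bad_file_matches_list, limited to plain wildcards.
patterns = parse_list("*~, *#, *,v, *readme*, *makefile, core.*")
for name in ["notes.txt~", "main.c,v", "core.1234", "access.log"]:
    print(name, matches_any(name, patterns))
```

Splitting on a bare comma instead would break *,v into two entries, which is why the space after each comma matters.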
Summary indexing uses the addinfo command to add fields containing general information about the current search to events going into a summary index. You can also use | addinfo in any search to add general information (about the current search) to the search results. This is useful if you want to build and test searches and reports on search results before using summary indexing.
Currently, addinfo adds the following fields to each result:
Note: The fields that addinfo adds are defined in savedsearches.conf. Currently, you can't customize the fields addinfo adds to the search results.
Syntax
addinfo
Arguments
None.
Examples
Splunk Web:
This example searches Web server data and builds a report based on client IPs. It then adds fields containing general search information to the search results and returns a list sorted by the number of unique IP addresses and by the search each event came from (info_search_id).
host=webserver1 eventtype=banner_access NOT eventtypetag=bot NOT eventtypetag=images NOT eventtype=splunk_IPs NOT eventtype=10dot_IP_range NOT eventtypetag=invalid_page | stats distinct_count(clientip) as uniqueIPs, max(_time), min(_time) | eval site="update_banners" | addinfo | sort uniqueIPs, info_search_id
This example searches Web server data for raw downloads and adds global data to the search results.
Summary indexing uses the collect command to place the results of a saved search into a summary index so you can search them later. You can also use | collect in any search to place search results in any index. For example, if you create reports from a search, use | collect to index them so that you can search across all of the reports uniformly, or create a larger aggregate report from multiple reports.
Before collect indexes search results, it saves them as events in a file ($SPLUNK_HOME/var/spool/splunk/events_random-number.stash by default). You can override the default file name and location using the file and path options. Use other collect options to override other default settings.
Syntax
collect collect-index [collect option],...
Arguments
| collect-index | index=string | Specify the name of the index to add search results to. Note: The specified index must already exist. Configure indexes in indexes.conf. |
collect option
| collect option = | addtime | file | path | marker | testmode | Specify options to override default settings of collect. |
| addtime | addtime= (T | F) (default=T) | Set to true (T) to tell Splunk to prepend a timestamp to events that have no extractable timestamp in their _raw field. |
| file | file=string (default=events_random-number.stash) | Specify the file to write events to. |
| marker | marker=string (default=" ") | Specify a string of field/value pairs (comma-delimited list) to append to each event that's indexed. |
| path | path=string (default=$SPLUNK_HOME/var/spool/splunk/) | Specify the path to store the file that events are written to. Note: Splunk must have this path set as a data input for events in the file to be indexed. |
| testmode | testmode=(T | F) (default=F) | Set to true (T) to put collect in test mode. In test mode, search results aren't written into the new index, but they are still rendered in Splunk Web as they'd appear if they were indexed. |
Splunk Web:
This example searches Web server data and builds a report based on client IPs. The report is then indexed into the index WebReports.
host=webserver1 eventtype=banner_access NOT eventtypetag=bot NOT eventtypetag=images NOT eventtype=splunk_IPs NOT eventtype=10dot_IP_range NOT eventtypetag=invalid_page | stats distinct_count(clientip) as uniqueIPs, max(_time), min(_time) | eval site="update_banners" | collect index=WebReports
This example searches Web server data for raw downloads and indexes the results in the index downloadcount.
"eventtypetag=download" NOT eventtypetag=bot NOT eventtypetag=internal | collect index=downloadcount
Use the eventstats command to generate summary statistics of specified fields and save them into new fields. Specify a new field name for the statistics results with the as argument. If you don't specify a new field name, the default field name is the statistical operator and the field it operated on (for example: stat-operator(field)). You can group summary statistics by field (or more than one field) with the by argument. Each distinct set of by fields counts as a distinct grouping.
Syntax
eventstats [stat-operator [as new-field-name]]... [by groupby-field(s)]
Arguments
| groupby-fields | field,field,... | Specifies the fields by which to group events. One result is returned per distinct combination of values of the fields. |
| new-field-name | name of new field | Specifies a new field name for the appended statistical result field. |
stat-operator
| stat-operator | count | distinct_count | first | last | sum | min | max | avg | mean | mode | median | stdev | var | percXX | Specifies the statistical operation to perform. |
| count | c | count|c(field) | Find the count of values in the specified field(s). |
| distinct_count | dc | distinct_count|dc(field) | Find the count of distinct values in the specified field(s). |
| first | first | Show the first "seen" value of a field. |
| last | last | Show the last "seen" value of a field. |
| sum | sum[(field)] | Produce the sum of the values of the field. |
| min | min(field) | Find the minimum value of values in the specified field(s). |
| max | max(field) | Find the maximum value of values in the specified field(s). |
| avg | avg(field) | Find the average value of values in the specified field(s). |
| mean | mean(field) | Find the mean value of values in the specified field(s). |
| mode | mode(field) | Find the mode value of values in the specified field(s). |
| median | median(field) | Find the median value of values in the specified field(s). |
| stdev | stdev(field) | Find the standard deviation of values in the specified field(s). |
| var | var(field) | Find the variance of values in the specified field(s). |
| percXX | percXX | Percentile, integer between 1 and 99 |
Splunk Web:
This example searches the data in the sampledata index, and creates a field of the average value of bytes for each event with different values of date_minute.
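To make the behavior concrete, here is a small Python sketch of what an average grouped by one field does conceptually: every event keeps its original fields and gains a new field holding the average for its group. The field names bytes and date_minute come from the example description; the function `eventstats_avg` is hypothetical, and this is not Splunk's implementation.

```python
from collections import defaultdict


def eventstats_avg(events, field, by):
    """Append avg(field) to each event, computed per distinct value of `by`."""
    groups = defaultdict(list)
    for event in events:
        if field in event and by in event:
            groups[event[by]].append(float(event[field]))

    name = f"avg({field})"  # default name: stat-operator(field)
    enriched_events = []
    for event in events:
        enriched = dict(event)  # copy so the input events are left unchanged
        values = groups.get(event.get(by))
        if values:
            enriched[name] = sum(values) / len(values)
        enriched_events.append(enriched)
    return enriched_events


events = [
    {"date_minute": 1, "bytes": 100},
    {"date_minute": 1, "bytes": 300},
    {"date_minute": 2, "bytes": 50},
]
for row in eventstats_avg(events, "bytes", "date_minute"):
    print(row)
```

Unlike a plain statistics report, the number of events stays the same; only the new field is added to each one.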
Use makemv to change any field into a multi-value field during search time. Configure multi-value fields if a field's value string contains more than one useful value and you want to use them separately. For example, use multi-value fields to separate out multiple email addresses from a field so that you can get the distinct count of the number of people to whom an email was sent.
Specify a delim argument to parse a field value using a simple string delimiter. Specify a tokenizer argument to parse a field value like a regular expression. Add the allowempty argument to parse consecutive delim or tokenizer arguments separately.
Note: You can also configure multi-value fields at index time by editing fields.conf (Learn how to configure multi-value fields via fields.conf).
Syntax
makemv [tokenizer | delim] [allowempty] field
Arguments
| tokenizer | tokenizer="string" | Use tokenizer to parse a field value as a regular expression. This is exactly like using the TOKENIZER=<regular expression> key when configuring multi-value fields in fields.conf. |
| delim | delim="string" | Use delim to parse a field value using a simple string delimiter (can be multiple characters). |
| allowempty | allowempty=(T | F)(default= F) | Set allowempty to T to accept empty values when parsing an entire field value. Empty values occur when makemv parses two consecutive delim arguments. |
| field | field name (string) | Specify a field to change into a multi-value field. |
Note: If you don't specify a tokenizer or delim argument, makemv uses a single space as a delimiter (delim=" ") by default.
Examples
Splunk Web:
This example searches for sendmail events and uses makemv to parse the individual senders delimited by a comma (,). Splunk then reports the top senders.
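The effect of delim and allowempty can be sketched in a few lines of Python. This is an illustration of the behavior described in the arguments table, using a hypothetical `makemv` function and made-up addresses; the tokenizer argument is left out.

```python
def makemv(value, delim=" ", allowempty=False):
    """Split a field value into multiple values on a simple string delimiter.

    With allowempty=False, empty values produced by consecutive
    delimiters are dropped; with allowempty=True they are kept.
    """
    parts = value.split(delim)
    if not allowempty:
        parts = [part for part in parts if part != ""]
    return parts


senders = "ann@example.com,,bob@example.com"
print(makemv(senders, delim=","))
print(makemv(senders, delim=",", allowempty=True))
print(makemv("one two three"))  # default delimiter is a single space
```

The second call keeps the empty value between the two consecutive commas, which is the case allowempty exists for.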
Use mvcombine to combine otherwise identical events in your search results that have a single differing field value into one result with a multi-value field of the differing field. Each result's differing field value then becomes a value in the multi-value field. Use mvcombine if your data has identical events coming from different sources, hosts, or client IP addresses. Use the field argument to specify the field to make into a multi-value field.
For example, if you have two search results:
Add | mvcombine field3 to your search. Splunk combines the two results into:
mvcombine [delim] field
Arguments
| field | field name (string) | Specify a field to change into a multi-value field. |
| delim | delim="string" | Specify a delimiter to use in the new multi-value field. |
Splunk Web:
This example combines identical search results with differing values in the field foo, and returns a single search result for all identical events with differing values of foo. Splunk lists each result's value of foo in the single result's multi-value field foo separated by colons (:).
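As a conceptual sketch, combining results this way amounts to grouping them by every field except the one named and collecting that field's values into a list. The Python below assumes results are simple dictionaries with string values; the function name `mvcombine` and the sample data are illustrative, not Splunk's implementation.

```python
def mvcombine(results, field):
    """Merge results that are identical except for `field` into one result
    whose `field` holds the list of differing values."""
    combined = {}
    for result in results:
        # Group key: every field/value pair except the one being combined.
        key = tuple(sorted((k, v) for k, v in result.items() if k != field))
        if key not in combined:
            merged = {k: v for k, v in result.items() if k != field}
            merged[field] = []
            combined[key] = merged
        if field in result:
            combined[key][field].append(result[field])
    return list(combined.values())


results = [
    {"host": "web1", "status": "200", "foo": "a"},
    {"host": "web1", "status": "200", "foo": "b"},
    {"host": "web2", "status": "200", "foo": "c"},
]
for row in mvcombine(results, "foo"):
    # Join with ":" to mimic a delim=":" display of the multi-value field.
    print(row["host"], ":".join(row["foo"]))
```

The first two results differ only in foo, so they collapse into one result; the third has a different host and stays separate.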
Use mvexpand to expand the values of a multi-value field into separate events for each value of the multi-value field. Specify a field to expand using the field argument. mvexpand copies the original event for each value of field. For example:
If you have:
and you add | mvexpand field2 to your search, you get:
Note: If the field you specify isn't multi-value or there isn't a value of field for an event, then nothing happens to the event.
Syntax
mvexpand field
Arguments
| field | field name (string) | Specify a multi-value field to expand. |
Splunk Web:
This example expands the values of the field foo into separate events for each value of foo.
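The expansion can be pictured with the following Python sketch, which copies an event once per value of the chosen field and leaves other events untouched, as the note above describes. Events are modeled as dictionaries and a multi-value field as a list; the function name `mvexpand` and the data are illustrative only.

```python
def mvexpand(events, field):
    """Return one copy of each event per value of a multi-value `field`.

    Events where `field` is missing or is not multi-value pass through
    unchanged.
    """
    expanded = []
    for event in events:
        values = event.get(field)
        if isinstance(values, list) and values:
            for value in values:
                copy = dict(event)
                copy[field] = value
                expanded.append(copy)
        else:
            expanded.append(event)
    return expanded


events = [
    {"id": "1", "foo": ["a", "b", "c"]},
    {"id": "2", "foo": "single"},
    {"id": "3"},
]
for row in mvexpand(events, "foo"):
    print(row)
```

The first event becomes three events, while the second and third are returned as they were.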
Use nomv to change a multi-value field into a single-value field at search time. This is useful if you want to override multi-value field configurations in fields.conf. nomv causes multi-value field values to be considered as one single-value string (ignoring delimiters and tokenizers set in fields.conf).
Note: Learn how to configure multi-value fields via fields.conf.
Syntax
nomv field
Arguments
| field | field name (string) | Specify a multi-value field to change to a single-value field. |
Splunk Web:
This example searches sendmail events and returns the top lists of senders (a complete matching list of email addresses). If nomv isn't added to this search, then this example returns the top individual senders based on the multi-value field configuration in fields.conf.
Use | overlap in a search to find events in a summary index that overlap in time, or to find gaps in time during which a scheduled saved search may have missed events. Overlaps can occur when a saved search is scheduled to run more often than the time range set in the search (the schedule interval is shorter than the search's time range). Gaps can occur when the schedule interval is longer than the time range set in the search.
For example, if you schedule the following search to run every minute, Splunk generates overlaps. If you schedule the same search to run every 5 minutes, Splunk returns gaps.
Note: Learn how to remove overlaps and gaps by referring to the best practices for summary indexing page.
Syntax
overlap
Arguments
None.
Examples
Splunk Web:
This example finds and returns overlapping events in the entire summary index.
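The idea of overlaps and gaps can be illustrated with a short Python sketch that compares the time range covered by each scheduled run. It assumes each run is described by a (start, end) pair in seconds and uses a hypothetical 3-minute search window; it shows the concept only and is not how the overlap command itself works.

```python
def find_overlaps_and_gaps(ranges):
    """Given (start, end) ranges covered by scheduled runs, return the
    spans covered more than once (overlaps) and never covered (gaps)."""
    ordered = sorted(ranges)
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start < covered_until:
            overlaps.append((start, min(end, covered_until)))
        elif start > covered_until:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return overlaps, gaps


# A search over the last 180 seconds, run every 60 seconds: overlaps.
print(find_overlaps_and_gaps([(0, 180), (60, 240), (120, 300)]))
# The same search run every 300 seconds: gaps.
print(find_overlaps_and_gaps([(120, 300), (420, 600), (720, 900)]))
```

Matching the schedule interval to the search's time range avoids both cases.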
Use rawstats to help you filter and classify events. rawstats adds fields to your events that contain information about their _raw field (fields beginning with rawstat_ are rawstats fields). You can add rawstat_ fields to the fields menu by using the fields picker, or by using the fields command (add | fields * to show all fields, or |fields rawstat_<fieldname> to show a specific rawstat_ field). Once you add rawstat_ fields to the field menu, you can filter your search or report on them just like you can with any other field.
Note: rawstats adds fields that contain the following information about an event's _raw field: blank line count; number of lines starting with characters; number of lines starting with punctuation; counts of alpha-numeric, numeric, lowercase, uppercase, spaces, and other characters; line width and left-margin statistics (average, minimum, maximum, median, standard deviation).
Syntax
rawstats
Arguments
None.
Examples
Splunk Web:
This example searches for all events, adds rawstats information, and adds all fields to the fields menu.
This example searches for events that are long (have many lines), and narrow.
Summary indexing provides support for greater efficiency when running reports on large datasets over large time spans. Summary indexing saves the results of a scheduled search into a special summary index that you designate. You can then search and run reports on this smaller, specially generated summary index instead of working with the much larger original data set.
You can use summary indexing to:
For example, you may want to run a report at the end of every month that tells you how many page views and visitors each of your Web sites had, broken out by site. If you just run this report at the end of the month, it could take a very long time to run because Splunk has to look through a great deal of data to extract the information you want. However, if you use summary indexing, you schedule a saved search that runs periodically over smaller slices of time and Splunk saves the results (since the last time the report was run) into a special (summary) index. You can then run an "end of the month" report on the data indexed in this much smaller index.
Or, you may want to run a report that shows a running count of a statistic over a long period of time. For example, you may want a running count of downloads of a file from a Web site you manage. Schedule a saved search to return the total number of downloads over a specified slice of time. Use summary indexing to have Splunk save the results into a summary index. You can then run a report any time you want on the data in the summary index to obtain the latest count of the total number of downloads.
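The two scenarios above share one pattern: reduce each slice of raw events to a small summary when the scheduled search runs, then build the long-range report from the summaries alone. A minimal Python sketch of that pattern follows; the site field, function names, and data are made up for illustration and do not reflect Splunk internals.

```python
from collections import Counter


def summarize_slice(events):
    """One scheduled run: reduce a slice of raw events to page views per site."""
    return Counter(event["site"] for event in events)


def monthly_report(summaries):
    """End-of-month report built only from the stored per-slice summaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return dict(total)


slice_one = [{"site": "shop"}, {"site": "shop"}, {"site": "blog"}]
slice_two = [{"site": "blog"}, {"site": "shop"}]

summary_index = [summarize_slice(slice_one), summarize_slice(slice_two)]
print(monthly_report(summary_index))
```

The report step never touches the raw events, which is where the time savings come from when the original data set is large.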
How summary indexing works
Summary indexing is an alert option for scheduled saved searches. When you run a saved search with summary indexing turned on, its search results are temporarily stored in a file ($SPLUNK_HOME/var/spool/splunk/<savedsearch_name>_<random-number>.stash). From the file, Splunk adds general information about the current search and the fields you specify during configuration (using the addinfo command) to every result and indexes the results as events in a summary index (index=summary by default).
Note: Use the addinfo command to add fields containing general information about the current search to the search results going into a summary index. General information added about the search helps you run reports on results you place in a summary index.
After results are indexed in the summary index, you can search and report on them by specifying the name of the summary index in your search.
Example:
This example searches for all events in the summary index and returns events from the most common referers.
Summary indexing uses some new search commands behind the scenes to perform its actions.
Another useful command is overlap. You can use overlap to find gaps in events or overlapping events in a summary index.
Configure summary indexing in Splunk Web before you customize it in savedsearches.conf. Learn how to configure summary indexing.
Configure summary indexing
This page contains information to help you set up summary indexing for any saved search via Splunk Web, and customize it further by editing savedsearches.conf. For an introduction to summary indexing refer to Summary indexing.
Set up summary indexing via Splunk Web
Saving results to a summary index is an alert action. Configure summary indexing as an alert action for any scheduled saved search.
1. Create a scheduled search in the Saved searches heading of the Admin page.
2. Select Run this search on a schedule to configure alert properties.
3. Set the Enable summary indexing alert property.
4. Optionally, add a field/value pair to add to search results obtained by the scheduled saved search.
Note: Currently, you can only add one field/value pair when configuring summary indexing in Splunk Web. You can add as many as you like if you add them by editing savedsearches.conf.
Configure summary indexing via savedsearches.conf
The information in this section explains how to further configure summary indexing once you have set it up in Splunk Web.
Note: You must set up summary indexing for a saved search in Splunk Web before you configure additional settings in savedsearches.conf.
When you enable summary indexing for a saved search in Splunk Web, Splunk automatically generates a stanza in savedsearches.conf. Customize summary indexing by editing the generated stanza. Splunk names the stanza based on the name of the saved search for which you enabled summary indexing, like this: [summary_savedsearchname].
Summary indexing keys:
| action.summary_index = | Set to 1 to enable summary indexing. Set to 0 to disable summary indexing. |
| action.summary_index.fieldname = value | Specify a field/value pair to add to every search result indexed in the summary index. Specify any numeric or string value for value. Add additional action.summary_index.fieldname = value | "value" | "long string" entries to append as many field/value pairs to events going into the summary index as you like. |
Example:
This example shows a configuration for a summary index of Web statistics. The keys listed below enable summary indexing for the saved search "MonthlyWebstatsReport", and append the field Webstatsreport with a value of 2008 to every event going into the summary index.
# name of the saved search: MonthlyWebstatsReport
[summary_MonthlyWebstatsReport]
# enable summary indexing
action.summary_index = 1
# add these keys to each event
action.summary_index.Webstatsreport=2008
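To make the effect of these keys concrete, here is a minimal sketch that reads the stanza and appends the configured field/value pairs to each result. It uses Python's configparser as a stand-in for Splunk's own configuration parser, and the function names and sample result are hypothetical:

```python
import configparser

STANZA = """
[summary_MonthlyWebstatsReport]
action.summary_index = 1
action.summary_index.Webstatsreport = 2008
"""

PREFIX = "action.summary_index."

def summary_fields(conf_text, stanza):
    """Return (enabled, {fieldname: value}) for one summary indexing stanza."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of key names such as Webstatsreport
    parser.read_string(conf_text)
    section = parser[stanza]
    enabled = section.get("action.summary_index", "0") == "1"
    fields = {key[len(PREFIX):]: value
              for key, value in section.items()
              if key.startswith(PREFIX)}
    return enabled, fields

def annotate(results, fields):
    """Append the configured field/value pairs to every search result."""
    return [{**result, **fields} for result in results]

enabled, fields = summary_fields(STANZA, "summary_MonthlyWebstatsReport")
annotated = annotate([{"count": "5"}], fields)
```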
In addition to the settings you configure in savedsearches.conf, summary indexing requires that settings exist in indexes.conf and alert_actions.conf. Splunk ships with the necessary default settings:
Caution: Do not edit settings in alert_actions.conf without explicit instructions from Splunk staff.
Best practices for summary indexing
This topic contains guidelines and best practices for configuring and using summary indexing.
General guidelines for summary indexing
Note: Currently, indexing events in a summary index counts against your license volume. We recommend that you not index more events in your summary indexes than you really need. Consult Splunk support for specific information on license volume impact.
Use summary indexing to:
When using summary indexing:
Be careful when building reports made of aggregated statistics. Some aggregating statistical functions (such as distinct count, mode, median, etc.) yield incorrect results when you use them on aggregated statistics. Use one of Splunk's reporting commands to access statistical functions.
For example, if you want to build hourly/daily/weekly reports of average response times, it is tempting to generate the "daily average" by averaging the "hourly averages" together. However, the daily average becomes skewed if there aren't the same number of events in each "hourly average". Get the correct "daily average" by using a weighted average function.
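The skew is easy to see with two invented hourly summaries of different sizes. This sketch is plain Python arithmetic, not Splunk search language, and the field names are made up for the illustration:

```python
# Hypothetical hourly summaries: a quiet hour and a busy hour.
hourly = [
    {"resp_time_sum": 100.0, "resp_time_count": 10},  # hourly average 10.0
    {"resp_time_sum": 900.0, "resp_time_count": 30},  # hourly average 30.0
]

def average_of_averages(rows):
    """Naive approach: every hour counts equally, however many events it has."""
    return sum(r["resp_time_sum"] / r["resp_time_count"] for r in rows) / len(rows)

def weighted_average(rows):
    """Total response time divided by total event count."""
    total_time = sum(r["resp_time_sum"] for r in rows)
    total_count = sum(r["resp_time_count"] for r in rows)
    return total_time / total_count

naive = average_of_averages(hourly)   # 20.0, skewed toward the quiet hour
correct = weighted_average(hourly)    # 25.0, the true mean over all 40 events
```

Storing the sum and the count in the summary index, rather than only the average, is what makes the weighted calculation possible later.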
Example:
The following expression calculates the daily average response time correctly (a weighted average) using stats and eval.
| stats sum(hourly_resp_time_sum) as resp_time_sum, sum(hourly_resp_time_count) as resp_time_count | eval daily_average= resp_time_sum/resp_time_count | .....
Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:
Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the frequency of the schedule of the search, or you run summary indexing manually (using | collect).
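The following sketch illustrates the two conditions using the time ranges covered by successive runs of one saved search. It is not how the overlap command is implemented. The half-open (start, end) ranges in minutes and the function name are assumptions made for the example:

```python
def find_gaps_and_overlaps(ranges):
    """ranges: (start, end) half-open intervals covered by each search run.

    Returns (gaps, overlaps) as lists of (start, end) intervals.
    """
    gaps, overlaps = [], []
    if not ranges:
        return gaps, overlaps
    ordered = sorted(ranges)
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_until:
            # Nothing was summarized between the previous run and this one.
            gaps.append((covered_until, start))
        elif start < covered_until:
            # This run re-summarized time that an earlier run already covered.
            overlaps.append((start, min(end, covered_until)))
        covered_until = max(covered_until, end)
    return gaps, overlaps

# Hourly schedule: one run was skipped, and the last run's time range was too long.
runs = [(0, 60), (60, 120), (180, 240), (220, 300)]
gaps, overlaps = find_gaps_and_overlaps(runs)
```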
Identify gaps and overlaps in data
Identify overlaps and gaps in a summary index using the "Summary Index Gaps and Overlaps" form search (a default saved search in the main Splunk dashboard), or by using the overlap command in your search (add | overlap at the end of the search that produces overlaps).
If you run the form search Summary Index Gaps and Overlaps, specify the time range using the form, or switch to a "text" display where you must specify the following parameters in the search bar (following | overlap):
either specify:
or:
If you identify a gap, you can run your scheduled saved search over the period of the gap and summary index the results (using | collect). If you identify overlapping events, you can manually delete the overlaps from the summary index by using the search language.
Splunk Preview introduces application management and browsing within Splunk Web. From the Admin panel, authorized users can:
Refer to the Admin manual for more information on applications.
View and manage applications
The Applications: View/Manage page displays a table of the applications currently installed on your system, each application's status, and actions you can select to perform on each application. By clicking on the option name in the Actions column, you can Uninstall, Configure, and Enable/Disable your applications.
Uninstall
Uninstall deletes the application from the list and removes all the files associated with the application. When you click Uninstall, a dialog window opens and asks if you wish to continue or cancel the uninstall action.
Configure
Configure redirects you to a page where you can edit the application's configuration stanzas (if they exist).
Enable/Disable
Enable/Disable turns your application on or off and updates the Status column to reflect the change. If your application is already enabled, you will see the Disable option and vice versa.
You can use the Enable and Disable buttons, located above the table, to quickly enable and disable all checked items.
Browse SplunkBase
The Applications: Browse SplunkBase page lets you use Splunk Web to view and install any of the applications available on SplunkBase.
Application summary
You can scroll through all the applications or use the category links to view groups of applications. Each application has a summary describing its use and a list of information that includes the number of downloads and the application's price.
Install applications
Each application has an install button: Install Free for free applications or Install 30-day Free Trial for applications that have an associated price. When you click on an install button for an application, Splunk Web redirects you to a SplunkBase login page. You have to log in to download the application.
After you download and install an application, you can view and manage it from the Applications: View/Manage page.
Indexes
In previous Splunk releases, you used the command line interface (CLI) to manage your indexes. Now, you can view your indexes, edit their properties, and add new indexes from the Admin page of Splunk Web.
Note: To apply any changes that you make to the indexes, such as editing properties or adding a new index, you must restart Splunk. In Splunk Web, you can restart the Splunk server from Admin > Server: Control Server. Just click Restart Now.
View and manage indexes
The Admin > Index: View/Manage Indexes page displays a table of all your indexes and their properties, including:
Clicking on an index name opens a page that lets you view and edit that index's properties. Properties that you cannot change are grayed out and include:
Properties that you can redefine include:
After you make your changes, click Update. Then, restart Splunk to apply your changes.
Create new index
The Admin > Indexes: Create Index page lets you define the properties for a new index. To create a new index, enter:
If you check Advanced settings, the list of properties expands. Advanced properties include:
After editing the index's properties, click Add. Then, restart Splunk to save and apply your changes.
Configuration file architecture changes
The current configuration file directory structure has been deprecated in favor of a new application management architecture.
Previously, configuration files were kept in $SPLUNK_HOME/etc/bundles/. Now, there are two general configuration file directories in $SPLUNK_HOME/etc/:
Both system/ and apps/ have the same directory structure:
For example:
apps/
    myapp1/
        default/
        local/
        static/
        bin/
    myapp2/
        default/
        local/
        static/
        bin/
You can now configure Splunk to automatically extract fields from data sources that are formatted with headers (for example: CSV, TM3, or MS Exchange log files). Use automatic header-based field extraction instead of configuring the fields you want to extract by hand. You can access fields that Splunk automatically extracts using the Fields picker in Splunk Web. You can use them for filtering and reporting just like any other extracted field.
If you have a source that is an MS Exchange file and want to extract fields from it using its header information:
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
If you enable automatic header-based field extraction, Splunk extracts the fields: time, client-ip, cs-method, and sc-status using the fields and delimiters defined in the source header (# Fields: time client-ip cs-method sc-status).
For example, Splunk extracts the following fields from the first event: (14:13:11 10.1.1.9 HELO 250).
For each source or source type you configure with automatic header-based field extraction, Splunk scans matching sources for header information to use to extract the fields (predefined fields and delimiters). If a source has the necessary information, Splunk extracts fields using delimiter-based key/value extraction. Splunk does this by creating an entry in transforms.conf for the source, and populating it with transforms to extract the fields. Splunk also adds a source type stanza to props.conf to tie the field extraction transforms to the source. Splunk then applies the transforms to events from the source at search time.
Note: Automatic header-based field extraction doesn't impact index size or indexing performance because it occurs during source typing (before index time).
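As an illustration of the idea only, not of Splunk's implementation, the sketch below reads the field names from a "# Fields:" header line and pairs them with the delimited values of each event. The function name and the single-space delimiter are assumptions taken from the sample file:

```python
SAMPLE = """\
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
"""

HEADER_PREFIX = "# Fields:"

def extract_with_header(text, delimiter=" "):
    """Return one {field: value} dict per event, using the header's field names."""
    fields = None
    events = []
    for line in text.splitlines():
        if line.startswith(HEADER_PREFIX):
            # The header lists the field names in column order.
            fields = line[len(HEADER_PREFIX):].split()
        elif line and not line.startswith("#") and fields:
            events.append(dict(zip(fields, line.split(delimiter))))
    return events

events = extract_with_header(SAMPLE)
```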
Configure automatic header-based field extraction
Configure automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see how configuration files work.
Add CHECK_FOR_HEADER=TRUE under any source or source type stanza to turn on automatic header-based field extraction for that source type.
Example props.conf entry using the MS Exchange file from the introduction:
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
Note: Set CHECK_FOR_HEADER=FALSE to turn off automatic header-based field extraction.
Changes Splunk makes to configuration files
Splunk adds configuration information to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/ during automatic header-based field extraction.
Note: Editing configuration information that Splunk adds causes extracted fields to not function properly.
Splunk creates a stanza in transforms.conf for each source type with unique header information that matches a source type defined in props.conf. Splunk names each stanza it creates as [AutoHeader-M], where M is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2],...,[AutoHeader-M]). Splunk populates each stanza with transforms to extract the fields (using header information).
Example transforms.conf entry using the MS Exchange file from the introduction:
...
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
...
Splunk then adds new source type stanzas to props.conf for each unique source. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.
Example props.conf entry using the MS Exchange file from the introduction:
# the original source you configure
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
# source type that Splunk adds to tie to transforms for automatic header-based field extraction
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
To return all events that Splunk types with a source type it generated while running automatic header-based field extraction, use a wildcard to search for all events of that source type.
A search for sourcetype="yoursource" looks like this:
These examples show how header-based field extraction works with common source types.
MS Exchange source file
This example shows how Splunk extracts fields from an MS Exchange file using automatic header-based field extraction.
This sample MS Exchange log file has a header containing a list of field names, delimited by spaces:
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
Splunk creates a header and transform in transforms.conf:
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
Splunk then ties the transform to the source by adding this to the source type stanza in props.conf:
# Original source type stanza you create
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
# source type stanza that Splunk creates
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
Splunk automatically extracts the following fields from each event:
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.
Example CSV file contents:
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
Splunk creates a header and transform in transforms.conf (located in: $SPLUNK_HOME/etc/apps/learned/transforms.conf):
# Some previous automatic header-based field extraction
[AutoHeader-1]
...
# transform stanza that Splunk creates
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","
Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:
...
[CSV-1]
REPORT-AutoHeader = AutoHeader-2
...
Splunk extracts the following fields from each event:
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
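For comparison, the same header-to-field pairing can be reproduced outside Splunk with Python's csv module, which treats the first row as the list of field names. This is only an analogy for what the FIELDS and DELIMS settings describe, and the function name is hypothetical:

```python
import csv
import io

CSV_TEXT = """\
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
"""

def extract_csv_fields(text):
    """Return one {field: value} mapping per event, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = extract_csv_fields(CSV_TEXT)
first_event = dict(rows[0])
```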
The installation steps for Splunk for Windows have changed to accommodate the new WMI and registry monitoring functionality. This is the updated procedure.
Note: When you run the Splunk Windows installer, you are given the option to select a user Splunk will run as. If you install Splunk as the LOCAL SYSTEM user, WMI remote authentication will not work; this user has null credentials and Windows servers normally disallow such connections.
The Windows installer is an MSI file.
1. To start the installer, double-click the splunk.msi file.
The Welcome panel is displayed.
2. To begin the installation, click Next.
Note: On each panel, you can click Next to continue, Back to go back a step, or Cancel to close the installer.
The licensing panel is displayed.
3. Read the licensing agreement and select "I accept the terms in the license agreement". Click Next to continue installing.
The Customer Information panel is displayed.
4. Enter the requested details and click Next.
The Destination Folder panel is displayed.
Note: Splunk is installed by default into \Program Files\Splunk.
5. Click Change... to specify a different location to install Splunk, or click Next to accept the default value.
The Logon Information panel is displayed.
Splunk installs and runs two Windows services, splunkd and splunkweb. These services will be installed and run as the user you specify on this panel. You can choose to run Splunk as the local system user, or as a user with additional credentials.
The user Splunk runs as must have permissions to:
Note: If you install as the local system user, some network resources may not be available to the Splunk application. Additionally, WMI remote authentication will not work; this user has null credentials and Windows servers normally disallow such connections. Contact your systems administrator for advice if you are unsure what user to specify.
6. Select a user type and click Next.
If you specified the local system user, proceed to step 8. Otherwise, the Logon Information: specify a username and password panel is displayed.
7. Specify a username and password to install and run Splunk and click Next.
The Configure Splunk Data Sources panel is displayed.
8. Check or uncheck boxes to tell Splunk what data you want monitored and indexed:
Important: If you choose to enable baseline snapshots of your local registry hives, the next time you start Splunk, it may take a long time to start up and use a lot of system resources while processing the snapshot. This depends on how large your registry is, and how much of it you plan to monitor. For more information about baseline snapshots and monitoring the Windows registry, refer to Get a baseline snapshot.
The pre-installation summary panel is displayed.
9. Click Install to proceed.
The installer runs and displays the Installation Complete panel.
10. Check the boxes to run Splunk and Splunk Web now. Click Finish.
Start Splunk
On Windows, Splunk is installed by default into \Program Files\Splunk.
You can start and stop the following Splunk processes via the Windows Services Manager:
You can also start, stop, and restart both processes at once by going to \Program Files\Splunk\bin and typing
# splunk.exe [start|stop|restart]
Note: If you chose not to index one or more of the Windows event logs by unchecking the box(es) at the end of the installation process, and want to begin indexing later, edit $SPLUNK_HOME/etc/bundles/local/inputs.conf as described in Configure inputs via inputs.conf.
Important: You must use two backslashes \\ to escape wildcards in stanza names in inputs.conf.
Install or upgrade license
If you are performing a new installation of Splunk or switching from one license type to another, you must update your license.
Uninstall Splunk
To uninstall Splunk, use the Add or Remove Programs option in the Control Panel.
Windows registry input
This version of Splunk Preview supports the capture of Windows registry settings and lets you monitor changes to the registry. You can see when registry entries are added, updated, and deleted.
When a registry entry is changed, Splunk captures the name of the process that made the change and the key path from the hive to the entry being changed.
The Windows registry input monitor application runs as a process called splunk-regmon.exe.
Warning: Do not stop or kill the splunk-regmon.exe process manually; this could result in system instability. To stop the process, stop the Splunk server process from the Services MMC snap-in or from within Splunk Web.
How it works
Because it's possible for Windows registries to be extremely dynamic (thereby generating a great many events), Splunk provides a two-tiered configuration for fine-tuning the filters that are applied to the registry event data coming into Splunk.
Splunk Windows registry monitoring uses two configuration files to determine what to monitor on your system, sysmon.conf and regmon-filters.conf, both located in $SPLUNK_HOME/etc/system/local/. These configuration files work as a hierarchy:
sysmon.conf contains only one stanza, where you specify:
Each stanza in regmon-filters.conf represents a particular filter whose definition includes:
When you install Splunk, you're given the option of recording a baseline snapshot of your registry hives the next time Splunk starts. By default, the snapshot covers the entirety of the user keys and machine keys hives. It also establishes a timeline for when to retake the snapshot; by default, if Splunk has been down for more than 24 hours since the last checkpoint, it will retake the baseline snapshot. You can customize this value for each of the filters in regmon-filters.conf by setting the value of baseline interval.
Note: Executing a splunk clean all -f in the Splunk home directory deletes the current baseline snapshot.
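The re-baselining rule described above amounts to a simple time comparison. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and it assumes the baseline interval is expressed in hours, which matches the 24-hour default but is not confirmed by this page:

```python
from datetime import datetime, timedelta

def needs_new_baseline(last_checkpoint, now, baseline_interval_hours=24):
    """Return True if more than the interval has passed since the last checkpoint."""
    return now - last_checkpoint > timedelta(hours=baseline_interval_hours)

restart_time = datetime(2008, 6, 30, 12, 0)
down_25_hours = needs_new_baseline(datetime(2008, 6, 29, 11, 0), restart_time)
down_23_hours = needs_new_baseline(datetime(2008, 6, 29, 13, 0), restart_time)
```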
What to consider
When you install Splunk on a Windows machine and enable registry monitoring, you specify which major hive paths to monitor: key users (HKEY) and/or key local machine (HKLM). Depending on how dynamic you expect the registry to be on this machine, checking both could result in a great deal of data for Splunk to monitor. If you're expecting a lot of registry events, you may want to specify some filters in regmon-filters.conf to narrow the scope of your monitoring immediately after you install Splunk and enable registry event monitoring, but before you start Splunk.
Similarly, you have the option of capturing a baseline snapshot of the current state of your Windows registry when you first start Splunk, and again every time a specified amount of time has passed. The baselining process can be somewhat processor-intensive, and may take several minutes. You can postpone taking a baseline snapshot until you've edited regmon-filters.conf and narrowed the scope of the registry entries to those you specifically want Splunk to monitor.
Note: During the time that Splunk is capturing or updating the baseline, it is unable to receive and index the events taking place in the registry. Consider this when you are setting the interval between snapshots on very active systems, or when forensic registry data is important.
Configure Windows registry input
Look in $SPLUNK_HOME/etc/system/default/inputs.conf to see the default values for Windows registry input. They are also shown below.
If you want to make changes to the default values, edit a copy of inputs.conf in $SPLUNK_HOME/etc/system/local/. You only have to provide values for the parameters you want to change within the stanza. For more information about how to work with Splunk configuration files, refer to How do configuration files work?
[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.py]
interval = 60
sourcetype = registry
source = registry
disabled = 0
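The default/local relationship can be sketched as a key-by-key merge in which a value in the local copy replaces the default and every other parameter is inherited. This sketch uses Python's configparser rather than Splunk's own parser, and the local interval of 120 is an invented override:

```python
import configparser

DEFAULT_CONF = r"""
[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.py]
interval = 60
sourcetype = registry
source = registry
disabled = 0
"""

# Hypothetical local copy: only the parameter being changed is listed.
LOCAL_CONF = r"""
[script://$SPLUNK_HOME\bin\scripts\splunk-regmon.py]
interval = 120
"""

def effective_settings(default_text, local_text):
    """Merge two config texts; keys read later override keys read earlier."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case
    parser.read_string(default_text)
    parser.read_string(local_text)
    return {name: dict(parser[name]) for name in parser.sections()}

STANZA = r"script://$SPLUNK_HOME\bin\scripts\splunk-regmon.py"
settings = effective_settings(DEFAULT_CONF, LOCAL_CONF)[STANZA]
```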
Our developers have been blogging about their work on this Preview. Check out Ledio's blog post about Windows registry monitoring for more details.
WMI input
This Preview release of Splunk supports WMI (Windows Management Instrumentation) data input for agentless access to Windows performance data and event logs. This means you can pull event logs from all the Windows servers and desktops in your environment without having to install anything on those machines.
The Splunk WMI data input can connect to multiple WMI providers and pull data from them. The WMI data input runs as a separate process (splunk-wmi.exe) on the Splunk server. It is configured as a scripted input in etc/system/default/inputs.conf.
Note: This feature is enabled by default.
Security and remote access considerations
Splunk requires privileged access to index many Windows data sources, including WMI, Event Log, and the registry. This includes both the ability to connect to the box, as well as permissions to read the appropriate data once connected.
There are several things to consider:
To access WMI data, Splunk must run as a user with permissions to perform remote WMI connections. This user name must be a member of an Active Directory domain and must have appropriate privileges to query WMI. Both the Splunk server making the query and the target systems being queried must be part of this Active Directory domain.
Note: If you installed Splunk as the LOCAL SYSTEM user, WMI remote authentication will not work; this user has null credentials and Windows servers normally disallow such connections.
The following steps explain how to test the configuration of the Splunk server and the remote machine:
1. Log into the machine Splunk runs on as the user Splunk runs as.
2. Click Start -> Run and type wbemtest. The wbemtest application starts.
3. Click Connect and type \\<server>\root\cimv2, replacing <server> with the name of the remote server. Click Connect. If you are unable to connect, there is a problem with the authentication between the machines.
4. If you are able to connect, click Query and type select * from win32_service. Click Apply. After a short wait, you should see a list of running services. If this does not work, then the authentication works, but the user Splunk is running as does not have enough privileges to run that operation.
Look in $SPLUNK_HOME/etc/system/default/wmi.conf to see the default values for the WMI input. If you want to make changes to the default values, edit a copy of wmi.conf in $SPLUNK_HOME/etc/system/local/. You only have to provide values for the parameters you want to change for a given type of data input.
Refer to How configuration files work for more information about how Splunk uses configuration files, but be sure to use the new directory structure for the correct directory paths.
[settings]
initial_backoff = 5
max_backoff = 20
max_retries_at_max_backoff = 2
result_queue_size = 1000
checkpoint_sync_interval = 2
heartbeat_interval = 500

[WMI:AppAndSys]
server = foo, bar
interval = 10
event_log_file = Application, System, Directory Service
disabled = 0

[WMI:LocalSplunkWmiProcess]
interval = 5
wql = select * from Win32_PerfFormattedData_PerfProc_Process where Name = "splunk-wmi"
disabled = 0
The [settings] stanza specifies runtime parameters. The entire stanza and every parameter within it are optional. If the stanza is missing, Splunk assumes system defaults.
You can specify two types of data input: event log, and raw WQL (WMI query language). The event log input stanza contains the event_log_file parameter, and the WQL input stanza contains wql.
The common parameters for both types are:
WQL-specific parameters:
Event log-specific parameter:
event_log_file: specify a comma-separated list of log files to poll in the event_log_file parameter. File names that include spaces are supported, as shown in the example.
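To show how the two input types can be told apart, the sketch below classifies each stanza by whether it contains event_log_file or wql, and splits the comma-separated log file list while keeping names that contain spaces. It uses Python's configparser as a stand-in for Splunk's parser. The function names are hypothetical, and the "WMI:" stanza prefix is inferred from the example configuration:

```python
import configparser

WMI_CONF = """
[settings]
initial_backoff = 5

[WMI:AppAndSys]
server = foo, bar
interval = 10
event_log_file = Application, System, Directory Service
disabled = 0

[WMI:LocalSplunkWmiProcess]
interval = 5
wql = select * from Win32_PerfFormattedData_PerfProc_Process where Name = "splunk-wmi"
disabled = 0
"""

def split_list(value):
    """Split a comma-separated value; spaces inside an item are preserved."""
    return [item.strip() for item in value.split(",") if item.strip()]

def describe_inputs(conf_text):
    """Map each WMI input stanza to ("event log", [files]) or ("wql", query)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    described = {}
    for name in parser.sections():
        if not name.startswith("WMI:"):
            continue  # skip [settings], which holds runtime parameters
        section = parser[name]
        if "event_log_file" in section:
            described[name] = ("event log", split_list(section["event_log_file"]))
        elif "wql" in section:
            described[name] = ("wql", section["wql"])
    return described

wmi_inputs = describe_inputs(WMI_CONF)
```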
All events are indexed in Splunk with a source of wmi.
The host is identified automatically from the data received.
Hear from the developers!
Our developers have been blogging about their work on this preview. Check out Igor's post about WMI for more details.