Best practices for summary indexing

This documentation does not apply to the most recent version of Splunk.

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2

This topic contains guidelines and best practices for configuring and using summary indexing.


General guidelines for summary indexing

Note: Currently, indexing events in a summary index counts against your license volume. We recommend that you not index more events in your summary indexes than you really need. Consult Splunk support for specific information on license volume impact.


Use summary indexing to:


  • capture rare events in a smaller index for efficient reporting.
  • build rolling reports or calculate running totals of aggregated statistics.

When using summary indexing:


  • Ensure that aggregated statistics generated from results in a summary index are accurate by indexing statistics taken from the smallest possible time range. For example, if you need to generate hourly/daily/weekly reports, then you want to index hourly reports in the summary index and generate daily and weekly reports from an aggregate of the hourly reports.
  • Set appropriate periods and delays for the scheduled searches that populate a summary index, to minimize data gaps and overlaps.
  • Modify your reporting searches to use summary index data instead of original (main) index data when possible.
  • Use the addinfo search command to preview what events will look like if you summary index them.

Aggregated statistics

Be careful when building reports made of aggregated statistics. Some statistical functions (such as distinct count, mode, and median) yield incorrect results when you apply them to statistics that have already been aggregated. Use one of Splunk's reporting commands to access statistical functions.
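
To illustrate why a function such as distinct count cannot be re-aggregated, here is a minimal Python sketch. The hourly user sets are made-up illustration data, not Splunk output, and the variable names are hypothetical.

```python
# Hypothetical data: the set of distinct users seen in each of three hours.
hourly_users = [
    {"alice", "bob"},
    {"bob", "carol"},
    {"alice", "carol", "dave"},
]

# What a summary index would hold if only the hourly distinct counts were saved.
hourly_dc = [len(users) for users in hourly_users]

# Incorrect: summing hourly distinct counts counts repeat users more than once.
naive_daily_dc = sum(hourly_dc)

# Correct: the distinct count must be computed over the combined values.
true_daily_dc = len(set().union(*hourly_users))

print(naive_daily_dc, true_daily_dc)
```

The two numbers differ because users who appear in more than one hour are counted once per hour in the naive sum.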


For example, if you want to build hourly/daily/weekly reports of average response times, you might generate the "daily average" by averaging the "hourly averages" together. However, the daily average becomes skewed if each "hourly average" is not based on the same number of events. Get the correct "daily average" by using a weighted average function instead.


Example:


The following expression calculates the daily average response time correctly (as a weighted average) using stats and eval.


| stats sum(hourly_resp_time_sum) as resp_time_sum, sum(hourly_resp_time_count) as resp_time_count | eval daily_average=resp_time_sum/resp_time_count | .....
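
The same arithmetic can be sketched outside Splunk. The following Python snippet uses made-up hourly summaries (a sum and a count per hour, mirroring the hourly_resp_time_sum and hourly_resp_time_count fields above) to show how the average of averages drifts from the weighted average when hours have different event counts.

```python
# Hypothetical hourly summaries: (sum of response times, number of events).
hourly = [
    (100.0, 10),  # hourly average 10.0
    (900.0, 30),  # hourly average 30.0
]

hourly_averages = [s / n for s, n in hourly]

# Skewed: treats every hour equally, regardless of how many events it holds.
average_of_averages = sum(hourly_averages) / len(hourly_averages)

# Weighted: total of the sums divided by total of the counts,
# which is what the stats/eval expression computes.
resp_time_sum = sum(s for s, _ in hourly)
resp_time_count = sum(n for _, n in hourly)
daily_average = resp_time_sum / resp_time_count

print(average_of_averages, daily_average)
```

This is why the summary index needs to store the hourly sum and count, not only the hourly average.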

Gaps and overlaps

Gaps

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:


  • splunkd goes down
  • the scheduled saved search (the one being summary indexed) takes longer to run than the interval between scheduled runs. For example, if a search is scheduled to run every 5 minutes but takes 7 minutes to complete, the next scheduled run does not start because the previous run is still in progress.

Overlaps

Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the interval between its scheduled runs, or if you run summary indexing manually (using | collect).
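
As a rough illustration of both conditions, the Python sketch below scans the time windows covered by successive runs of a summary search and reports uncovered periods (gaps) and doubly covered periods (overlaps). The function name and the sample windows are hypothetical; this is not how Splunk detects gaps and overlaps internally.

```python
from datetime import datetime, timedelta


def find_gaps_and_overlaps(windows):
    """Return (gaps, overlaps) for a list of (start, end) time windows.

    Each window is the time range covered by one run of the summary search.
    """
    if not windows:
        return [], []
    gaps, overlaps = [], []
    ordered = sorted(windows)
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_until:
            # Nothing was summarized between the previous window and this one.
            gaps.append((covered_until, start))
        elif start < covered_until:
            # This run re-summarized time that an earlier run already covered.
            overlaps.append((start, min(end, covered_until)))
        covered_until = max(covered_until, end)
    return gaps, overlaps


t0 = datetime(2008, 5, 20, 0, 0, 0)


def minute(n):
    return t0 + timedelta(minutes=n)


# A 5-minute schedule with one missed run and one run using a 7-minute range.
runs = [
    (minute(0), minute(5)),
    (minute(5), minute(10)),
    (minute(15), minute(20)),  # the 00:10 run is missing
    (minute(18), minute(25)),  # 7-minute range on a 5-minute schedule
]
gaps, overlaps = find_gaps_and_overlaps(runs)
print(gaps)
print(overlaps)
```

With these sample windows, the sketch reports one gap from 00:10 to 00:15 and one overlap from 00:18 to 00:20.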


Identify gaps and overlaps in data

Identify overlaps and gaps in a summary index using the "Summary Index Gaps and Overlaps" form search (a default saved search in the main Splunk dashboard), or by using the overlap command in your search (add | overlap at the end of the search that produces overlaps).


If you run the "Summary Index Gaps and Overlaps" form search, specify the time range using the form. Alternatively, switch to the "text" display, where you must specify the following parameters in the search bar (following | overlap).


Specify either:


  • StartTime: Time to start searching for missing entries, starttime=mm/dd/yyyy:hh:mm:ss (e.g. 05/20/2008:00:00:00).
  • EndTime: Time to stop searching for missing entries, endtime=mm/dd/yyyy:hh:mm:ss (e.g. 05/22/2008:00:00:00).

or:


  • Period: The length of the time period to search, period=<integer>[smhd] (e.g. 5m).
  • SavedSearchName: The name of the saved search to check for missing events, savedsearchname=string (wildcards are NOT allowed).
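
To make the two value formats concrete, the following Python sketch parses a mm/dd/yyyy:hh:mm:ss timestamp and an <integer>[smhd] period. The helper names are hypothetical, and the snippet only illustrates the formats described above; it does not reproduce how Splunk itself parses these parameters.

```python
import re
from datetime import datetime, timedelta

TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"  # mm/dd/yyyy:hh:mm:ss
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_time(value):
    """Parse a starttime/endtime value such as 05/20/2008:00:00:00."""
    return datetime.strptime(value, TIME_FORMAT)


def parse_period(value):
    """Parse a period value such as 5m into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"invalid period: {value!r}")
    count, unit = match.groups()
    return timedelta(seconds=int(count) * UNIT_SECONDS[unit])


start = parse_time("05/20/2008:00:00:00")
end = parse_time("05/22/2008:00:00:00")
print(end - start)
print(parse_period("5m"))
```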

If you identify a gap, you can run your scheduled saved search over the period of the gap and summary index the results (using | collect). If you identify overlapping events, you can manually delete the overlaps from the summary index by using the search language.
