This script backfills a summary index with archived data. It is a simple Python script that breaks a search into fixed time intervals and runs each interval through the Splunk CLI to populate the summary index. The script also includes an option to use the dispatch API for summarizing large amounts of data.
Note: This document refers to 3.x versions of Splunk. For information about managing backfill in Splunk 4.x, see "Manage summary index gaps and overlaps" in the Knowledge Manager manual.
For a general overview of summary index functionality, see "Use summary indexing for increased reporting efficiency" in the User manual.
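The core idea, breaking one long time range into fixed-size intervals, can be sketched on its own before reading the full script. This is a minimal illustration and not part of the original script: the function name `iter_windows` and its parameters are hypothetical, and the dates are the sample values used in the script's variable section.

```python
import datetime


def iter_windows(start, finish, interval_minutes):
    """Yield (window_start, window_end) pairs that cover start up to finish."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = datetime.timedelta(minutes=interval_minutes)
    window_start = start
    while window_start < finish:
        # Clamp the final window so it never runs past the finish time.
        window_end = min(window_start + step, finish)
        yield window_start, window_end
        window_start = window_end


# Sample range: four days split into 10-minute windows.
windows = list(iter_windows(
    datetime.datetime(2008, 4, 13),
    datetime.datetime(2008, 4, 17),
    10,
))
print(len(windows))
print(windows[0])
print(windows[-1])
```

Each pair corresponds to one starttime/endtime window that a single summary search would cover. The windows do not overlap, and the last one is shortened if the range is not an exact multiple of the interval.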
To use the script, change the variables in the top section of the script to suit your needs.
import os,datetime
# Purpose: Execute summary index searches on archived data via Splunk CLI. This version
# provides the option to use the dispatch API instead of the vanilla search command.
# The dispatch API can be used in cases where it is simply not possible to work
# within the max results limit.
#---------- change these variables ----------
splunkSearch = "sourcetype=foo | stats count by host | addinfo | collect index=summary"
startDate = "04/13/2008"
startTime = "00:00:00"
endDate = "04/17/2008"
endTime = "00:00:00"
intervalInMins = 10
# default maxresults for CLI searches is 100
maxResults = 50000
# enable dispatch API when maxResults is simply too small
# set maxOut as appropriate, but the default 100 should be ok
useDispatch = True
maxOut = 100
#---------- begin script ----------
# break down the start/end date and time
startDateTokens = startDate.split("/")
startMonth = int(startDateTokens[0])
startDay = int(startDateTokens[1])
startYear = int(startDateTokens[2])
startTimeTokens = startTime.split(':')
startHour = int(startTimeTokens[0])
startMin = int(startTimeTokens[1])
startSec = int(startTimeTokens[2])
endDateTokens = endDate.split("/")
endMonth = int(endDateTokens[0])
endDay = int(endDateTokens[1])
endYear = int(endDateTokens[2])
endTimeTokens = endTime.split(':')
endHour = int(endTimeTokens[0])
endMin = int(endTimeTokens[1])
endSec = int(endTimeTokens[2])
# initialize start and end dates/times
startDate = datetime.datetime(startYear,startMonth,startDay,startHour,startMin,startSec)
endDate = datetime.datetime(startYear,startMonth,startDay,startHour,startMin,startSec)
endDate += datetime.timedelta(minutes=int(intervalInMins))
finishLineDate = datetime.datetime(endYear,endMonth,endDay,endHour,endMin,endSec)
# generate and run splunk search commands via CLI
i = 0
while (startDate < finishLineDate):
    # if near the finish line, set endDate = finishLineDate
    if (endDate >= finishLineDate):
        endDate = finishLineDate
    # convert date/time format to MM/DD/YYYY:HH:mm:ss
    startTime = startDate.strftime("%m/%d/%Y:%H:%M:%S")
    endTime = endDate.strftime("%m/%d/%Y:%H:%M:%S")
    searchCmd = "starttime=\"" + startTime + "\" endtime=\"" + endTime + "\" " + splunkSearch
    # run it!
    if (bool(useDispatch)):
        searchCLI = "splunk dispatch \"" + searchCmd + "\" -maxout " + str(maxOut)
    else:
        searchCLI = "splunk search \"" + searchCmd + "\" -maxresults " + str(maxResults)
    print("Executing [" + searchCLI + "]")
    # capture the CLI output and split it into whitespace-separated tokens
    result = str.split(os.popen(searchCLI).read())
    print(result)
    # increment start and end dates by intervalInMins
    startDate += datetime.timedelta(minutes=int(intervalInMins))
    endDate += datetime.timedelta(minutes=int(intervalInMins))
    # track number of searches run
    i += 1
print("Done running " + str(i) + " searches!")
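Before running a long backfill, it can help to preview the exact CLI command the script would generate for a given window. The following is a minimal dry-run sketch, not part of the original script: the helper names `parse_datetime` and `build_search_cli` are hypothetical, and the command strings simply mirror the ones the script builds (`splunk dispatch ... -maxout` and `splunk search ... -maxresults`). Nothing is executed.

```python
import datetime

# Timestamp format the script passes in starttime/endtime.
SPLUNK_TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"


def parse_datetime(date_text, time_text):
    """Parse 'MM/DD/YYYY' and 'HH:MM:SS' strings into one datetime."""
    return datetime.datetime.strptime(
        date_text + " " + time_text, "%m/%d/%Y %H:%M:%S"
    )


def build_search_cli(search, window_start, window_end,
                     use_dispatch=True, max_out=100, max_results=50000):
    """Return the command string for one window, as the script would build it."""
    search_cmd = 'starttime="%s" endtime="%s" %s' % (
        window_start.strftime(SPLUNK_TIME_FORMAT),
        window_end.strftime(SPLUNK_TIME_FORMAT),
        search,
    )
    if use_dispatch:
        return 'splunk dispatch "%s" -maxout %d' % (search_cmd, max_out)
    return 'splunk search "%s" -maxresults %d' % (search_cmd, max_results)


# Preview the first 10-minute window from the sample settings.
cmd = build_search_cli(
    "sourcetype=foo | stats count by host | addinfo | collect index=summary",
    parse_datetime("04/13/2008", "00:00:00"),
    parse_datetime("04/13/2008", "00:10:00"),
)
print(cmd)
```

Using `strptime` here replaces the manual splitting of the date and time strings into tokens. It also raises a `ValueError` on malformed input instead of failing later with an index error.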
Using the scenario on the summary indexing page as an example, the following settings would be used with this script to backfill the first 14 days of August 2008 for the "Do Not Click - Summary Index - Firewall Daily Summary Source IP" search.
splunkSearch = "eventtype=firewall | stats count by src_ip | sort count desc | head 200 | addinfo | collect addtime index=summary marker=report=firewall_daily_summary_src_ip"
startDate = "08/01/2008"
startTime = "00:00:00"
endDate = "08/15/2008"
endTime = "00:00:00"
intervalInMins = 1440
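As a sanity check on settings like these, the number of searches the script will run can be estimated up front. This is a rough sketch under the assumption that the script runs one search per interval and shortens the final interval if needed. The function name `count_searches` is hypothetical.

```python
import datetime
import math


def count_searches(start, finish, interval_minutes):
    """Estimate how many interval searches cover the range start..finish."""
    total_seconds = (finish - start).total_seconds()
    if total_seconds <= 0:
        return 0
    # A partial final interval still needs its own search, so round up.
    return int(math.ceil(total_seconds / (interval_minutes * 60.0)))


# Firewall example: 08/01/2008 to 08/15/2008 at 1440 minutes (one day) each.
firewall_runs = count_searches(
    datetime.datetime(2008, 8, 1),
    datetime.datetime(2008, 8, 15),
    1440,
)
print(firewall_runs)
```

With a 1440-minute interval, each search covers exactly one day, so this range works out to one summary search per day of the backfill period.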