Documentation: 3.3.3
This page last updated: 09/19/08 10:09am

Search with the Python SDK

Make sure you have authenticated and obtained a session key.

Create a search

Import the necessary module:

import splunk.search as se

Start a search:

foo = se.dispatch('search error')

You can assign the job handle to any variable name. In this example, it is called foo.

Note: If you are connecting to multiple servers, you also need to provide the hostPath and sessionKey parameters.

This starts a search on the Splunk server for events containing the term error. dispatch() returns a search job handle (a SearchJob object), assigned here to foo. The handle is keyed on the search job ID that the server generates, which is available as foo.id.

You can use the job ID (foo.id) in your web browser to check the status of a particular job:

https://localhost:8089/services/search/jobs/12345

where 12345 is the ID that you just generated.
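As a sketch, the helper below assembles that status URL from a job ID such as foo.id. The function name job_status_url is hypothetical (it is not part of the SDK), and the default endpoint assumes the local management port 8089 used above.

```python
def job_status_url(job_id, endpoint='https://localhost:8089'):
    """Return the REST URL for checking one search job's status.

    Hypothetical helper; the default endpoint assumes a local server on port 8089.
    """
    # strip a trailing slash so the path is not doubled
    return '%s/services/search/jobs/%s' % (endpoint.rstrip('/'), job_id)


print(job_status_url('12345'))
# prints https://localhost:8089/services/search/jobs/12345
```

For a remote server, pass its address as the second argument, for example job_status_url(foo.id, 'https://foo.example.com:8089').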

There are a few properties on the SearchJob object that will be of immediate use:

  • foo.isDone - a boolean value that indicates whether the search has completed.
  • foo.count - the number of events that have matched the search.
  • foo.cursorTime - the current position of the search cursor; when a search is dispatched, the cursor moves in reverse chronological order.
  • foo.events - the raw events returned by your search.
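Because a dispatched search keeps running on the server, a script often needs to wait for isDone before reading the complete results. The sketch below polls a job handle until isDone is true or a timeout expires. It assumes that each read of isDone returns the job's current state on the server; wait_for_job and the FakeJob stand-in are hypothetical names used only for illustration.

```python
import time


def wait_for_job(job, poll_interval=1.0, timeout=None):
    """Poll job.isDone until it is true.

    Returns True when the job finishes, or False if `timeout` seconds pass first.
    Assumes each read of job.isDone reflects the job's current state.
    """
    start = time.time()
    while not job.isDone:
        if timeout is not None and time.time() - start >= timeout:
            return False
        time.sleep(poll_interval)
    return True


class FakeJob(object):
    """Hypothetical stand-in for a job handle: reports done after a few checks."""

    def __init__(self, checks_until_done):
        self._remaining = checks_until_done

    @property
    def isDone(self):
        self._remaining -= 1
        return self._remaining <= 0


print(wait_for_job(FakeJob(3), poll_interval=0.01))  # True
```

With a real handle you would call wait_for_job(foo), optionally with a timeout so that a long-running search cannot block the script indefinitely.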

Regex and Python

You have to be careful about escaping characters when working with regular expressions in Python. The correct way to submit a search that contains a regular expression is to mark the string as a raw string with the r'<string>' prefix:

se.dispatch(r'search index=mail sourcetype!=sugarstate startminutesago=1440 | rex "\"from\s+(?![^\.]+\.splunk\.[^\s]+)[^\s]+\s+\(\[(?<clientip>\d+\.\d+\.\d+\.\d+)" | where (clientip NOT LIKE "192.%") AND (clientip NOT LIKE "10.%") AND (clientip > "")')

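Note that the string is prefixed with r, which follows the Python convention for raw string literals (just as a u prefix marks a Unicode literal). In a raw string, backslashes are kept as ordinary characters instead of starting escape sequences. See the Python documentation on regular expressions.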
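The sketch below shows that a raw string is simply a shorthand for doubling every backslash: both literals produce the same pattern text. It then passes a raw-string pattern to Python's own re module to show the same kind of literal working in a regular expression. The sample text is made up for the illustration.

```python
import re

# A raw string keeps every backslash exactly as typed...
raw = r'\d+\.\d+\.\d+\.\d+'
# ...so it equals a normal literal in which each backslash is doubled.
escaped = '\\d+\\.\\d+\\.\\d+\\.\\d+'
print(raw == escaped)  # True

# The raw-string pattern works as expected with Python's re module.
match = re.search(r'\[(\d+\.\d+\.\d+\.\d+)\]', 'connection from [10.1.2.3]')
print(match.group(1))  # 10.1.2.3
```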

Now consider what happens in your Splunk searches if you leave off the r prefix:

se.dispatch('search foo | rex "this\nthat\"there"')

Python interprets \n as a newline character and \" as an escaped double quote, so neither backslash reaches Splunk.

Splunk therefore registers your search as:

search foo | rex "this that"there"

Note that the newline has become whitespace, the middle quote is no longer escaped, and the rex expression now has unbalanced quotes.

To avoid this, mark your string as a raw string:

se.dispatch(r'search foo | rex "this\nthat\"there"')

Then Python will pass the string along unprocessed:

search foo | rex "this\nthat\"there"
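You can confirm the difference in an interactive Python session without contacting a server. The sketch below builds both versions of the search string and inspects what each one actually contains.

```python
# The same search text written as a normal and as a raw string literal.
plain = 'search foo | rex "this\nthat\"there"'
raw = r'search foo | rex "this\nthat\"there"'

# Normal literal: \n became a real newline and \" became a bare quote,
# so no backslashes survive and the three quotes are unbalanced.
print('\n' in plain)       # True
print('\\' in plain)       # False
print(plain.count('"'))    # 3

# Raw literal: both backslash sequences are passed through unchanged.
print(raw.count('\\'))     # 2
print(raw)                 # search foo | rex "this\nthat\"there"
```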

Work with search results

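The foo.events object works like a list: you can iterate over it and slice it to obtain specific events. The events are stored in reverse chronological order (newest first). The examples in this section use the names foo and job interchangeably for the job handle returned by se.dispatch().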

for x in job.events:
    print x

This code iterates over every event returned by the search and prints its raw text. The iterator begins returning data as soon as it receives the first event and continues until isDone is True.

You can also retrieve specific events using standard Python indexing and slicing:

  • foo.events[2] - returns the 3rd event in the search results.
  • foo.events[2:10] - returns events 3 through 10 as a list.
  • foo.events[-1] - returns the last event in the results.
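Assuming foo.events follows standard list semantics, as described above, the indexing rules can be checked on an ordinary Python list. The list below is a made-up stand-in for foo.events.

```python
# Made-up stand-in for foo.events: a plain list of twelve items.
events = ['event %d' % n for n in range(1, 13)]

print(events[2])          # event 3   (index 2 is the 3rd item)
print(events[2:10])       # 'event 3' through 'event 10' (the end index is excluded)
print(len(events[2:10]))  # 8
print(events[-1])         # event 12  (negative indexes count from the end)
```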

The items returned by iterating or slicing are actually result objects that have additional properties:

  • job.events[0].raw - the raw event text (the same value as print job.events[0])
  • job.events[0].time - the event timestamp, as a datetime.datetime object
  • job.events[0].fields - a dictionary of all the fields associated with the event

For example, to see the host field of an event:

job.events[0].fields['host']

Or, to print the host field of every event:

for x in job.events:
    print x.fields['host']

Or, using the shorthand form:

for x in job.events:
    print x['host']
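The shorthand works if the result object forwards item lookups to its fields dictionary. The class below is a hypothetical stand-in, not the SDK's own result class, sketching one way to provide the documented raw, time, and fields properties together with the x['host'] shorthand. The event text and field values are invented.

```python
import datetime


class Result(object):
    """Hypothetical stand-in mirroring the documented result properties."""

    def __init__(self, raw, time, fields):
        self.raw = raw          # raw event text
        self.time = time        # event timestamp as a datetime.datetime
        self.fields = fields    # dict mapping field name to value

    def __str__(self):
        # printing a result shows its raw text
        return self.raw

    def __getitem__(self, name):
        # result['host'] is shorthand for result.fields['host']
        return self.fields[name]


r = Result('Sep 19 10:09:00 web01 GET /index.html',
           datetime.datetime(2008, 9, 19, 10, 9),
           {'host': 'web01', 'sourcetype': 'access_common'})
print(r)          # Sep 19 10:09:00 web01 GET /index.html
print(r['host'])  # web01
```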

To print a human-readable timestamp for events from the firewall sourcetype:

for x in job.events:
    if x['sourcetype'] == 'firewall':
        print x.time.ctime()
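The same filtering can be wrapped in a reusable function. This sketch uses a hypothetical Event named tuple in place of real results, carrying only the time and fields properties described above. It uses fields.get() so that events without a sourcetype field are skipped rather than raising KeyError; timestamps_for is an illustrative name, not an SDK function.

```python
import collections
import datetime

# Hypothetical minimal stand-in with only the two documented properties used here.
Event = collections.namedtuple('Event', ['time', 'fields'])


def timestamps_for(events, sourcetype):
    """Return human-readable timestamps for events of one sourcetype."""
    return [e.time.ctime() for e in events
            if e.fields.get('sourcetype') == sourcetype]


events = [
    Event(datetime.datetime(2008, 9, 19, 10, 9, 30), {'sourcetype': 'firewall'}),
    Event(datetime.datetime(2008, 9, 19, 10, 8, 0), {'sourcetype': 'syslog'}),
    Event(datetime.datetime(2008, 9, 19, 10, 7, 0), {}),
]
for stamp in timestamps_for(events, 'firewall'):
    print(stamp)  # Fri Sep 19 10:09:30 2008
```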

When you are finished with the search job, remove it from the server by calling:

job.cancel()

Otherwise, the job will persist on disk until the specified timeout (TTL), which is 24 hours by default.
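To make sure a job is cancelled even when your processing code raises an exception, you can wrap the call to cancel() in a context manager. This is a sketch: cancel_when_done is a hypothetical helper, and the FakeJob class only records whether cancel() was called. With the SDK, you would pass it the handle returned by se.dispatch().

```python
import contextlib


@contextlib.contextmanager
def cancel_when_done(job):
    """Yield the job, then call job.cancel() even if the block raises."""
    try:
        yield job
    finally:
        job.cancel()


class FakeJob(object):
    """Hypothetical stand-in that only records whether cancel() was called."""
    cancelled = False

    def cancel(self):
        self.cancelled = True


fake = FakeJob()
with cancel_when_done(fake) as job:
    pass  # read job.events here
print(fake.cancelled)  # True
```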

Examples

The following code uses the REST API directly (through the httplib2 library) to authenticate, create a search job, and print the search ID returned by the server.

from httplib2 import Http
from urllib import urlencode
import xml.dom.minidom as xml

# set variables
endpoint = 'https://localhost:8089'
authURI = endpoint + '/services/auth/login/'
jobURI = endpoint + '/services/search/jobs/'
authData = {'username': 'admin', 'password': 'changeme'}
headers = {}

# initialize our connection handler
h = Http()

# open a connection and do a POST for auth
resp, content = h.request(authURI, "POST", urlencode(authData))

# parse our token out of the response
xmlDoc = xml.parseString(content)
tokenElements = xmlDoc.getElementsByTagName('sessionKey')

if not tokenElements:
    print 'No session key found!  Are you running the free version?'
    tokenElements = xmlDoc.getElementsByTagName('msg')
    print 'Reason=%s' % tokenElements[0].firstChild.nodeValue
    headers['Authorization'] = ''
else:
    sessionKey = tokenElements[0].firstChild.nodeValue
    print 'sessionKey=%s' % sessionKey
    headers['Authorization'] = 'Splunk %s' % sessionKey

# set up our search job
postargs = { 'search': "search * hoursago=24" }
payload = urlencode(postargs)

# open a connection and do a POST for a new job
resp, content = h.request(jobURI, "POST", headers=headers, body=payload)

print 'server returned code %s.' % resp.status
print content

The server should respond with HTTP status 201 (Created) and the ID of the new job:

server returned code 201.
<?xml version='1.0'?>
<response><sid>1213220104.17</sid></response>
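The sid in that response can be extracted with the same minidom approach the script uses for the session key. The sketch below assumes the response body has the format shown above; parse_sid is a hypothetical helper name.

```python
import xml.dom.minidom as xml

# Sample body in the format shown above.
content = "<?xml version='1.0'?>\n<response><sid>1213220104.17</sid></response>"


def parse_sid(body):
    """Return the text of the first <sid> element, or None if there is none."""
    elements = xml.parseString(body).getElementsByTagName('sid')
    if not elements or elements[0].firstChild is None:
        return None
    return elements[0].firstChild.nodeValue


print(parse_sid(content))  # 1213220104.17
```

The resulting sid is the job ID used in the job status URL, for example https://localhost:8089/services/search/jobs/1213220104.17.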

The following example returns results from a remote server.

import splunk.auth
import splunk.search as se

# direct the SDK calls at the remote server
splunk.mergeHostPath('https://foo.example.com:8089', True)

# authenticate before dispatching the search
splunk.auth.getSessionKey('admin', 'changeme')

job = se.dispatch('search sourcetype=access_common 404')

print job.isDone

for result in job.events:
    print result
