TIPS & TRICKS

The Splunk Python client library (part 1)

Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.

The easiest way to get started with the client library is to work inside Splunk’s own Python environment. Locate your Splunk install directory (/opt/splunk by default) and start the Python interactive shell that ships with Splunk:

# bin/splunk cmd python

This will launch the interactive Python prompt, which starts off looking like this:

Python 2.5.1 (r251:54863, Nov 18 2007, 16:13:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Starting a search

Import the Splunk modules:

import splunk.auth
import splunk.search as se

If you have installed Splunk with the default settings, then your hostpath is https://localhost:8089. The client library knows this default, so you can authenticate directly by providing a username and password:

key = splunk.auth.getSessionKey('admin','changeme')

The getSessionKey method automatically caches the session key in the current interactive session, so you don’t have to pass it along to subsequent methods. In a production implementation, or if you are connecting to multiple servers, you’ll need to keep track of separate session keys.
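If you do need to track separate session keys, one simple approach is a small cache keyed by host path. The sketch below is only an illustration: the SessionKeyStore class and its login argument are hypothetical names, not part of the client library, and the login callable is where a real implementation would authenticate against the given server.

```python
class SessionKeyStore(object):
    """Hypothetical helper: remembers one session key per server."""

    def __init__(self, login):
        # login is any callable that takes a host path and returns a key.
        self._login = login
        self._keys = {}

    def get(self, host_path):
        # Authenticate only the first time a given server is requested.
        if host_path not in self._keys:
            self._keys[host_path] = self._login(host_path)
        return self._keys[host_path]


# Stand-in login function so the sketch runs without a Splunk server.
calls = []

def fake_login(host_path):
    calls.append(host_path)
    return 'key-for-' + host_path

store = SessionKeyStore(fake_login)
key_a = store.get('https://server-a:8089')
key_b = store.get('https://server-b:8089')
key_a_again = store.get('https://server-a:8089')  # served from the cache
```

Each key can then be passed explicitly to whichever call needs it, rather than relying on the key cached for the interactive session.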

If your server is on a different hostname or port, then you need to first update the session defaults:

splunk.mergeHostPath('splunk_hostname:12000', True)
key = splunk.auth.getSessionKey('admin','changeme')

The mergeHostPath method takes host information in many different forms:

  • hostname
  • hostname:port
  • https://hostname
  • http://hostname:port
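The forms above differ only in which parts are spelled out. As a rough illustration of how a partial value could be expanded against the defaults mentioned earlier (https and port 8089), here is a standalone sketch. The normalize_host_path function is hypothetical; it is not the implementation of mergeHostPath and makes no claim about how the library parses its input.

```python
DEFAULT_SCHEME = 'https'
DEFAULT_PORT = 8089

def normalize_host_path(value):
    """Expand a partial host specification into scheme://host:port.

    Illustrative only; does not handle IPv6 literals or trailing paths.
    """
    scheme = DEFAULT_SCHEME
    rest = value
    if '://' in value:
        scheme, rest = value.split('://', 1)
    host, port = rest, DEFAULT_PORT
    if ':' in rest:
        host, port_text = rest.rsplit(':', 1)
        port = int(port_text)
    return '%s://%s:%d' % (scheme, host, port)


expanded = [normalize_host_path(v) for v in
            ('hostname', 'hostname:12000', 'https://hostname',
             'http://hostname:9000')]
```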

Next, start a search:

job = se.dispatch('search error')

This creates a search job handle object, job, and starts a search on the server for events that contain the term “error”. If you are connecting to multiple servers, you’ll also need to provide the hostPath and sessionKey parameters. The handle is keyed off the search job ID generated by the server, which is available via:

job.id

With this ID, you can always use your web browser to check on the status of a particular job by opening up:

https://localhost:8089/services/search/jobs/12345

where 12345 is the ID that you just generated.
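If you check on jobs often, the URL can be assembled from the host path and the job ID. This is a minimal sketch based only on the URL pattern shown above; job_status_url is a hypothetical helper, not a library function.

```python
def job_status_url(job_id, host_path='https://localhost:8089'):
    """Build the job status URL for a given search job ID."""
    return '%s/services/search/jobs/%s' % (host_path.rstrip('/'), job_id)


url = job_status_url('12345')
```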

There are a few properties on the SearchJob object that will be of immediate use:

  • job.isDone – a boolean value that indicates if the search has completed
  • job.count – the number of events that have been matched against the search
  • job.cursorTime – the current position of the search cursor; when dispatching a search, the cursor moves in a reverse chronological order
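A common use of these properties is to wait for a search to finish before reading its totals. The sketch below polls isDone with a timeout, assuming that each read of job.isDone reflects the current state on the server. The wait_for_job function and the FakeJob class are hypothetical; FakeJob exists only so the example runs without a Splunk server.

```python
import time

def wait_for_job(job, poll_interval=0.5, timeout=60.0):
    """Poll job.isDone until it is True or the timeout elapses.

    Returns True if the job finished, False if the timeout was hit.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


class FakeJob(object):
    """Stand-in for a SearchJob; reports done after a few polls."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done
        self.count = 0

    @property
    def isDone(self):
        # Each read simulates the server matching ten more events.
        if self._remaining > 0:
            self._remaining -= 1
            self.count += 10
            return False
        return True


job = FakeJob(3)
finished = wait_for_job(job, poll_interval=0.01)
```

With a real job handle you would pass the object returned by se.dispatch in place of FakeJob.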

Working with search results

The raw events are the original event data that were indexed by Splunk, according to the data input rules. They are available as an iterable container object:

job.events

This object works just like a list, and you can iterate and slice it to obtain events. The events are stored in reverse chronological order.

for x in job.events:
    print x

This code will iterate over every event returned in the search and print out the raw text, which could be every event in your index if you so choose. The iterator will begin returning data as soon as it receives the first event, and will continue until the isDone property is True.

You can also retrieve specific rows of data using the standard Python slice operator:

job.events[2] # returns the 3rd event in the search results
job.events[2:10] # returns events 3 through 10 as a list
job.events[-1] # returns the last event in the results
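Because the container follows list conventions, the same indexing can be tried on an ordinary list. The sketch below uses a plain list of placeholder strings as a stand-in for job.events (an assumption made so it runs without a server) to show that indexes start at zero and that the end of a slice is exclusive.

```python
# Stand-in for job.events: ten placeholder events.
events = ['event %d' % n for n in range(1, 11)]

third = events[2]             # index 2 is the 3rd item
third_to_tenth = events[2:10] # indexes 2 through 9, eight items in total
last = events[-1]             # negative indexes count from the end
```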

The items returned by iterating or slicing are actually Result objects that have additional properties:

  • job.events[0].raw – the raw event text (the same value as print job.events[0])
  • job.events[0].time – the event timestamp, as a datetime.datetime object
  • job.events[0].fields – a dict of all the fields associated with the event

For example, if you wanted to see the host field for an event:

job.events[0].fields['host']

Or if you wanted to see all of the host entries for each event:

for x in job.events:
    print x.fields['host']

Or alternatively, in shorthand:

for x in job.events:
    print x['host']

If you want to print out a human-readable timestamp for events that came from the ‘firewall’ sourcetype:

for x in job.events:
    if x['sourcetype'] == 'firewall':
        print x.time.ctime()
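To experiment with these access patterns without a running server, you can model an event with a small stand-in class. The FakeResult class below is hypothetical and is not the library's Result class; it only mirrors the raw, time, and fields properties described above, with the bracket shorthand reading from fields. The sample events are made up for illustration.

```python
import datetime

class FakeResult(object):
    """Stand-in for a Result object; not the client library's class."""

    def __init__(self, raw, time, fields):
        self.raw = raw
        self.time = time
        self.fields = fields

    def __getitem__(self, name):
        # Shorthand: result['host'] reads result.fields['host'].
        return self.fields[name]

    def __str__(self):
        # Printing a result shows its raw text.
        return self.raw


def firewall_timestamps(events):
    """Return ctime() strings for events from the 'firewall' sourcetype."""
    return [e.time.ctime() for e in events
            if e['sourcetype'] == 'firewall']


events = [
    FakeResult('deny tcp 10.0.0.5',
               datetime.datetime(2008, 1, 15, 9, 30, 0),
               {'host': 'fw01', 'sourcetype': 'firewall'}),
    FakeResult('GET /index.html 200',
               datetime.datetime(2008, 1, 15, 9, 29, 0),
               {'host': 'web01', 'sourcetype': 'access_combined'}),
]
stamps = firewall_timestamps(events)
```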

When you are finished with the search job, remove it from the server by calling:

job.cancel()

Otherwise, the job will persist on disk until the specified timeout (TTL), which is 24 hours by default.
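In a script, it is easy to skip the cancel call when an error interrupts processing. One way to guard against that is a try/finally block, sketched below. The run_search helper and the RecordingJob class are hypothetical; RecordingJob stands in for a real job handle so the example runs without a server.

```python
def run_search(dispatch, query, handle_event):
    """Dispatch a search, process each event, and always cancel the job."""
    job = dispatch(query)
    try:
        for event in job.events:
            handle_event(event)
    finally:
        # Runs on normal completion and when handle_event raises.
        job.cancel()


class RecordingJob(object):
    """Stand-in job that records whether cancel() was called."""

    def __init__(self, events):
        self.events = events
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


job = RecordingJob(['first event', 'second event'])
seen = []
run_search(lambda query: job, 'search error', seen.append)
```

With the real library, se.dispatch would be passed as the dispatch argument.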

Posted by Splunk