
Using Alerts to Send Data to Amazon S3

A customer recently asked me to prove a concept where Splunk could see a certain type of incoming event and then pass information from that event into their Amazon S3 storage. I knew that Splunk could create alerts for event conditions and then fire off a script when the alert triggers, but I had never made it work with Amazon S3.

I decided to implement this using Amazon’s Boto library for Python. The library is well documented, but the short of it is that it lets you send data to an Amazon S3 bucket programmatically from a Python script. As you may know, Splunk ships with its own Python interpreter and can easily run Python scripts as an alert response.

As a prerequisite, you’ll need access to an Amazon S3 bucket, including an Access Key ID and a Secret Access Key. You’ll also want a user name and password for the S3 web interface so you can watch the file being written to the bucket.

First you will need to get the Boto library installed on your system. There are several methods, but this worked well for me at the Linux shell on my Splunk server:

  • Navigate to the Splunk scripts folder ($SPLUNK_HOME$/bin/scripts)
  • git clone https://github.com/boto/boto.git
  • cd boto
  • python setup.py build (or ../../splunk cmd python ./setup.py build)
  • cd ..
  • cp -R ./boto/build/lib/ ./
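A quick way to confirm that the copied library is visible to the interpreter is to ask Python where it would load the module from. This is only a minimal sketch: the helper name module_location is my own invention, and it assumes a Python 3 interpreter.

```python
import importlib.util


def module_location(name):
    """Return where a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # If the copy worked, 'boto' should resolve to a path under the scripts folder.
    print(module_location("boto"))
```

Run it from the scripts folder with Splunk's own interpreter, so that the result reflects what an alert script would see.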

Now you’ve installed the library and put it where it is accessible to the script you will create. You will need to get your Access Key ID and Secret Access Key, put them in a boto.cfg file, and put that somewhere that Splunk’s Python can access it. More details are here (http://boto.readthedocs.org/en/latest/boto_config_tut.html), but since I’m root on a test system I just put it into /etc/boto.cfg, with the content as follows:

[Credentials]
aws_access_key_id = <my_access_key_here>
aws_secret_access_key = <my_secret_key_here>

Next, I needed to create a script. This was an iterative process with lots of trial and error, but ultimately I created a file called s3push.py and placed it in my $SPLUNK_HOME$/bin/scripts/ folder. It looks like this:

#!/opt/splunk/bin/python

import os,sys,gzip,boto

results_file = sys.argv[8]
results = gzip.open(results_file)
content = results.read()
results.close()

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')

key_name = 'S3file.txt'
path = 'Splunk'
full_key_name = os.path.join(path,key_name)

from boto.s3.key import Key
k = Key(bucket)
k.key = bucket.new_key(full_key_name)
k.set_contents_from_string(content)

-------------

Here’s the breakdown of this script:

#!/opt/splunk/bin/python

Points us to the correct Splunk interpreter (ensure you use the correct path for your installation).

import os,sys,gzip,boto

Brings in the required libraries, including Boto. Since the boto directory is in the directory with my script, I don’t need to do anything fancy to make it work.

results_file = sys.argv[8]
results = gzip.open(results_file)
content = results.read()
results.close()

This section takes the results file that my alert will generate (sys.argv[8]), unzips it and dumps the content in a variable called “content”. It will contain the full results of whatever search I create in Splunk as the basis for my results file.

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')

This opens the connection to my Amazon S3 bucket, creatively called “mybucket”. You can use the boto library to create buckets if you like, but my customer gave me a specific bucket and folder to use.

key_name = 'S3file.txt'
path = 'Splunk'
full_key_name = os.path.join(path,key_name)

This section builds the “full_key_name” variable that we will use later. It combines the file name I want to create (“S3file.txt”) with the folder in which it will reside (“Splunk”). Folders are optional, but my customer required one. If you are not using folders, you can just define “key_name” and pass it below instead of “full_key_name”.
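Two things about key names are worth a sketch. First, os.path.join uses the separator of the operating system it runs on, so on Windows it would produce a backslash, while S3 folders are conventionally delimited with a forward slash; posixpath.join always uses "/". Second, a fixed file name means each alert writes to the same key. The timestamp option below is purely my own suggestion and not part of the original script, and build_key_name is a hypothetical helper.

```python
import posixpath
from datetime import datetime, timezone


def build_key_name(folder, file_name, stamp=None):
    """Join folder and file name with '/', optionally adding a timestamp to the name."""
    if stamp is not None:
        base, ext = posixpath.splitext(file_name)
        file_name = "%s-%s%s" % (base, stamp.strftime("%Y%m%dT%H%M%SZ"), ext)
    # An empty folder means the key lives at the top level of the bucket.
    return posixpath.join(folder, file_name) if folder else file_name


if __name__ == "__main__":
    print(build_key_name("Splunk", "S3file.txt"))
    print(build_key_name("Splunk", "S3file.txt", datetime.now(timezone.utc)))
```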

from boto.s3.key import Key
k = Key(bucket)
k.key = bucket.new_key(full_key_name)
k.set_contents_from_string(content)

Lastly, we import the Key class from the Boto library, create a Key object for our bucket, create the new key (file name) from the path and file name built in the previous section, and then push the data from our results file (held in a string called “content”) to the key we just created.
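As far as I can tell from the boto 2 API, bucket.new_key() already returns a Key object bound to the bucket, so the upload step can be written a little more directly. The function below is a sketch of that variant, not a tested replacement for the script; upload_string is a name I made up, and it only relies on the bucket offering new_key() and the key offering set_contents_from_string().

```python
def upload_string(bucket, key_name, content):
    """Store content under key_name in the given bucket and return the key object."""
    # new_key() hands back a key tied to this bucket, so no separate Key(bucket) is needed.
    key = bucket.new_key(key_name)
    key.set_contents_from_string(content)
    return key


# Usage, assuming boto 2 is importable and credentials are configured:
#   import boto
#   bucket = boto.connect_s3().get_bucket('mybucket')
#   upload_string(bucket, 'Splunk/S3file.txt', content)
```

Keeping the bucket as a parameter also makes the function easy to exercise with a stand-in object when S3 is not reachable.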

-------------

Now, all that remains is to create a Splunk search whose results are in the format we want to push up to Amazon S3, save it as an alert in Splunk, and enter the name of this script in the alert’s actions menu. Make sure the alert is firing in Splunk and generating results (visible in the Splunk Web UI). If you get it all right, you should see a file called “S3file.txt” appear in your S3 bucket with the results of your Splunk search.

If it doesn’t work, here are some troubleshooting tips:

– Test your script piece by piece (for example, my first test was simply “import boto” to make sure I had placed the library in the correct directory)

– When you test your script manually, make sure you use Splunk’s python like this: “<$SPLUNK_HOME$>/bin/splunk cmd python <script_name>” and NOT just “python <script_name>”

– You are not going to get access to the output file through “sys.argv[8]” if testing manually, so just use some other test file or string
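For that last tip, one option is to generate a small gzip file yourself and hand it to the script in the position where Splunk would put the results file. Both helpers below are hypothetical conveniences of mine (Python 3 assumed), and the other eight argument slots are filled with placeholder strings rather than realistic values.

```python
import gzip
import os
import tempfile


def write_sample_results(lines, directory=None):
    """Write the given lines to a new gzip file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv.gz", dir=directory)
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


def fake_alert_argv(script_name, results_path):
    """Build an argument list with the results path at index 8; other slots are placeholders."""
    argv = [script_name] + ["placeholder"] * 8
    argv[8] = results_path
    return argv
```

Assigning the returned list to sys.argv at the top of a test run lets the rest of the script read sys.argv[8] unchanged.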

Finally, be aware that this script is really just a basic proof of concept. It is not in any way robust production code. If you’re going to use this for something important, you’ll certainly want error handling and more sophisticated handling of the data. I’d love to hear about your improvements and experiences in the comments, so please share.
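As a starting point for the error handling mentioned above, here is one possible shape for a wrapper. It is a sketch with assumptions of my own: the logger name "s3push", the exit codes, and the idea of passing the read and upload steps in as functions are all illustrative choices, and where the log output ends up depends on how logging is configured on your system.

```python
import logging

log = logging.getLogger("s3push")


def run(argv, read_results, upload):
    """Run the push and return a process exit code instead of raising."""
    if len(argv) <= 8:
        log.error("expected the results file path in argv[8], got %d arguments", len(argv))
        return 2
    try:
        content = read_results(argv[8])
        upload(content)
    except Exception:
        # Record the full traceback so a failed alert action can be diagnosed later.
        log.exception("push of %s failed", argv[8])
        return 1
    return 0
```

A script would end with something like sys.exit(run(sys.argv, my_reader, my_uploader)), where the two functions wrap the gzip and Boto steps shown earlier.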

Happy Splunking!

----------------------------------------------------
Thanks!
Andrew Dauria

Posted by

Splunk
