Automatic host resolution in Splunk using Python

This only works in Splunk 2.0.x.
OK, so I’ve had a couple of people ask me how to resolve the IP addresses in their syslog files to hostnames in Splunk.
There’s no way to do this just by tweaking a config variable; we need to dig a little deeper under the surface. It’s actually pretty easy to get Splunk to call out to Python during event processing, so I’ve used that functionality to solve this problem.

Note that this will negatively impact indexing performance, but it should work until we get this behavior baked into Splunk.

First up, I’ve created a Python script that calls socket.gethostbyaddr to resolve the hosts. It also caches the results, including failed lookups, so each address costs at most one DNS query for as long as the process runs.
So copy and paste the following into your favorite editor and save it to <SPLUNK_HOME>/lib/python2.4/site-packages/splunk/pyHostNameResolve.py. This directory is where the dynamically loaded Python looks for scripts; the filename will be referenced later in a config change.


#Copyright (C) 2006 Splunk Inc. All Rights Reserved. This work contains trade
#secrets and confidential material of Splunk Inc., and its use or disclosure in
#whole or in part without the express written permission of Splunk Inc. is prohibited.

from pipeline_data import PipelineDataWrapper #This is a virtual module/class that gets inserted into the python namespace at runtime by splunk
import traceback
import socket

#Set global variables
HOST_KEY = "MetaData:Host"

HOST_RESOLVE_MAP = {} #cache so we don't have to call gethostbyaddr ( expensive ) every event

def resolveHost( pdata, confDictString ):
    global HOST_RESOLVE_MAP
    try:

        host = pdata.get(HOST_KEY)

        resolvedHostName = None

        if host.startswith("host::") :
            host = host[6:]

        if host in HOST_RESOLVE_MAP:
            resolvedHostName = HOST_RESOLVE_MAP[ host ]

        if not resolvedHostName:
            try:
                resolved = socket.gethostbyaddr(host)
                resolvedHostName = resolved[0]
                HOST_RESOLVE_MAP[ host ] = resolvedHostName
            except:
                HOST_RESOLVE_MAP[ host ] = host
                print "Could not resolve " + host
                return 1

        if resolvedHostName :
            pdata.put( HOST_KEY, "host::"+resolvedHostName )

        return 1    

    except:
        print "EXCEPTION !!"
        traceback.print_exc()
        return -1
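
If you want to try the caching logic outside Splunk first, here is a minimal sketch of roughly the same flow. Everything in it other than socket.gethostbyaddr and the "MetaData:Host" key is an assumption made for illustration: FakePipelineData is a hypothetical stand-in for the real PipelineDataWrapper (it only mimics get and put), and fake_lookup replaces DNS with a made-up table so the example runs without a network. Unlike the script above, it always writes the host back, even when the lookup fails.

```python
import socket

HOST_KEY = "MetaData:Host"
PREFIX = "host::"


class FakePipelineData(object):
    """Hypothetical stand-in for Splunk's PipelineDataWrapper (get/put only)."""

    def __init__(self, host):
        self._fields = {HOST_KEY: host}

    def get(self, key):
        return self._fields.get(key)

    def put(self, key, value):
        self._fields[key] = value


def resolve_host(pdata, cache, lookup=socket.gethostbyaddr):
    """Resolve the event's host, consulting the cache before the lookup."""
    host = pdata.get(HOST_KEY)
    if host.startswith(PREFIX):
        host = host[len(PREFIX):]
    if host not in cache:
        try:
            cache[host] = lookup(host)[0]
        except socket.error:
            # Cache the failure too, so a bad address is only tried once.
            cache[host] = host
    pdata.put(HOST_KEY, PREFIX + cache[host])
    return 1


LOOKUP_CALLS = []  # records every address that reached the (fake) resolver


def fake_lookup(ip):
    """Made-up resolver: one known address, everything else fails."""
    LOOKUP_CALLS.append(ip)
    table = {"10.0.0.5": "web01.example.com"}
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return (table[ip], [], [ip])
```

Passing the same cache dictionary on every call plays the role of the global HOST_RESOLVE_MAP; after the first event from an address, LOOKUP_CALLS stops growing for that address whether or not it resolved.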

OK, now open <SPLUNK_HOME>/etc/myinstall/splunkd.xml and insert the following chunk of XML between the diskusageprocessor and the bytequotaprocessor in the indexerpipe pipeline:

<processor name="hostnameresolver" plugin="pythonprocessor">
        <config>
                <scriptFilename>splunk.pyHostNameResolve</scriptFilename>
                <command>resolveHost</command>
                <pyContext>resolveContext</pyContext>
                <pyConfig><![CDATA[]]></pyConfig>
        </config>
</processor>
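
A stray character in splunkd.xml is easy to introduce when pasting, so it can be worth checking the fragment before restarting. The sketch below is only a well-formedness check using Python's standard xml.dom.minidom; it knows nothing about how Splunk itself reads the file. The helper name describe_processor is made up for this example. It also prints back the script module and command, which should match the file and function created earlier (splunk.pyHostNameResolve and resolveHost).

```python
from xml.dom import minidom

PROCESSOR_XML = """<processor name="hostnameresolver" plugin="pythonprocessor">
    <config>
        <scriptFilename>splunk.pyHostNameResolve</scriptFilename>
        <command>resolveHost</command>
        <pyContext>resolveContext</pyContext>
        <pyConfig><![CDATA[]]></pyConfig>
    </config>
</processor>"""


def describe_processor(xml_text):
    """Parse the fragment and return its key settings.

    minidom raises xml.parsers.expat.ExpatError if the XML is malformed.
    """
    node = minidom.parseString(xml_text).documentElement

    def text_of(tag):
        # Join the text children of the first element with this tag name.
        element = node.getElementsByTagName(tag)[0]
        return "".join([child.data for child in element.childNodes
                        if child.nodeType == child.TEXT_NODE])

    return {
        "name": node.getAttribute("name"),
        "plugin": node.getAttribute("plugin"),
        "script": text_of("scriptFilename"),
        "command": text_of("command"),
    }


if __name__ == "__main__":
    print(describe_processor(PROCESSOR_XML))
```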

OK, now fire up Splunk and you should start seeing your hosts getting resolved. As noted earlier, expect indexing to slow down while this processor is in the pipeline.
Cheers,
Brian

Brian Murphy
Senior Software Splunker
Splunk Inc.
