TIPS & TRICKS

Splunk is a Verb – Splunking Perforce Data (Part 2)

This is a continuation of the article: Splunking Perforce Data

In it we were able to import change history from Perforce via the P4 CLI into Splunk, but had a problem. We could only input a range as a one shot operation (with the added complication that we’d get duplicate data upon restart), but what we want to do is to get continuous updates from P4, as well as be resilient to server restarts.

Using a Counter in a Scripted Input

A key pattern to introduce is to have a notion of a “counter.” The counter in this case is not a P4 counter (P4 has it’s own notion of counter that we are not using here), but rather a counter that indicates what we’ve indexed so far (to leverage the attribute of P4 that all change numbers increase in sequential order), so that if we are to index more data from P4, we’ll always start from this counter. Similarly, if the script is pulling the data from P4, and there is a crash, we always begin where we left off.

The trick is how to determine what the counter should be.

For this, we can write a method that does a search for the last change number via the Splunk CLI. This is an interesting design pattern of a scripted input, performing a search of the index as part of adding data to the index. The search would be “index=perforce-change | head 1 | stats max(change)”, and running at the commandline would yield this:

% ./splunk search "index=perforce-change | head 1 | stats max(change)"
max(change)
-----------
7999
%

With a little python magic, you can get the change number.

#
  # Run a splunk search to determine the highest p4 change number
  # that has been indexed in the splunk db.  Return 0 if the
  # db is empty
  #
  def getMaxIndexedChangeNumber(self):
    # construct a search command to run to get the highest change number in the db
    maxChangeSearch = self.splunkCLI+ " search \"index=" + self.splunkIndex + " | head 1 | stats max(change\
)\" " + "-auth " + self.splunkUser + ":" + self.splunkPassword
    #  FIXME: if self.maxChangeSearch returns error, error will go into the index
    maxChangeStr = os.popen(maxChangeSearch).read()
    if maxChangeStr.find("max(change)") != -1: # if result returned
      maxChangeStr = maxChangeStr.strip('max(change)')
      maxChangeStr = maxChangeStr.strip()
      maxChangeStr = maxChangeStr.strip('-')
      maxChangeStr = maxChangeStr.strip()
      return maxChangeStr
    return 0 # fresh index, start indexing from 0

We’ll modify the main method of the last blog to utilize this, plus another method to get the latest P4 change number. Thus, rather than scanning through a hard-coded range, we can get the range of changes from the last change in the splunk index, and the most recent change in the P4 repository.

  # Run p4 changes -m 1 to get the last change in P4.
  #
  def getMaxP4ChangeNumber(self):
    f = os.popen(self.p4CLI + " -G changes -m 1")
    d = marshal.load(f)
    if (d['code'] == 'error'):
      # something is very wrong if this fails, so lets complain
      printError("Error from P4 changes: " + d['data'])
    return int(d['change'])

Configuring the Script for an Interval

The script can then be run on a periodic basis to gather the latest changes by having your inputs.conf look like this:

[script://./bin/P4Simple.py]
interval =  10
sourcetype = perforce-change
source = eigenbase-p4proxy.splunk.com:1666
index = perforce-change
disabled = false

One worry I had was that the script would fire off while the previous was running. The nice thing to note is that the interval is only between script invocations. So if the script takes an hour to run, but your interval is 1 minute, it’ll run again, 1 hour and 1 minute from the first invocation.

Why Isn’t My Script Running!?!?

One digression. I must have spent an hour at least rechecking my configs and googling what I can find about Splunk and configs to figure out one thing. Why is it that my scripted input evidently was not being invoked? I thought perhaps I was missing enabling the app, or missing enabling the input. All things checked out. For instance, there is an internal tool called btool that is undocumented, but very useful for debugging.

First, I checked to see if Splunk knows about my script.

% ./splunk cmd btool --app=p4-simple inputs list
[script:///home/boris/splunk/etc/apps/p4-simple/bin/P4Simple.py]
disabled = false
index = perforce-change
interval = 10
source = eigenbase-p4proxy.splunk.com:1666
sourcetype = perforce-change
%

According to this, it does. But, no data was going into my index.

I even added a print to stderr in my main method so I should be able to see output in splunkd.log. That confirmed that it definitely was not being executed.

Then I thought, wait a sec. This app I have here is actually a copy of my previous app. And both are enabled on this Splunk instance. On top of that, I hadn’t changed the name of the script. Looking at my original app, I had “disabled = true” in the stanza for the scripted input. Could that be clobbering my “disabled = false” value even though these are in different apps?

Apparently yes!!

A real head-slapper.

The Beauty of BTool

Btool also illustrates this, had I known better about how to use it. Basically, Btool, in addition to allowing you to see what is being defined by an App, it will display global properties as well. So I ran the following command:

./splunk cmd btool inputs list

This showed me that indeed only one P4Simply.py script was known, and it was from my old app.

[script:///home/boris/splunk/etc/apps/p4-simple-old/bin/P4Simple.py]
_rcvbuf = 1572864
disabled = true
host = hiro
index = p4-sample
interval = -1
source = perforce.eigenbase.org:1666
sourcetype = p4-sample

Note, how I’ve renamed several things, so it’s obvious that this is from my old app. Renaming it to [script://./bin/P4Simple.py.OLD] solved my problem, and with a reboot, I could see data coming through.

Another nice thing about btool. You can run it with Splunk not running in order to verify confs without bringing up the server.

So, lesson learned. Inputs have global implications and can clobber each other’s values. I already knew that was the case with file inputs, and should have known that was the case with scripted as well. A nice feature would be to warn when two apps conflict. (Splunk apparently takes precedence by the alphabetical order of the app name)

Another nice tip to confirm your script is being executed is to print to standard error. All standard error gets logged in splunkd.log as ERROR messages. It may or may not be what you want, but it’s useful, since I could find no other log message from Splunk itself that indicates when it executed a script.

08-14-2010 18:15:48.552 ERROR ExecProcessor - message from "python /home/boris/splunk/etc/apps/p4-simple/bin/P4Simple.py" Executing P4Simple

Setting Up Your Own Personal Conf File

One other interesting undocumented trick. Ideally you want to put various configuration values into a conf file so that you don’t have it hard coded in your script.

You can do this by doing the following import:

import splunk.clilib.cli_common

I can then create my own “p4-simple.conf” file and put that in default with the following:

[p4]
p4port = eigenbase-p4proxy.splunk.com:1666
p4user = guest
p4password =

[splunk]
user = admin
password = changeme
index = perforce-change

I then utilize Splunk’s classes to retrieve those in my setup method.

 #
  # setup values for Splunk and P4 CLI actions
  #
  def setupValues(self):
    # Splunk CLI - assume SPLUNK_HOME is set
    self.splunkHome = os.environ['SPLUNK_HOME']
    if self.splunkHome == None:
      printError("SPLUNK_HOME not set")
      sys.exit("This should not happen")
    self.splunkCLI  = self.splunkHome + "/bin/splunk"
    # retrieve values from p4-simple.conf file
    p4Settings = splunk.clilib.cli_common.getConfStanza('p4-simple','p4')
    splunkSettings = splunk.clilib.cli_common.getConfStanza('p4-simple','splunk')
    self.splunkIndex = splunkSettings['index']
    self.splunkUser = splunkSettings['user']
    self.splunkPassword = splunkSettings['password']
    self.p4port = p4Settings['p4port']
    self.p4user = p4Settings['p4user']
    self.p4password = p4Settings['p4password']
    # in case env has P4CONFIG set
    os.unsetenv('P4CONFIG')
    os.environ['P4PORT'] = self.p4port
    self.p4CLI = "p4 -u " +  self.p4user + " -P \"" + self.p4password + "\" "

This is a great way to have my own conf files for my scripted input. There is an improvement that can be done here, since I have the index name duplicated in both inputs.conf and p4-simple.conf and indexes.conf. It would be nice to consolidate them somehow, but there is currently no way of doing that.

Difference In Behavior of Real-Time vs Non-Real-Time Searches, and the Danger of Debug Messages

Another interesting curiosity happened when I was testing if data was going into the index by running the same search I use in my script to get the latest counter “index=perforce-change | head 1 | stats max(change)”. Oddly, I kept getting no results. I went to the web UI and did the same search, but got results. As it turns out, in the Web UI I had my time range be Real-time search for all time. When I simply did “index=perforce-change | head 1” from the commandline, I got my answer of what the problem was. I had a debug message from my script (I had outputted a word to standard out, which Splunk timestamped as current time). Since that debug message obviously had no field “change”, it resulted in an empty result. Whereas in a real-time search “head 1” is only the latest of the data received while the real-time search is running (not the data in the index), so since that debug message got into the index some time back, it didn’t show up.

Well, the good thing is that I can easily start again, since I followed the best practice of using a separate index and can simply clean the index of all data, and re-import via the script.

Timestamping Headaches

And it’s good I can do this, since it seems Splunk doesn’t recognize timestamps older than 2000 days ago (I also discovered the problem with the same search “index=perforce-change | head 1”). Since this P4 repository has been around since 2002, it results in this chart. The data hasn’t completed indexing yet, but without it completing I can tell something is off. The graph should be filling in sequentially, but there’s a huge gap.

Bad timestamping of older events

Bad timestamping of older events

Look at the first event listed. It has an obvious timestamp of 5/29/02, however it is being set to the current date, 8/14/10, though the time of day is nicely preserved (ignore the banner error message, a side-effect of something else I’m working on). Adding the following to props.conf solves the problem.

MAX_DAYS_AGO = 100000

This is a heavy hammer, which says accept dates up to 100,000 days old from all sources. Don’t do this unless you know you want older timestamps. For a few more details on how timestamping works, see Guess What Time It Is.

A key lesson learned here is that since timestamping happens at Index time, your index can be polluted with junk data very quickly if this isn’t done right.

----------------------------------------------------
Thanks!
Boris Chen

Splunk
Posted by

Splunk

Join the Discussion