What is it doing?

Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at.


New in 3.2, the audit.log records who did what based on what capability was requested from the authorization system. It shows both user-initiated actions like login and automated actions like running saved searches.

Login attempt
07-14-2008 10:59:09.434 INFO AuditLogger – Audit:[timestamp=Mon Jul 14 10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]

Running a script
07-14-2008 10:59:12.542 INFO AuditLogger – Audit:[timestamp=Mon Jul 14 10:59:12 2008, user=admin, action=run_script_sendemail, info=granted ][n/a]

Dispatch search
07-14-2008 14:43:39.619 INFO AuditLogger – Audit:[timestamp=Mon Jul 14 14:43:39 2008, user=admin, action=search, info=granted dispatch maxtime=0 maxresults=100 [search sudo | eval sizeof=length(host) ] | outputcsv][n/a]

REST request
07-15-2008 08:21:33.576 INFO AuditLogger – Audit:[timestamp=Tue Jul 15 08:21:33 2008, user=admin, action=search, info=granted REST: /search/jobs][n/a]
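
If you want to pull these events apart outside of Splunk, here is a minimal Python sketch. It assumes the fields always appear in the order timestamp, user, action, info, as they do in the samples above, and it treats the final bracketed value (n/a here) as an opaque trailer. The function name parse_audit_line is my own, not anything Splunk ships.

```python
import re

# Assumption: fields appear in the order timestamp, user, action, info,
# followed by one more bracketed value, as in the sample events.
AUDIT_RE = re.compile(
    r"Audit:\[timestamp=(?P<timestamp>.*?), user=(?P<user>.*?), "
    r"action=(?P<action>.*?), info=(?P<info>.*)\]\[(?P<trailer>[^\]]*)\]\s*$"
)


def parse_audit_line(line):
    """Return the audit fields of one audit.log line as a dict, or None."""
    match = AUDIT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # info can carry trailing whitespace (see the run_script sample).
    fields["info"] = fields["info"].strip()
    return fields
```

The info group is greedy on purpose, so a search string that contains its own square brackets (like the dispatch sample) stays in one piece.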


These are the LicenseManager events that used to be reported in splunkd.log; they now have their own file. The fields to pay attention to are quotaExceededCount (the number of license violations), peak (the all-time high daily volume), and todaysBytesIndexed. rolloverCount is the number of rollovers since the last clean. Usually one event is generated per day, just after midnight, but there can be others if the instance has been restarted.

07-15-2008 00:01:38.456 INFO LicenseManager-Audit – Audit:[timestamp=1216105298 quotaExceededCount=0, lastExceedDate=0, peak=14699861, rolloverCount=1, totalCumulativeBytesAtRollover=14699861, todaysBytesIndexed=14699861][Jls7bqb2G3dcwAgzAmi0P5pmJn1+IgDwMpoxmW1idMGbA1IlW2amr8tYq5ROlL3bysBxpCV46OEBCt3MJxjI73VvmGSWffU5C+1K3UXYejOLBdinoRavtk+hgLil69eF4n/vQ2mVixK179iHVkzckUcUe8X8iz8qPZT6BEvFhh0AukKlk6IFCrXWRftYysMEIR0IAmcuns7PWBzo/FmEOdm9rBKfVnNMKSvvos39QVooj4O6Km2+xsMUododll8w9IMrl9l0dDHW4AhfZfEN7Sf8krE1c/T/Q+VAxMRgzB0iqJWIddtIxgp6pmdBzD2q7dk9L2pAbkjzDlXRM5GyAg==]
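
As a rough sketch, the numeric fields in an event like this can be pulled out with a few lines of Python. This assumes every value inside the first Audit:[...] bracket is a plain integer, which holds for the sample above; the function name parse_license_audit is made up for illustration.

```python
import re


def parse_license_audit(line):
    """Return the integer fields from the Audit:[...] payload of one event."""
    start = line.index("Audit:[") + len("Audit:[")
    end = line.index("]", start)  # the payload ends at the first closing bracket
    payload = line[start:end]
    # The sample separates fields with ", " except after timestamp, where
    # there is only a space, so match key=digits pairs instead of splitting.
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", payload)}
```

Comparing todaysBytesIndexed against peak, or watching for a non-zero quotaExceededCount, is then a simple dictionary lookup.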


I’ve talked about this one before, when trying to identify high-volume data inputs. New for 3.3, you can configure how many items are reported in metrics.log, instead of the default 10 per period, by setting maxseries in limits.conf. (See limits.conf.spec for details.) Making this number larger will impact performance, but it can be worthwhile while investigating a specific issue; you can also reduce it. As before, it’s a sample of the top n items for each group in a 30-second period, so if you have 200 sources, you won’t see all your data inputs here. We are already discussing which additional metrics we can report, so expect to see new options in 4.0.

Track blocked queues by looking for “blocked!!”:
06-24-2008 09:22:08.792 INFO Metrics – group=queue, name=parsingqueue, blocked!!=true, max_size=1000, filled_count=21, empty_count=0, current_size=1000, largest_size=1000, smallest_size=908
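
If you have a copy of metrics.log handy, a small Python sketch like this one can list the blocked queues. It assumes the key=value pairs start at group= and are separated by a comma and a space, as in the sample line; the function names are hypothetical.

```python
def parse_metrics_fields(line):
    """Split the key=value part of a metrics.log line into a dict of strings."""
    # Assumption: the pairs start at "group=" and are separated by ", ".
    _, _, rest = line.partition("group=")
    fields = {}
    for pair in ("group=" + rest).split(", "):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def blocked_queues(lines):
    """Yield (queue name, current_size, max_size) for each blocked queue line."""
    for line in lines:
        if "blocked!!=true" not in line:
            continue
        fields = parse_metrics_fields(line)
        if fields.get("group") == "queue":
            yield fields["name"], int(fields["current_size"]), int(fields["max_size"])
```

Feeding it an open file, for example list(blocked_queues(open("metrics.log"))), gives one tuple per blocked sample, so a queue that stays blocked shows up repeatedly.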

See which processors are actively running:
07-09-2008 14:03:43.876 INFO Metrics – group=pipeline, name=parsing, processor=utf8, cpu_seconds=0.321082, executes=90770, cumulative_hits=218992256
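
To see which processors are doing the most work over a stretch of the log, you can add up cpu_seconds per processor. This is only a sketch: it assumes each line reports the CPU seconds for its own sample period rather than a running total, and cpu_by_processor is a name I made up.

```python
import re
from collections import defaultdict


def cpu_by_processor(lines):
    """Sum cpu_seconds per (pipeline name, processor) over metrics.log lines."""
    totals = defaultdict(float)
    for line in lines:
        if "group=pipeline" not in line:
            continue
        # Values run up to the next comma or whitespace.
        fields = dict(re.findall(r"(\w+)=([^,\s]+)", line))
        if "processor" not in fields or "cpu_seconds" not in fields:
            continue
        key = (fields.get("name", ""), fields["processor"])
        totals[key] += float(fields["cpu_seconds"])
    return dict(totals)
```

Sorting the result by value puts the busiest processors at the top.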

Diagnostic searches with CLI dispatch

The new dispatch search allows searching across many more events than the older search command. From the CLI, you can use the dispatch command or write something that uses the REST API. Some searches can tell you more than you would learn from simply returning events. Dispatch from the CLI is particularly suited to this, as it’s designed for reporting across huge sets of events (although not for returning those hundreds of thousands of events). It may take a while to run, but it will complete.

How many events?
./splunk dispatch "sourcetype=access_combined | stats count"
./splunk dispatch "sourcetype=access_combined starttime::04/25/08:00:00:00 | stats count"

How big are they?
./splunk dispatch "host=foohost1 | eval sizeof=length(_raw) | stats sum(sizeof)"

How big are various other things?
./splunk dispatch "sourcetype=syslog | eval sizeof=length(host) | stats avg(sizeof)"

Note that all of these use additional search commands to report on the set of events rather than the events themselves. Actual results returned from dispatch via the CLI are maintained in memory, so trying to get back thousands of events or more can cause serious problems. Don’t do it.
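
If you script these diagnostics, one way to enforce that rule is to refuse to dispatch anything that does not end up in a reporting command. The Python sketch below is purely illustrative: build_dispatch_argv and the REPORTING_COMMANDS list are my own inventions, the pipe splitting is naive (it does not understand subsearches), and ./splunk is assumed to be the path to the CLI as in the examples above.

```python
# Illustrative only: stats is the one reporting command used in this post.
REPORTING_COMMANDS = ("stats",)


def build_dispatch_argv(search, splunk_bin="./splunk"):
    """Build the argument list for a CLI dispatch, rejecting raw-event searches."""
    stages = [stage.strip() for stage in search.split("|")]
    has_report = any(
        stage.split()[0] in REPORTING_COMMANDS for stage in stages[1:] if stage
    )
    if not has_report:
        raise ValueError("search would return raw events; pipe it to stats first")
    return [splunk_bin, "dispatch", search]
```

The returned list can be handed to subprocess.run as is; passing the search as a single list element avoids any shell quoting problems.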
