Tips & Tricks

August 23, 2007

3 Minute Read

Ripping multiline events at search time

By Splunk

I realized that as part of the previous monitoring bundle post I forgot to explain something cool/critical.

When we first conceived of the scripted inputs we used ps, top, netstat, as examples. It was going to be so easy and cool to eat ps output and get graphs of VM usage by process. Totally obvious until we tried it. The ps output in splunk works best as one event, with the header at the top and a repeated line per process:

(screenshot of the ps output as a single Splunk event)

Looks great! I can search for “sourcetype::ps splunkd” and get back all the times splunkd was running. But the problem comes when we want to report on VM usage. How do I get our kv extractor to support a search like “average VSZ for splunkd by time”? In our search language (or using the UI) you can say something like:

sourcetype::ps splunkd | stats avg(VSZ) by _time

What we want is to produce a table and graph of the average value for the key “VSZ” over time for just the one process, “splunkd”. But the above won’t do that, as there is no “VSZ=” in the event and, worse than that, there are many values in the VSZ column, one per process.

The problem is that we have made ps output into one event with the header and multiple rows per process. This is good as it keeps logically together all the ps output and makes it look like the output from the command. We could have split it up on input but that would be really ugly with each line just a row of data. Even if we duplicated the header for each row it would have been sucky.

We were confronted with how to access individual row/column values within this type of output – keep in mind that many OS commands operate this way: top, lsof, ls -las, …

In usual Splunk fashion we did not want our server to have specific knowledge of how to process the output of top, ps, etc.

Enter Steve Zhang – I think it was his first week here at Splunk and he just nailed it.
He wrote the multikv search processor that would find a header and rip events like this into single line events with kv pairs generated for each col/row. SWEET!

It works something like this – it takes the following type of event (a header line of columns plus multiple lines of rows):
USER PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
root  41  0.0 -0.0 28324  680 ?? Ss    8:25PM 0:00.01 /usr/sbin/memberd -x
root  42  0.0 -0.1 29268 2208 ?? Ss    8:25PM 0:00.18 /usr/sbin/securityd
root  44  0.0 -0.0 27864  484 ?? Ss    8:25PM 0:00.62 /usr/sbin/notifyd
root  46  0.0 -0.1 30988 2976 ?? Ss    8:25PM 0:00.61 /usr/sbin/DirectoryService
root  48  0.0 -0.0 27676  876 ?? Ss    8:25PM 0:00.37 /usr/sbin/distnoted
root  51  0.0 -0.0 27252  240 ?? Ss    8:25PM 0:07.55 /usr/sbin/update
root  62  0.0 -0.1 37828 2020 ?? S     8:25PM 0:00.12 /usr/sbin/blued

And at search time, if you pipe a result set through multikv, it turns into something that looks like the following, where each line is a separate event:
USER=root PID=41 %CPU=0.0 %MEM=-0.0 VSZ=28324 RSS=680 TT=?? STAT=Ss STARTED=8:25PM TIME=0:00.01 COMMAND=/usr/sbin/memberd -x
USER=root PID=42 %CPU=0.0 %MEM=-0.1 VSZ=29268 RSS=2208 TT=?? STAT=Ss STARTED=8:25PM TIME=0:00.18 COMMAND=/usr/sbin/securityd
USER=root PID=44 %CPU=0.0 %MEM=-0.0 VSZ=27864 RSS=484 TT=?? STAT=Ss STARTED=8:25PM TIME=0:00.62 COMMAND=/usr/sbin/notifyd
and so on…
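To make the idea concrete, here is a minimal Python sketch of the header-plus-rows ripping described above. It is not multikv's actual implementation, just an illustration that assumes whitespace-separated columns where only the last column (COMMAND) may contain spaces. The name rip_table_event is made up for this example.

```python
def rip_table_event(event_text):
    """Split a header-plus-rows event into one dict per row."""
    lines = [line.strip() for line in event_text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split into at most len(header) pieces so the last column
        # (COMMAND here) keeps any embedded spaces.
        values = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, values)))
    return rows


# Two rows taken from the ps sample above.
PS_EVENT = """\
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 41 0.0 -0.0 28324 680 ?? Ss 8:25PM 0:00.01 /usr/sbin/memberd -x
root 42 0.0 -0.1 29268 2208 ?? Ss 8:25PM 0:00.18 /usr/sbin/securityd
"""

rows = rip_table_event(PS_EVENT)
for row in rows:
    # Print each row as a single line of key=value pairs.
    print(" ".join(f"{key}={value}" for key, value in row.items()))
```

Each printed line plays the role of one of the single-line events, with a key/value pair per column.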

This means that by taking the original search and piping it to multikv we can report on any row/column pair – so now do the following search:
sourcetype::ps splunkd | multikv | stats avg(VSZ) by _time

The problem with the above is that multikv is going to break out every row, and thus the avg(VSZ) above will be the average for ALL processes. Think of multikv turning ps output into a table of rows and columns. So, we need the ability for multikv to not break every ps entry into its own event, just the splunkd ones. No problem, enter Dr. Zhang. Multikv was enhanced to support both filter and fields arguments so that you can specify which rows and columns you want extracted.

By adding the following filter argument multikv will take the entire ps event and pull out just the splunkd row.
sourcetype::ps splunkd | multikv filter splunkd | stats avg(VSZ) by _time
The filter argument to multikv will filter out all rows that do not match – you can supply multiple comma-separated terms. For example, the following will pull out all rows that have splunkd OR python:
sourcetype::ps splunkd | multikv filter splunkd, python | stats avg(VSZ) by _time
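Here is a rough Python sketch of why the filter matters for the average. It assumes the filter is a plain substring match on each row with OR semantics across terms, which is a simplification for illustration and not a statement of how multikv matches internally. The event data and the helper names filter_rows and average_column are made up.

```python
# Made-up ps-style event; the PIDs, sizes and commands are illustrative only.
PS_EVENT = """\
USER PID VSZ RSS COMMAND
root 41 28324 680 /usr/sbin/memberd -x
root 301 90000 41000 splunkd
root 302 50000 12000 python
"""


def filter_rows(event_text, terms):
    """Return (header, rows), keeping rows that contain at least one term."""
    lines = [line for line in event_text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    kept = [row for row in rows if any(term in row for term in terms)]
    return header, kept


def average_column(header, rows, column):
    """Average one numeric column over the given rows."""
    index = header.split().index(column)
    values = [float(row.split()[index]) for row in rows]
    return sum(values) / len(values)


header, splunkd_rows = filter_rows(PS_EVENT, ["splunkd"])
_, either_rows = filter_rows(PS_EVENT, ["splunkd", "python"])
all_rows = PS_EVENT.splitlines()[1:]

print(average_column(header, all_rows, "VSZ"))      # every process
print(average_column(header, splunkd_rows, "VSZ"))  # splunkd only
print(average_column(header, either_rows, "VSZ"))   # splunkd OR python
```

The unfiltered average is pulled around by unrelated processes, while the filtered ones only reflect the rows that matched.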

So now we will get just rows with splunkd or python, but we will still get all fields (columns). In some cases you don’t want/need all columns. Just as the filter argument helped control which rows we wanted, we can limit columns using the fields argument. The following will pull out just the RSS and VSZ fields (columns) for splunkd and python entries in ps:
sourcetype::ps splunkd | multikv filter splunkd, python fields RSS, VSZ
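The column side can be sketched the same way. Assuming the rows have already been ripped into key/value pairs, limiting the fields is just a projection onto the named columns. The rows and the name select_fields below are hypothetical and only there to illustrate the idea.

```python
def select_fields(rows, fields):
    """Keep only the named columns in each ripped row."""
    return [{name: row[name] for name in fields if name in row} for row in rows]


# Hypothetical rows, as they might look after the splunkd/python filter.
rows = [
    {"USER": "root", "PID": "301", "VSZ": "90000", "RSS": "41000", "COMMAND": "splunkd"},
    {"USER": "root", "PID": "302", "VSZ": "50000", "RSS": "12000", "COMMAND": "python"},
]

slim_rows = select_fields(rows, ["RSS", "VSZ"])
print(slim_rows)
```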

So to get just the splunkd VSZ over time we can just do the following:

(screenshots of the search and the resulting chart)

The above focused on explaining how to work with multikv by filtering out just the splunkd process. It can be very interesting to look at all processes by time.
If we wanted to, we could easily use multikv along with timechart to graph all processes’ VM usage over time. Just remove the filter argument and have timechart split by host. This is done easily enough via the UI:

Search for “index::monitoring sourcetype::ps | multikv”
Click the report link to go to report mode
Click on the VSZ field
Change the popup/builder to average by time split by host

Voilà!
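For anyone who wants to see what "average by time split by host" amounts to, here is a small Python sketch of the grouping. It assumes each ripped row carries _time and host values alongside the ps columns. The host names, timestamps and the helper name average_by_time_and_host are made up, and this is not how timechart is implemented.

```python
from collections import defaultdict


def average_by_time_and_host(rows, field):
    """Average a numeric field for every (_time, host) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in rows:
        key = (row["_time"], row["host"])
        totals[key][0] += float(row[field])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical ripped rows; hosts and timestamps are invented for the example.
rows = [
    {"_time": "08:25", "host": "web01", "VSZ": "28324"},
    {"_time": "08:25", "host": "web01", "VSZ": "29268"},
    {"_time": "08:25", "host": "db01", "VSZ": "30000"},
    {"_time": "08:26", "host": "web01", "VSZ": "27864"},
    {"_time": "08:26", "host": "web01", "VSZ": "30988"},
]

averages = average_by_time_and_host(rows, "VSZ")
for (time_bucket, host), value in sorted(averages.items()):
    print(time_bucket, host, value)
```

Each (time, host) pair ends up as one point on the chart, with one series per host.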

(screenshot of the resulting chart)

So, if you’re using the monitoring bundle or have your own header/row/column events, use Dr. Z’s multikv!!!

**as usual, please, please, please, send/post/call with suggestions on how to improve**
