January 02, 2011

Did I miss Christmas?

By Splunk

I’ve had this script kicking around for a while now, but never got around to publishing it. In the interest of getting it done, this post will be brief.

You may be aware that in Splunk 4.1, we introduced a completely rewritten Tailing Processor (the component that handles file monitor inputs). The rewrite included a prototype REST endpoint that provides realtime status of the Tailing Processor’s activities. It can be seen at https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus (on a stock installation), but quickly becomes unreadable with a large number of files being monitored.
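For readers who want to look at the raw endpoint from their own code, here is a minimal sketch of building and sending the request. It is not taken from the script: it is a standalone Python 3 example, the function names are made up for illustration, and it assumes HTTP basic auth with credentials you supply plus the self-signed certificate of a stock install (hence the disabled verification).

```python
import base64
import ssl
import urllib.request

# Endpoint path as given in the post.
ENDPOINT = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def build_request(base_uri, username, password):
    """Return a urllib Request for the FileStatus endpoint (hypothetical helper)."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(base_uri.rstrip("/") + ENDPOINT)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_status(base_uri, username, password, timeout=10.0):
    """Fetch the raw response body as text; parsing is left to the caller."""
    # Assumption: the server presents a self-signed certificate, so
    # verification is turned off. Only do this against a host you control.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_request(base_uri, username, password)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_status("https://localhost:8089", user, password)` should return the same unsummarized listing you see in a browser, which is exactly the wall of text the script is meant to condense.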

The script (linked below) summarizes all of the entries at the endpoint, as such:

Some quick details about the output:

Updated: when the status was last fetched, as well as how long the fetch/parse took.
Dirs seen: number of directories the Tailing Processor knows about, whether ignored or monitored.
Finished files: number of files that were fully read, and whose file descriptors have since been closed.
Reading/open files: currently open files that are not at 100% completion yet. Also includes 100% completed files whose descriptors are still open temporarily (consider this to mean “files we’re just about done with”).
Ignored items: files or directories that have been scanned, but not read. As listed in the screenshot, this can mean files that the splunkd process doesn’t have permissions to read, files that are considered binary, etc.
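The bucketing described above can be sketched roughly as follows. This is an illustration only, not the script's code: the record fields (`ignored_reason`, `is_open`, `percent`) are hypothetical names chosen for the example and are not the endpoint's actual schema.

```python
from collections import Counter


def categorize(entry):
    """Map one hypothetical status record to a summary bucket."""
    if entry.get("ignored_reason"):
        # Scanned but not read (no permission, binary file, and so on).
        return "ignored"
    if entry.get("is_open"):
        # Still open, including files at 100% whose descriptor has not closed yet.
        return "reading/open"
    if entry.get("percent", 0.0) >= 100.0:
        # Fully read and the descriptor has since been closed.
        return "finished"
    return "unclassified"


def summarize(entries):
    """Count how many records fall into each bucket."""
    return Counter(categorize(entry) for entry in entries)
```

Note the ordering: the open check comes before the percentage check, so a file that is at 100% but still open lands in "reading/open" rather than "finished", matching the description above.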

Things can look a bit more interesting if you catch a large file in progress. Here, we have a ~1GB file at 10% completion – as the tool refreshes, this percentage will adjust accordingly:

Usage:

Simply run the script through the Python interpreter included with Splunk (you cannot use your system Python).
In the commands below, replace the bolded portions with appropriate paths for your installation:

Unix: /opt/splunk/bin/splunk cmd python /path/to/fileStatus.py
Windows: "c:\program files\splunk\bin\splunk" cmd python c:\temp\fileStatus.py (quote the Splunk path, since it contains a space)
Accepts “-interval #” where # is an integer. This sets how many seconds the script will wait before refreshing the endpoint. Defaults to 1.
Accepts “-clear true|false”. If false is passed in, the terminal will not be cleared before refreshing the endpoint. This can be used to track long-term behavior of the Tailing Processor. Defaults to true.
Accepts “-uri <uri>” to allow for pointing the script at another Splunk instance (see my other posts).
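If you want to mimic these options in a script of your own, a sketch with the standard `argparse` module might look like this. It is not how fileStatus.py parses its arguments (the script's own code is the authority there); the option names and the defaults for `-interval` and `-clear` come from the list above, while the `None` default for `-uri` is an assumption.

```python
import argparse


def parse_bool(text):
    """Convert the literal strings 'true' or 'false' to a bool."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise argparse.ArgumentTypeError("expected true or false, got %r" % text)


def build_parser():
    """Build a parser for the three options described above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="fileStatus.py")
    parser.add_argument("-interval", dest="interval", type=int, default=1,
                        help="seconds to wait before refreshing the endpoint")
    parser.add_argument("-clear", dest="clear", type=parse_bool, default=True,
                        help="clear the terminal before each refresh (true|false)")
    # Assumption: None here means "use the local instance".
    parser.add_argument("-uri", dest="uri", default=None,
                        help="management URI of another Splunk instance")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, `-interval 5 -clear false` yields an interval of 5 seconds with terminal clearing turned off, which is the combination suited to tracking long-term behavior.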

Other notes:

Remember that the endpoint is a prototype, and thus has minor bugs – but you can basically trust it. For example, sometimes you’ll see a file completion percentage larger than 100% – this just means the file keeps growing. Eventually it will be labelled 100% again. As Deep has been known to say, “that <stuff> happens, be cool about it.”
The script can work with a very large number of files being monitored, but the current implementation will chew through RAM. The largest I tried was 450,000 files, which took up a couple of gigs of memory.
If you look at the main() implementation in the file, you’ll see a hacky way to create your own Python-based Splunk CLI command, taking advantage of the CLI’s auth features and whatnot. Not that you would need to, but I think it’s cool.

Download: here (md5sum: 217418d8c1a88632a6d28685ee28e7c9)
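To check a downloaded copy against the checksum above, something like the following sketch works with any Python 3 interpreter. The function names are made up for this example, and the file path is whatever you saved the script as.

```python
import hashlib

# Checksum published with the download link above.
EXPECTED_MD5 = "217418d8c1a88632a6d28685ee28e7c9"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_published_checksum("/path/to/fileStatus.py")` returns `True` only if the file is byte-for-byte the one the checksum was computed from. MD5 is fine for catching a corrupted download, but it is not a strong guarantee against tampering.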

Better late than never, yes?

----------------------------------------------------
Thanks!
Amritpal Bath
