Indexing and Searching RSS feeds

Many companies produce RSS (Really Simple Syndication) feeds for their employees, partners, and customers. These same companies also consume RSS feeds from their suppliers, whether for personal news or for more timely business data. RSS is a great way to digest this information, but after a certain period, an older item may be impossible to find again. If information from an RSS feed were indexed into Splunk on a regular basis, say every 10 to 30 minutes, it could be searched at any time. To accomplish this, I’ve published a simple Splunk application on Splunkbase that indexes some RSS metadata (date, title, link, and description). Simply download the application and install it into your $SPLUNK_HOME/etc/apps directory. Then, modify its inputs.conf file. For example:

interval = 600
sourcetype = rssfeed
source = rss_sports
disabled = false

Next, create a script in the rss/bin directory for the scripted input to call. A sample is already provided, as follows:


python $SPLUNK_HOME/etc/apps/Info/bin/ $SPLUNK_HOME/etc/apps/Info/bin/sports.txt

The script calls an existing Python script, passing in one argument: a file that contains the list of RSS feeds to index. Restart Splunk and look for the rssfeed sourcetype. The RSS metadata is already delimited as tag="value" pairs for automatic field extraction. The provided Python script uses the open source feedparser library to parse each RSS feed supplied to it. Since this is all script-based, re-entrant code, you can define multiple scripts in inputs.conf, each eventually calling the same Python script with its own list of feeds, to index multiple sources simultaneously.
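The provided script is not reproduced in this post, so the following is only a minimal sketch of what such a script could look like, not the one shipped with the app. It assumes the feed list file holds one URL per line (lines starting with # are skipped) and that feedparser is installed. The names format_event, read_feed_urls, and main are hypothetical.

```python
import sys


def format_event(entry):
    """Render one feed item as a single line of key="value" pairs."""
    fields = []
    for key in ("date", "title", "link", "description"):
        value = str(entry.get(key, ""))
        # Collapse newlines and runs of whitespace, and swap double quotes
        # for single quotes so every pair stays on one parseable line.
        value = " ".join(value.split()).replace('"', "'")
        fields.append('{}="{}"'.format(key, value))
    return " ".join(fields)


def read_feed_urls(path):
    """Read feed URLs from a text file (assumed format: one URL per line)."""
    urls = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    return urls


def main(feed_list_path):
    # Third-party dependency, imported here so the helpers above
    # can be used without it.
    import feedparser

    for url in read_feed_urls(feed_list_path):
        parsed = feedparser.parse(url)
        for item in parsed.entries:
            print(format_event({
                "date": item.get("published", ""),
                "title": item.get("title", ""),
                "link": item.get("link", ""),
                "description": item.get("summary", ""),
            }))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

Run with the feed list file as its only argument, this prints one event per feed item to standard output, which is what a scripted input captures for indexing.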

The next step is to search Splunk for information within a feed. Here’s an example screenshot using Splunk 4.0.x.

Splunk Web showing RSS Content

As seen on the left, fields have been extracted automatically. You can even set up alert conditions based on a search such as:

sourcetype="rssfeed" title="*inflation*"

For this example, Splunk will raise an alert for any feed event that has "inflation" in its title. As you can see, this capability gives the Splunk user a powerful way to build an information base on any subject for future search.
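To make the idea concrete, here is a small sketch that mimics, outside of Splunk, what the field extraction and the wildcard title match amount to. It is an illustration only and not how Splunk implements either step. The helper names extract_fields and title_matches are hypothetical, and the match is assumed to be case-insensitive.

```python
import re
from fnmatch import fnmatch

# Matches key="value" pairs such as title="Some headline".
PAIR = re.compile(r'(\w+)="([^"]*)"')


def extract_fields(event):
    """Turn one event line of key="value" pairs into a dict of fields."""
    return dict(PAIR.findall(event))


def title_matches(event, pattern="*inflation*"):
    """Return True when the event's title field matches the wildcard pattern."""
    title = extract_fields(event).get("title", "")
    # Lowercase both sides so the comparison ignores case.
    return fnmatch(title.lower(), pattern.lower())


if __name__ == "__main__":
    sample = 'date="Mon" title="Inflation rises" link="http://example.com/a"'
    print(extract_fields(sample))
    print(title_matches(sample))
```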

Posted by Nimish Doshi

Nimish is Director, Technical Advisory for Industry Solutions providing strategic, prescriptive, and technical perspectives to Splunk's largest customers, particularly in the Financial Services Industry. He has been an active author of Splunk blog entries and Splunkbase apps for a number of years.