Okay, so possibly not so much for profit, but I have developed a really cool IRC bot that logs to Splunk and provides a web interface for searching those logs. Several weeks before I started with Splunk, I was winding down my old job and looking for something Splunk-related I could do. I’m a total IRC geek: I’ve been hanging out on #macintosh on Undernet for over 17 years, and while browsing Splunk’s website I discovered that there was an official IRC channel for Splunk, #splunk on EFnet. I spent the first week lurking and slowly introducing myself to the regulars, and then, since I had some downtime, it occurred to me that it would be really fun to take the traffic from the IRC channel, log it into Splunk, and provide a web GUI to search that traffic.
First things first, let’s get some data
So, before I could develop the awesome web GUI for displaying IRC logs that was lurking in my head, I needed to get data in. To do that, I needed a resident IRC program which would take all the IRC protocol traffic and translate it into a format that would easily allow Splunk to provide field extractions. Luckily, Node has a rich environment where the community contributes open source libraries back, easily installed via a package manager called npm. In npm, I found an excellent IRC library that cut down development time significantly. The IRC protocol is relatively trivial to implement, but still, having a library that someone else has already built and proven is certainly easier.
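As a rough sketch of what that event handling looks like with the npm `irc` package (the connection wiring is shown in comments so the shaping logic stands alone; `formatEvent` and the server/channel values here are illustrative, not Splunkbot’s actual code):

```javascript
// Shape a raw IRC message into a flat object we can later serialize and
// log. This helper and its field names are illustrative.
function formatEvent(nick, to, text) {
  return {
    action: 'message',
    nick: nick,
    to: to,
    text: text
  };
}

// Wiring with the npm "irc" package would look roughly like this:
//
// var irc = require('irc');
// var client = new irc.Client('irc.efnet.org', 'mybot', {
//   channels: ['#splunk']
// });
// client.addListener('message', function (nick, to, text) {
//   shipToSplunk(formatEvent(nick, to, text)); // hypothetical shipper
// });

console.log(JSON.stringify(formatEvent('Coccyx', '#splunk', 'hello')));
```

The library fires a `message` event per line of channel traffic, so all the bot has to do is shape each one and hand it off to the logging layer.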
Once I could read IRC traffic properly, it was time to get the data into Splunk. Since I work with both Splunk Enterprise and Splunk Storm, I implemented logging for both: for Enterprise, I log via a syslog-like TCP connection, and for Storm I log via a REST-based output. In both cases, I log in JSON format. Originally, I logged in an autokv-friendly key=value format, but since I’m logging user-generated data, escaping quotes and commas was very difficult, and autokv didn’t handle the escaping well. Since JSON serialization and deserialization take care of escaping, it was easiest to log in JSON. Here’s a screenshot from Storm showing how the data shows up in Splunk:
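A quick illustration of the escaping problem (the event here is made up): a naive key=value formatter leaves embedded quotes and commas in the user’s text unescaped, while `JSON.stringify` handles them for us.

```javascript
// A made-up chat event containing the quotes and commas that tripped up
// key=value logging.
var event = {
  action: 'message',
  nick: 'Coccyx',
  to: '#splunk',
  text: 'He said "it\'s broken", twice'
};

// Naive key=value: the embedded quote ends the value early as far as
// autokv is concerned, and nothing escapes it for us.
var kv = Object.keys(event).map(function (k) {
  return k + '="' + event[k] + '"';
}).join(' ');

// JSON: quotes inside the text are escaped automatically, and parsing
// the line back yields exactly the original text.
var json = JSON.stringify(event);
```

Round-tripping `json` through `JSON.parse` recovers the text verbatim, which is exactly what autokv couldn’t guarantee for this data.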
You can find the library for logging to Storm here, and it can easily be adapted to log to Splunk’s receivers/simple REST endpoint as well (the two endpoints are basically identical). This is very nice because you can log straight to a Splunk indexer and specify index, host, source, and sourcetype at log time, without having to configure Splunk to assign them. Also, here is the logging library that sends syslog from Node to Splunk, in case it might be useful for your project.
What can we learn?
With the data in Splunk, what can we learn? I set out to create some simple dashboards that surfaced information we didn’t previously have about the IRC channel:
The first widget is obviously a very simple timechart of activity. It told us something those of us who frequent the channel already knew: we’re primarily active during work hours. The other two charts are more interesting, though. They tell us who is the most active, using this search:
index=splunkbot sourcetype=splunkbot_logs | spath | search to=#* | top nick
Note the spath command: this is what allows Splunk to extract fields from the JSON-logged data. In most cases, you’ll probably want to specify which fields spath should extract, but here I want to extract everything, so I specify no fields. The second search clause filters for only traffic destined for IRC channels (all IRC channel names begin with #), and then gives us the top nick.
The second widget gives us the most mentioned people on the channel. On IRC, while talking in a channel which often has multiple conversations going, it’s common to prepend your chat with the nickname of the person you’re addressing, like “Coccyx: I think you’re the most awesome Splunker ever!” This search looks for people’s nicknames in the channel text and builds a dashboard of the nicknames most referenced:
index=splunkbot sourcetype=splunkbot_logs | spath | search action=message | rex field=text mode=sed "s/://g" | rex field=text mode=sed "s/,//g" | makemv delim=" " text | mvexpand text | rename text as nick | join nick [ search index="*" sourcetype="splunkbot_logs" action=names | makemv delim=" " names | mvexpand names | rename names as nick ] | top nick
This search is pretty complicated, so it’s useful to break it down. The first two rex commands use mode=sed, which allows rex to replace text in a field based on a regular expression. Here, we’re removing the , and : characters that are commonly appended to a nick to indicate we’re addressing them, as in the earlier example. Those would mess up our matches, so we delete all instances of them. The next command, makemv, takes the text field, which in our logs contains the text of the IRC messages Splunkbot has logged, and turns it into a multi-value field. This breaks the text up into tokens we can match, some of which we hope are nicknames. Mvexpand then takes a multi-value field and makes a new event for every value in it. Finally, we rename the text field to nick so that it will share a name with the field in the subsearch we’re about to run.
Now we come to a command I don’t often see used, but which is very powerful: join. The join command works like you’d expect join to work in SQL, taking two searches and joining them together based on one or more matching fields. In this case, we want join to behave like an inner join, refining our search, which due to the commands above contains an event for every word spoken in the channel over the last 7 days, by matching it against our subsearch. The subsearch references a particular log entry Splunkbot makes, the names entry, which it records on a regular basis as people join and leave the channel. This gives us a list of nicknames to match against. Finally, we pipe the results of the join to top to get the most-referenced nicknames.
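To make the pipeline concrete, here is the same tokenize-then-join-then-top logic simulated in plain JavaScript (the sample messages and nick list are made up for illustration):

```javascript
// Simulate the SPL pipeline: sed replacements, makemv/mvexpand
// tokenization, an inner join against known nicks, and top.
var messages = [
  'Coccyx: I think you\'re the most awesome Splunker ever!',
  'thanks Coccyx',
  'amrit, did you see that?',
  'Coccyx is writing a bot'
];
// Stand-in for the subsearch over the bot's "names" log entries.
var knownNicks = ['Coccyx', 'amrit', 'hexx'];

var counts = {};
messages.forEach(function (text) {
  text.replace(/[:,]/g, '')          // the two rex mode=sed replacements
      .split(' ')                    // makemv delim=" " + mvexpand
      .forEach(function (token) {
        if (knownNicks.indexOf(token) !== -1) {  // inner join on nick
          counts[token] = (counts[token] || 0) + 1;
        }
      });
});

// "top nick": rank nicks by how often they were mentioned.
var top = Object.keys(counts).sort(function (a, b) {
  return counts[b] - counts[a];
});
```

On this sample data, Coccyx is mentioned three times and amrit once, so `top` ranks Coccyx first, mirroring what the dashboard widget shows.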