Alert with Splunk
Description:
Will Hayes, Solution Architect, Splunk, provides an overview of Splunk's alerting features.
Date: Feb 27, 2008
Permalink
http://www.splunk.com/view/SP-CAAACGR
Transcript
Will Hayes
Alert Introduction
How's it going? I'm Will Hayes, a Solution Architect here at Splunk. I've been working with the company for about three years now. I joined in 2005 as the 12th employee in the Engineering Group. For the past eight months I've been working with this development team recruiting and helping partners who are building Splunk-powered solutions as well as embedding the Splunk daemon inside of their appliance.
Today we're going to walk through the alerting interface. I'm going to show you how to set up an alert using the UI. Then if you stick around I'm going to show some pretty cool examples of how you can take an alert, take the results of that alert and interface with another system like Remedy to, say, create a trouble ticket automatically.
Alert Demo
Now that you've indexed and searched some data, I'll show you how to run any search on a schedule and trigger alerts and actions via email, RSS, SNMP or scripts. It's a great way to proactively find problems and monitor user or system activities across lots of different technologies.
Let's say a customer on your website reports a transaction failing. One of the searches you did to investigate the problem will make a great alert. It finds all the web failures that slipped through the cracks of your other monitoring systems. Let's go run it on a schedule and get Splunk to alert you when this kind of transaction fails again.
Click on the search menu and choose save. Type in the name you want to give the search. You can also share the search with other users. It's a great way to spread knowledge across your team.
Now set the schedule on which you'd like to run this search. Choose from predefined schedules or custom schedules.
Alerts can trigger based on a variety of conditions, thresholds and changes including the number of events, hosts, or sources in your results. You set this one to trigger if the number of events is greater than 0.
Now select the way you want to be notified when this alert triggers. You can optionally choose to have the results included with the alert.
You can watch your alerts from your email, or from your RSS feed reader.
What I really like is how you can trigger scripts that send alerts and events to other applications like monitoring systems using SNMP. The alerts can be displayed on my monitoring console along with a link to launch the original search.
Here's a great example of an alert that restarts an application server when Splunk notices it isn't running. It triggers a script to take an automated action.
With just a few clicks Splunk can take any search and turn it into a proactive alert. You can improve your monitoring across multiple systems and technologies by alerting on all your IT data. Now you can find your problems before someone else does!
Alert Tips and Tricks
Alright. So in this example here, what we're going to do is we're going to take a saved search. We're going to put it on a schedule. So we're going to create an alert. And the action for that alert is actually going to create a ticket and put it inside of a Remedy system. We're going to do this two ways. The first is just to pass the raw search results to Remedy in order to generate the ticket. Next, we're going to get a little more granular and we're going to actually specify that we just want the names of the hosts which generated this alert to be included in the ticket.
So first thing we'll do here is just pull up my saved search. I'm looking for failures to open server connection.
I can see that I have a few of them here. And I want to create, now, an alert out of that. So I'm just going to resave this search. And I'm going to call this My Connection Failure Alert. And I'm going to go ahead and put this on a schedule. So let's say I want to run this every minute.
And if the number of events is greater than zero, because any time this event occurs I want to be alerted on it, I want to trigger a script.
What I'm going to do here is I'm actually going to generate a Remedy ticket. So I'm going ahead. My generate ticket script is already in my scripts directory. So you can see here in /opt/splunk/bin/scripts is where I'm actually putting my alert scripts.
And just to make sure we get the spelling right I'll go ahead and ls that, copy that in here. And this is going to go ahead and trigger my script.
Now, what we're going to see here when this alert actually runs is it's going to be passing a couple of variables off to my Java program that actually generates the Remedy ticket. So when we take a look at the actual script itself, I've got three variables that I'm passing along to this Java program at the time the alert is triggered.
And I made myself a little cheat sheet here just to know what the options were.
So here are the variables that are available and here is what we're actually using. So you can see here that we're actually going to pass on to this ticket for ticket information we're going to pass on the reason the alert was triggered, which just means that it was greater than one event.
We're going to pass on a link so when that ticket is generated there'll be a URL that's going to bring the administrator or the support person back into that view of those Splunk results that were triggering the alert to begin with.
And then I'm going to pass on the search results themselves. What I'll end up getting with number eight is actually a path on this system to that search result. So I'm going to wait a minute here for that alert to run and we should have a result file now, which is all going to be called and passed over to our Remedy creation Java program. Now, while that's running I want to do one other thing.
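To make the hand-off concrete, here is a minimal sketch in Python of a script on the receiving end of those three variables. The transcript only confirms that the results path arrives as variable number eight; treating positions 5 and 6 as the trigger reason and the link is an assumption here, and the names `AlertContext`, `parse_alert_args` and `format_ticket` are hypothetical. It is not the Java program shown in the demo.

```python
from typing import NamedTuple, Sequence


class AlertContext(NamedTuple):
    """The three values the demo passes along to the ticket generator."""
    reason: str        # why the alert was triggered
    link: str          # URL back to the results that triggered the alert
    results_path: str  # path on disk to the file holding the search results


def parse_alert_args(argv: Sequence[str]) -> AlertContext:
    """Pick the three values out of the script's positional arguments.

    Assumed positions (only number eight is stated in the transcript):
    argv[5] = trigger reason, argv[6] = link, argv[8] = results file path.
    """
    if len(argv) < 9:
        raise ValueError(f"expected at least 9 arguments, got {len(argv)}")
    return AlertContext(reason=argv[5], link=argv[6], results_path=argv[8])


def format_ticket(ctx: AlertContext) -> str:
    """Build a plain-text ticket body from the alert context."""
    return "\n".join([
        f"Reason: {ctx.reason}",
        f"Link: {ctx.link}",
        f"Results file: {ctx.results_path}",
    ])
```

A script like this would call `parse_alert_args(sys.argv)` when the alert fires and hand the formatted text to whatever creates the ticket.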
So we said before that when we run this search, we run this alert, and I can see that my connection failure alert here. Well, what if I decide that I don't need all these raw events as part of my ticket generation? More importantly, there are specific fields that I want, such as the process and the host that were causing this ticket to be generated.
Well, to do that, I can simply pipe my search to my fields operator and using the plus sign I'll suppress the raw results from coming back.
And I'm just going to request these two fields here. Now I'm just getting back a set of fields. I'm going to resave this alert. Connection failure alert. Fields only. And the same thing. I'm going to go ahead and set it back up on a schedule. I'm going to run this every minute as well. And again, when the number of events is greater than zero.
I'm going to take an action and I'm going to generate a Remedy ticket. So now we can check and see if our first script actually ran. I can see there that I have my connection failure alert file, which was also passed as variable number eight to my Java program. And I can see inside of here that I'm getting back the full results set.
So I'm seeing all the extra fields, all the additional information about the event is all coming back inside of this file. What we want to see in a minute now is when we made the adjusted search to only pull back the fields that we were looking for. We're going to see a much neater results set.
It will be much easier for my Java program to then create the pertinent information inside the Remedy ticket.
And you can see that now that file has been created as well, using the name of the alert. And voila! I'm getting back now an organized CSV list of just my host name and just the process that I need to investigate. So alert gets triggered, it goes and checks my environment, oh I'm sorry, checks the variables that it's aware of.
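As a rough illustration of why the fields-only file is easier to consume, the sketch below reads such a CSV and reduces it to the distinct host and process pairs a ticket would need. The column names `host` and `process`, the optional `.gz` handling and both function names are assumptions made for this example; the demo's own Java program is not shown in the transcript.

```python
import csv
import gzip
from typing import Dict, List, Tuple


def read_results(path: str) -> List[Dict[str, str]]:
    """Load a results file as a list of rows keyed by column name.

    Files ending in .gz are decompressed on the fly (an assumption about
    how the results file might be stored).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def unique_host_process(rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return distinct (host, process) pairs in first-seen order."""
    seen = set()
    pairs = []
    for row in rows:
        pair = (row.get("host", ""), row.get("process", ""))
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

With only two columns coming back, the ticket text can list each affected host and process once instead of repeating every raw event.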
And here's my cheat sheet here so I can pass a number of things off to say if I want to create an SNMP trap next time, generate something in Bugzilla, doesn't have to be Remedy, doesn't have to be a trouble ticket.
But the point is that you can actually use this alerting facility to go out and interface with other systems and have as much detail or as little detail from the Splunk result as you'd like in that.
Conclusion
So there you have it. An overview of the alerting interface. Thanks for hanging out, and I hope you give it a download, and if you have any questions, feel free to let us know. We're here to help. Until then, Happy Splunking.