Peoples of the Interweb,
As one of the Splunk Support Monkeys I am going to try to start a semi-regular series of posts on a topic that is near and dear to me — getting the Splunk community to be able to troubleshoot their issues without the need to reach out to the Support Team.
The most important piece of any troubleshooting exercise is getting a solid understanding of the problem. The common statement “Shit is broke” may ‘summarize’ the problem, but it doesn’t do much to isolate the specific issue. Taking a minute or two to think about the problem at hand and documenting the sequence of events leading up to it goes a long way toward getting outsiders up to speed.
Here are a few things to keep in mind when working with support:
I don’t work in the next cube over.
This means I don’t have insight into all of the other moving parts of your network. Try to avoid acronyms that are specific to your organization. I don’t know the naming convention that you use for machine names, so if one box is in LA and the other is in New York, tell me; don’t expect me to know that foo.company.com is sitting in the LA data center.
Less is not more.
You can never give a support engineer too much data. Folks often think that they have identified the offending error message in the logs and provide only that one line in their support ticket. The problem with this is that the support engineer does not get the benefit of context. Most errors are the result of a series of events leading up to the final failure. Being able to see what was going on before the problem is often what allows us to identify the cause. The basic rule of thumb: if you think it would be at all useful, share it. If I can channel Don Rumsfeld for a moment: It is easy to know what you know; it is hard to know what you don’t know.
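To make the point about context concrete, here is a minimal Python sketch that pulls an error line out of a log together with the lines around it. The function name, the `needle` search string, and the default window sizes are all illustrative assumptions, not anything Splunk ships; treat it as one way to do this, not the way.

```python
from collections import deque

def error_with_context(lines, needle, before=20, after=5):
    """Return every line containing `needle`, plus nearby lines for context.

    `before` and `after` are hypothetical defaults; widen them freely,
    since more context is rarely a problem for whoever reads the ticket.
    """
    recent = deque(maxlen=before)  # rolling buffer of lines seen before a match
    out = []
    remaining_after = 0            # how many trailing lines are still owed
    for line in lines:
        if needle in line:
            out.extend(recent)     # the events leading up to the failure
            recent.clear()
            out.append(line)
            remaining_after = after
        elif remaining_after > 0:
            out.append(line)
            remaining_after -= 1
        else:
            recent.append(line)
    return out
```

Feeding it the lines of a log file and a string such as "ERROR" returns the matching lines along with their neighbors, which is a far more useful thing to paste into a ticket than the single line on its own.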
Reduce the problem to the fewest number of variables possible.
Remember your 7th grade Algebra class and those complex equations that Mr. Buckner had you solve? You started off solving for x, and then you went back and used your knowledge of x to determine the value of y. The same is true when troubleshooting software. When you try to solve four problems at once you end up polluting your results; you can’t tell if the change you made for x resulted in y blowing up. By breaking the problem into smaller chunks you are operating in a more scientific manner and the results have more credibility.
Log like there is no tomorrow.
Debug logs are your friend. In normal operations the logs don’t need to be verbose, but when you are trying to figure something out, why not give yourself the benefit of the secret messages that the developer put in the code for precisely this reason? It is also helpful to push the existing log file out of the way when starting in debug mode. While I said earlier that you can never give a support engineer too much information, the majority of the stuff in your logs (especially if you’ve been running for a while) is going to be white noise. Starting in debug mode with a fresh log means that the problem, and only the problem, is going to be in the log.
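Pushing the old log out of the way can be as simple as a rename. The sketch below is a generic Python helper, not a Splunk tool: the function name and the timestamp suffix are assumptions made for illustration, and it assumes the application is stopped first, since a running process may keep writing to the renamed file.

```python
import os
import time

def set_log_aside(log_path):
    """Rename an existing log so the next start begins with a fresh file.

    Returns the new path, or None if there was no log to move.
    The ".<timestamp>.old" suffix is an arbitrary, hypothetical convention.
    """
    if not os.path.exists(log_path):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    aside = f"{log_path}.{stamp}.old"
    os.rename(log_path, aside)  # keeps the old log around in case it is needed
    return aside
```

Renaming rather than deleting keeps the history available if support asks for it later, while the new debug log holds only the run that reproduces the problem.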