Community:Monitoring JVMs
From Splunk Wiki
HOWTO: Index your JVM garbage collection data
Most of us have been in situations where we have to debug performance problems in Java-based applications with little more information than thread dumps and garbage collection logs. With the many tunable parameters that newer JVMs offer, it's easy to get the garbage collection configuration wrong for your environment. Fortunately, the latest JVMs offer ergonomics, where less is more: you specify a few tuning parameters and let the JVM figure out the rest.
If you invoke the Sun JVM with the -Xloggc:logfile parameter or the IBM JVM with the -Xverbosegclog parameter, the Garbage Collector (GC) will faithfully write out what it's doing to the named logfile. Each line in the log file corresponds to a GC operation. There are two kinds of lines, one for partial GC and one for full GC.
(the text above was written by Sujit Pal)
Changing your JVM startup options
Add the following arguments to your JVM's startup parameters so that the GC data is written out with timestamps. Change the logging path and filename to suit your needs.
Sun JVM:
-Xloggc:C:\MyJVM\jvm.log -verbose:gc -XX:+PrintGCDateStamps
IBM JVM:
-Xverbosegclog:C:\MyJVM\jvm.log
Configuring Splunk
Make sure you create a file input to capture the JVM log file mentioned above. You also need to set its sourcetype to "sun_jvm" or "ibm_jvm".
You then need to change your props.conf and transforms.conf to contain the following lines:
Sun JVM
props.conf
[sun_jvm]
AUTO_LINEMERGE=FALSE
SHOULD_LINEMERGE=TRUE
DATETIME_CONFIG=CURRENT
BREAK_ONLY_BEFORE=\d+\.\d+:
REPORT-jvm = sun_jvm_gc
transforms.conf
[sun_jvm_gc]
REGEX = \[(Full\s)?GC\s(?<JVM_HeapUsedBeforeGC>\d+)K->(?<JVM_HeapUsedAfterGC>\d+)K\((?<JVM_HeapSize>\d+)K\),\s(?<JVM_GCTimeTaken>\d+\.\d+)\ssecs\]
IBM JVM
props.conf
[ibm_jvm]
SHOULD_LINEMERGE=TRUE
BREAK_ONLY_BEFORE=<af\s
NOTE: You may need to adjust the regular expression if your data looks a bit different.
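One way to check whether the Sun JVM expression fits your data before touching Splunk is to try it against a few lines from your own log. The Python sketch below is only an illustration: the two sample lines are made up to resemble the format the expression expects, the function name extract_gc_fields and the is_full_gc key are hypothetical, and the decimal point in the time field is written as an escaped \. so it only matches a literal dot. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the sketch converts the syntax first.

```python
import re

# The expression as it would appear in transforms.conf (PCRE-style named groups).
SPLUNK_REGEX = (
    r"\[(Full\s)?GC\s(?<JVM_HeapUsedBeforeGC>\d+)K->(?<JVM_HeapUsedAfterGC>\d+)K"
    r"\((?<JVM_HeapSize>\d+)K\),\s(?<JVM_GCTimeTaken>\d+\.\d+)\ssecs\]"
)

# Python needs (?P<name>...); this simple replace is safe here because the
# expression contains no lookbehind assertions such as (?<= or (?<!.
GC_PATTERN = re.compile(SPLUNK_REGEX.replace("(?<", "(?P<"))

# Assumed sample lines, invented for illustration. Replace with lines from your log.
SAMPLE_LINES = [
    "2012-03-14T10:15:30.123+0000: 1.234: [GC 32768K->4096K(125632K), 0.0123456 secs]",
    "2012-03-14T10:15:35.456+0000: 6.567: [Full GC 8192K->4000K(125632K), 0.1234567 secs]",
]


def extract_gc_fields(line):
    """Return the named fields as a dict of strings, or None if the line does not match."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Extra key for convenience only; it is not a field the Splunk config defines.
    fields["is_full_gc"] = match.group(1) is not None
    return fields


if __name__ == "__main__":
    for sample in SAMPLE_LINES:
        print(extract_gc_fields(sample))
```

If a line from your log comes back as None, that is the cue to adjust the expression.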
You then need to either restart Splunk, or just have it reload the new config by running this search: * | head 1 | kv reload=t
Visualizing the data
Now that Splunk is correctly indexing your JVM logs you can create dashboards that show your GC details.
The dashboard is built from these two searches:
sourcetype=sun_jvm | timechart avg(JVM_GCTimeTaken)
sourcetype=sun_jvm | timechart avg(JVM_HeapSize) avg(JVM_HeapUsedAfterGC) avg(JVM_HeapUsedBeforeGC)
If you're using IBM's JVM, try:
sourcetype=ibm_jvm | timechart avg(totalms)
Handling large data volumes
If your JVM is under heavy load or is just chatty, you could end up with a large log file, which in turn could lead to slower searches for your dashboard. In that case, it's recommended that you use summary indexing.
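The idea behind summary indexing is to precompute small aggregates on a schedule and have the dashboard search those instead of the raw events. The Python sketch below illustrates only that idea, not how Splunk implements it: the event data is invented, and the names summarize_by_hour and average_pause are hypothetical. It keeps a count and a running total per hour rather than a per-hour average, so that an average over any longer period can still be computed correctly from the summaries.

```python
from collections import defaultdict
from datetime import datetime

# Invented raw events for illustration: (timestamp, GC pause in seconds).
RAW_EVENTS = [
    ("2012-03-14T10:15:30", 0.012),
    ("2012-03-14T10:45:10", 0.020),
    ("2012-03-14T11:05:00", 0.150),
]


def summarize_by_hour(events):
    """Roll raw GC events up into one {count, total_secs} record per hour."""
    buckets = defaultdict(lambda: {"count": 0, "total_secs": 0.0})
    for stamp, secs in events:
        parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        hour = parsed.replace(minute=0, second=0)  # truncate to the start of the hour
        buckets[hour]["count"] += 1
        buckets[hour]["total_secs"] += secs
    return dict(buckets)


def average_pause(summary):
    """Average pause across all summarized hours, weighted by event count."""
    count = sum(bucket["count"] for bucket in summary.values())
    total = sum(bucket["total_secs"] for bucket in summary.values())
    return total / count if count else 0.0


if __name__ == "__main__":
    hourly = summarize_by_hour(RAW_EVENTS)
    for hour, bucket in sorted(hourly.items()):
        print(hour.isoformat(), bucket)
    print("overall average pause:", average_pause(hourly))
```

Averaging the per-hour averages instead would weight a quiet hour the same as a busy one, which is why the sketch stores counts and totals.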
Food for thought
Now that you're indexing your JVM data you can also leverage all the other cool stuff Splunk provides. You could create an alert that emails you when a JVM is misbehaving or even automate the way you deal with the JVM by doing things like automatic restarts, etc. The uses are endless! :)
Feedback?
I'd love to hear from you. Email me at simon at splunk dot com.

