**** UPDATE – 10/31/08 ****
I’ve updated the app to version 1.8.
The only fix in this version is a bug with multiple datacenters.
Version 1.8 should now work for an unlimited number of datacenters.
(Thanks to Stephen for finding it and letting me know.)
As always, feel free to bug me if the app has any problems.
**** UPDATE – 10/10/08 ****
I updated the latest release – 1.7 – to fix a shutdown bug.
It turns out that in prior releases the VMware app kept running after Splunk was shut down.
This release will now terminate the VMware app when splunkd goes away.
If you would like to test or run without Splunk, you can pass in the following argument:
java -jar splunk.jar --standalone
** see instructions below on how to run the above command **
As usual, drop me a line if you have any questions.
Good luck with 1.7
**** UPDATE – 09/16/08 ****
Thanks to more testing I have found and fixed a few critical bugs.
Updated APP version 1.6 >> here <<
- There was a static var preventing multiple server configs from working. That should be fixed, and multiple servers in vmware.conf should now work.
- IBM JVMs should work, i.e. AIX should now work 😉
- Added new saved searches and a few dashboards (thanks to raffy 😉)
As usual, please let me know if you find any bugs.
I’ll type up some notes on my VMworld experience.
**** UPDATE – 09/08/08 ****
Thanks to lots of folks trying it out I have found a critical bug that was preventing much of the data from getting indexed. This latest release, 1.5, should have that fix, and everyone should see all the wonderful VMware data in the index.
As usual, bug me if it does not work or you have any questions.
If you have made changes to vmware/local/vmware.conf and not to the file in default, you can just untar this version on top of your old one. If you are making changes to the default/vmware.conf file, I’d move them to local/vmware.conf; that way, when I ship updates, they will not blow away your conf changes. We ship only default/vmware.conf and not local/vmware.conf.
Thanks again to everyone that helped find bugs!
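The default/local split in this update can be pictured with a small sketch. This is only an illustration in Python, not the app’s actual loader: `merge_conf`, the stanza name and the placeholder values are hypothetical, and the assumption is that settings in local/vmware.conf take precedence over the shipped default/vmware.conf.

```python
def merge_conf(default, local):
    """Overlay local settings on top of default settings, stanza by stanza."""
    # Copy the defaults so the shipped values are never modified in place.
    merged = {stanza: dict(settings) for stanza, settings in default.items()}
    for stanza, settings in local.items():
        merged.setdefault(stanza, {}).update(settings)
    return merged


# Hypothetical contents of default/vmware.conf (overwritten by every upgrade).
shipped = {"vmserver": {"url": "https://changeme/sdk", "events_interval": "10"}}
# Hypothetical contents of local/vmware.conf (never shipped, so it survives upgrades).
mine = {"vmserver": {"url": "https://10.1.1.35/sdk"}}

effective = merge_conf(shipped, mine)
print(effective["vmserver"]["url"])  # https://10.1.1.35/sdk
```

Only the keys you override need to live in the local file; everything else keeps coming from default.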
**** UPDATE – 08/27/08 ****
I have updated the app with a few fixes found in the field.
- hopefully fixed the issue on AIX (IBM JVM)
- added output of host/VM name on update messages. It was hard to tell where the messages were coming from.
- added more debugging info on startup to help debug connection issues.
Things that are still under investigation:
- Pointing at lots of ESX servers and not VC. It seems as though some data is not coming back from ESX.
- Making it work with older JVMs (currently it seems I require 1.5)
**** Original Post 08/10/08 ****
I’ve wanted to release this for a few months, but the project kept getting stuck on the back burner. I’ve finally cleaned it up and had a few people try it, and it seems to work well. I’m sure there are configurations and versions out there that will have issues; please write me back (my first name at splunk.com) if it does not work as advertised.
Reading the below makes it sound more difficult than it really is. Just download, unzip, change the server url, username and password in the vmware.conf file, restart and go! This really is the first public release and I’d love to get more feedback. I’ll more than gladly send you a Splunk tee shirt of your choice if you help find bugs or have useful suggestions!
Why you want to give it a try:
This vmware app is a cool way to keep track of what your VC and ESX servers are up to, what instances are running where, when they are under load, when instances move, when they have errors, and much more. Since all the data is indexed in Splunk, it’s easy and quick to search for problems and report on your virtual sprawl.
How it works:
This app will connect a Splunk server to any number of Virtual Center and/or ESX servers and grab/index the events, logs, properties, performance data, and anything else I can get my grubby mitts on. It’s easy to hook up and get going, so if you use Virtual Center or ESX then give this app a try. I’ll explain how to install and set it up, how to troubleshoot, and what you will see when you get it working. You will need to install Splunk or use an existing Splunk server. See the configuration file for settings on how often to pull data. Near the end of this post I also give example searches to explain the data.
After installing you get cool graphs like this one showing CPU Usage by Guest by Time:
Add Inside-out monitoring
It’s optional, but you can also put Splunk on the guest OSes as lightweight forwarders, and you will get a brilliant inside-out view where we capture not only what VC/ESX thinks but also what the guests are seeing on the inside. My best practice is to put Splunk on the guests and capture basic logs as well as OS performance metrics, what apps are running, how much mem/CPU they are taking, etc. You can get the Unix/Linux version here and the Windows version here. Of course it’s not required, and you get a ton of value out of just the basic vmware app’s monitoring of VC/ESX.
This app requires that a JVM be installed on the same box as the Splunk server. I know this is less than optimal. Please bug your local VMware rep and tell them to make me REST APIs and not SOAP APIs. The VMware APIs are hideously overcomplicated. Please, dear VMware, make a simple REST interface.
1) Make sure java is present and set the JAVAHOME environment variable. If it is not already set, you must set JAVAHOME to the directory that contains the java binary.
2) To test the variable is set correctly, try and run the following on the command line
If it worked, it should spit back a bunch of options to pass to the java command. If it’s not set right, you will get some kind of file-not-found error.
3) Grab the vmware.zip file HERE.
4) Unzip the file – and copy the resultant “vmware” directory to your SPLUNK_HOME/etc/apps/ directory. When done the following directory should exist: SPLUNK_HOME/etc/apps/vmware.
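If you would rather script the checks from steps 1 to 4, here is a minimal sketch in Python. It is not part of the app: `check_install` is a hypothetical helper, and it assumes only what the steps above say, namely that JAVAHOME points at the directory holding the java binary and that the app ends up in SPLUNK_HOME/etc/apps/vmware.

```python
import os


def check_install(env):
    """Return a list of problems with the Java and app setup; empty means it looks fine."""
    problems = []

    java_home = env.get("JAVAHOME")
    if not java_home:
        problems.append("JAVAHOME is not set")
    elif not any(
        os.path.isfile(os.path.join(java_home, name)) for name in ("java", "java.exe")
    ):
        problems.append("no java binary found in " + java_home)

    splunk_home = env.get("SPLUNK_HOME")
    if not splunk_home:
        problems.append("SPLUNK_HOME is not set")
    elif not os.path.isdir(os.path.join(splunk_home, "etc", "apps", "vmware")):
        problems.append("vmware app directory not found under " + splunk_home)

    return problems


if __name__ == "__main__":
    # Check the real environment when run as a script.
    for problem in check_install(os.environ) or ["install looks fine"]:
        print(problem)
```

The function takes the environment as a plain mapping, so you can also call it with a hand-built dict to try out paths before changing your real environment.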
There are a few config settings to make the app work.
5) First you need to let Splunk know where your VC or ESX servers are. Edit the vmware/default/vmware.conf configuration file to point to your VC or ESX servers. If using VC, you need not specify all the ESX servers under management; Splunk will get the list from VC. The config file contains one or more of the following stanzas (the unique_name can be anything you like so long as it’s unique):
For each [vmserver] stanza be sure to set the url, username and password.
Note that the url should be the IP address of your server with “/sdk” at the end, for example “url=https://10.1.1.35/sdk”. A good way to test that the url and username/password are correct is to use a web browser. Take the url you have entered above and replace the “sdk” with “mob”. Use the web browser to navigate to that url and make sure it asks for a username and password and that the values you entered above authenticate correctly. If the “mob” url works with the username and password you entered, then Splunk should have no trouble.
With those three set you should be up and running after a restart.
The rest of the config file should be self-explanatory and is included at the end of this post for reference, but you should not need to change anything else.
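The checks in step 5 can be sketched in a few lines of Python. This is an illustration only, under the assumption that the three settings you must fill in are url, username and password (the ones this post says to change); `check_stanza`, `mob_url` and the sample values are hypothetical and not part of the app.

```python
REQUIRED = ("url", "username", "password")


def check_stanza(settings):
    """Return a list of problems found in one server stanza's settings."""
    problems = ["missing " + key for key in REQUIRED if not settings.get(key)]
    url = settings.get("url", "")
    if url and not url.rstrip("/").endswith("/sdk"):
        problems.append("url should end with /sdk")
    return problems


def mob_url(sdk_url):
    """Turn the .../sdk url from vmware.conf into the .../mob url to try in a browser."""
    trimmed = sdk_url.rstrip("/")
    if not trimmed.endswith("/sdk"):
        raise ValueError("expected a url ending in /sdk, got: " + sdk_url)
    return trimmed[: -len("sdk")] + "mob"


# Hypothetical stanza with a forgotten password and a url missing /sdk.
print(check_stanza({"url": "https://10.1.1.35", "username": "admin"}))
print(mob_url("https://10.1.1.35/sdk"))  # https://10.1.1.35/mob
```

Neither function talks to the server; the browser test against the “mob” url is still the real check that the credentials work.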
Testing and Troubleshooting:
6) It’s best to test running the vmware app outside of splunk first.
You’ll need to make sure that SPLUNK_HOME is set for the test.
** On Windows **:
set SPLUNK_HOME=your splunk directory
# Note: it does not like it when I add quotes around this path; try with no quotes.
Then run the app by hand
> cd %SPLUNK_HOME%\etc\apps\vmware
> java -jar lib/splunk.jar
** On others ** :
export SPLUNK_HOME=your splunk directory
Then run the app by hand:
> cd $SPLUNK_HOME/etc/apps/vmware
> java -jar lib/splunk.jar
It should spit out all sorts of VMware data. If it throws an error, it’s likely that SPLUNK_HOME or JAVAHOME is NOT set. Remember that SPLUNK_HOME will be set by the server when the server runs the script; you need only set it for testing.
If it does not work, the exception will likely have something useful in it, such as connection refused (bad auth) or a 404 error, in which case the url is incorrect.
If you get any non-obvious errors email me ( my first name at splunk.com ).
7) Try running in splunk.
If the above test works then you should be able to just restart Splunk and all should be good. The way to tell if it’s working is that you will get events with sourcetype vmware and vmware_api.
8) If you do NOT see events of type vmware_api on the dashboard then try the following search:
You should see some kind of error or warning that is hopefully obvious. If not, again email me and I’ll sort you out.
Using the App
At this point it should be working and you should be able to search for cool stuff.
Here is a quick overview of what splunk is indexing:
After restarting you should see a bunch of logs from VMware and at least two new sourcetypes: vmware and vmware_api. Below is a screenshot of my dashboard after restarting; notice the vmware logs and the vmware_api event counts.
The vmware sourcetype is for the actual VMware logs, while the vmware_api sourcetype is for the API calls. It can take a minute before they show up, so if they are not there, try again after a minute. If you still do not have the logs, that likely means the logs path in vmware.conf is incorrect, and you should make sure the path is correct or contact me.
If you do not see the API calls then there is likely an auth or url error that should have been caught when you did the manual test above. Try retesting by hand as above; if the by-hand method works but not through Splunk, then contact me.
I’ve just started to explore the logs that come back. There is a ton of information in them, but my test infrastructure is not all that interesting, so I’m not sure what goodness you all might find in them. Poke around the files and see what you see, and bug me if you see anything interesting that I can make into alerts / reports.
The meat of the data is from the API where we pull everything we can.
Most useful are:
Every few seconds we capture the metrics for all VMs, including
I’m not sure of the scope of these, but they look like interesting events kicked out by ESX. Someone with a larger VMware installation might find far more interesting events than I see on our infrastructure.
It looks like when anything changes, we get an update.
I periodically just capture the inventory tree. It’s more for debugging than perhaps useful in a production environment but it does not cost much to get and it can be useful.
Thanks to Christina we ship with a bunch of saved searches. After installing you should see them; they all start with ‘VM:’. They are named to be somewhat obvious. Again, let me know if they don’t work or you have some better ones to add to the default app. Try some of the Metrics and Status saved searches to make sure your install is working.
- VM: Investigation CPU load on all guests sharing ESX server
- VM: Investigation Find ESX Host for Guest
- VM: Investigation Find Guests sharing ESX Server – Non FQDN
- VM: Investigation- Find other VMs sharing ESX Host
- VM: Investigation- Processes on hosts sharing ESX Server
- VM: Investigation- Running processes on other guests on same ESX server
- VM: Metrics- CPU by Guest last 60 minutes
- VM: Metrics- Host Memory Usage last 15 minutes
- VM: Metrics- Host Memory Usage last 60 minutes
- VM: Metrics- Memory by Guest last 60 minutes
- VM: Status- Free Space by Datastore
- VM: Status- Running Guests
- VM: Status- Running VMs
That’s about it.
Like I said, PLEASE email me if you have bugs or suggestions.
I plan on updating the app with whatever feedback I get from folks. So please, help me out and get yourself a tee shirt.
P.S. – here is a sample of the config, just so that you can see what’s in it without downloading:
The following are the important values in the config file:
url=https://10.2.1.151/sdk ## This is the url to the vc or esx server
username=your_username ## user name to auth against the server. If you are not sure of its value, point a web browser at the above url and check the web auth; it will be the same.
password=your_password ## we will support non-clear text in the near future.
ignorecert = t ## for now leave as true (t), we will soon support checking of certs
loggingLevel = error ## to turn on debugging values are [error, warn, info, debug ]
index_events = t ## should we index events (t)rue or (f)alse
events_interval = 10 ## how often to check for events in seconds
index_properties = t ## should we index properties (t)rue or (f)alse
property_interval = 10 ## how often to check for properties in seconds
index_metrics = t ## should we index metrics (t)rue or (f)alse
metrics_interval = 10 ## how often to check for metrics in seconds
index_updates = t ## should we index events (t)rue or (f)alse
updates_interval = 10 ## how often to check for updates in seconds
index_logs = t ## should we index logs (t)rue or (f)alse
logs_interval = 300 ## how often to get log changes…
logs_localpath = ../var/spool/vmware ## the logs are copied from vc/esx to this directory, where splunk will pick them up for indexing
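To make the format of the sample above concrete, here is a minimal Python sketch that reads such lines. It is not the app’s parser: `parse_conf_line` and `as_bool` are hypothetical helpers, and it assumes the `##` annotations are trailing comments and that `t`/`f` are the boolean flags, as the sample suggests.

```python
def parse_conf_line(line):
    """Split 'key = value ## comment' into (key, value); return None if there is no '='."""
    setting = line.split("##", 1)[0].strip()
    if "=" not in setting:
        return None
    key, value = setting.split("=", 1)
    return key.strip(), value.strip()


def as_bool(value):
    """Interpret the t/f flags used in the sample config."""
    return value.lower() in ("t", "true")


# A few lines copied from the sample config above.
sample = [
    "url=https://10.2.1.151/sdk ## This is the url to the vc or esx server",
    "index_events = t ## should we index events (t)rue or (f)alse",
    "events_interval = 10 ## how often to check for events in seconds",
]

settings = dict(parse_conf_line(line) for line in sample)
print(settings["url"])                    # https://10.2.1.151/sdk
print(as_bool(settings["index_events"]))  # True
print(int(settings["events_interval"]))   # 10
```

One limitation of this sketch: a value that itself contains `##` (a password, say) would be cut short, so treat it as a reading aid rather than a real loader.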