
Microsoft Exchange is one of the most ubiquitous, mission-critical services deployed on-premises today. According to Gartner analysts, there are over 300 million users, and this number continues to grow steadily. However, Exchange can also prove incredibly challenging to keep reliable and secure.
Exchange administrators have long been the scapegoats of unreliable messaging. How many times have we cursed our admins for a lost message, a flood of unwanted spam, or an inability to send or receive messages? Or better yet, what if your entire organization is infected with a virus, like Slammer or Lovebug? Generally speaking, if there’s a problem with messaging or calendars, it’s the Exchange admin’s head on the block.
This pain point was heard loud and clear by Microsoft and its various system management partners. Back in 2002/3, when the Microsoft Operations Manager team began developing management packs, the Microsoft Exchange team was the first to jump on board. For too long they had heard the cries of their primary stakeholders. However, while the resulting Exchange management pack went a long way towards solving the basic monitoring requirements of Exchange server, there were still many holes in trouble-shooting and reporting in the broader messaging environment that Microsoft couldn’t solve on its own.
Microsoft’s system management partners developed script-based products that could perform message tracking for some forms of trouble-shooting, but these tools tend to be relatively difficult to deploy and don’t always integrate well with existing monitoring solutions (e.g., the Exchange management pack). In effect, the market has bifurcated into good monitoring tools and decent trouble-shooting tools, and there isn’t really one that can do both well.
While this market analysis is interesting, did our customers care? Is this a problem that Splunk could potentially solve? Splunk’s installed base answered with a resounding “yes”! Over ninety percent of respondents in last fall’s installed base survey asked for better operational intelligence from a Splunk app for Microsoft Exchange.
We interviewed numerous respondents to learn more about their specific requirements. There were all sorts of examples and suggestions that made a lot of sense. Here’s a list of some of the suggestions:
1) Message tracking for trouble-shooting and forensics;
2) Drill-down dashboards for multiple levels of monitoring;
3) Log collection and aggregation for migration testing;
4) Correlation of service metrics with server/infrastructure health and availability; and
5) Log correlation of various subsystems, such as Ironport and Exchange.
While the general requirements were relatively straightforward, it was even clearer that Splunk could really help Exchange administrators – as well as security analysts and the service desk. Splunk’s inherent capabilities to collect and aggregate any machine-generated data and to derive intelligence through correlation and analysis make it possible to support all of the requirements above.
Several pioneering companies, such as KKR, began using Splunk for trouble-shooting and forensics based on Splunk’s inherent capabilities. The primary value they found in Splunk was reduced time-to-resolution of a trouble ticket. They no longer needed to comb through gigabytes of logs to figure out why a particular person’s mailbox was missing messages. They were able to put a Splunk forwarder on the Exchange server associated with that person’s mailbox and collect logs straight from it – in a matter of minutes. Since that experience, they have been using Splunk for trouble-shooting via message tracking for their entire company across all geographies.
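To make the "missing messages" scenario concrete, here is a minimal sketch of message tracking by recipient, written in Python rather than as a Splunk search. It is not how KKR or the Splunk app does it. The log layout, column names, event names, and addresses below are simplified assumptions made up for illustration, loosely modeled on a comma-separated tracking log.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: columns and event names are assumptions,
# not the exact layout of a real Exchange message tracking log.
SAMPLE_LOG = """\
date-time,event-id,message-id,sender-address,recipient-address
2011-08-01T09:00:00Z,RECEIVE,<a1@example.com>,alice@example.com,bob@example.com
2011-08-01T09:00:02Z,DELIVER,<a1@example.com>,alice@example.com,bob@example.com
2011-08-01T09:05:00Z,RECEIVE,<b2@example.com>,carol@example.com,bob@example.com
2011-08-01T09:05:03Z,FAIL,<b2@example.com>,carol@example.com,bob@example.com
2011-08-01T09:06:00Z,RECEIVE,<c3@example.com>,alice@example.com,dave@example.com
"""


def track_recipient(log_text, recipient):
    """Group the events seen for each message addressed to `recipient`."""
    events = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_text)):
        if row["recipient-address"].lower() == recipient.lower():
            events[row["message-id"]].append(row["event-id"])
    return dict(events)


def undelivered(events):
    """Return the IDs of messages that never logged a DELIVER event."""
    return sorted(mid for mid, seen in events.items() if "DELIVER" not in seen)


history = track_recipient(SAMPLE_LOG, "bob@example.com")
missing = undelivered(history)
print(missing)
```

The point is the shape of the question: filter to one mailbox, group events by message, and look for messages whose history stops short of delivery.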
Although KKR’s use case is incredibly valuable and interesting to Splunk’s overall community, there are many other potential use cases that Splunk can enable with some of its newer functionality, such as real-time monitoring and alerting. Now Splunk can not only support deep root cause analysis, it can do so in real time.
The challenge for Splunk’s app development team was to figure out how to apply all of this analytical and monitoring power to Microsoft Exchange as a service running in a highly distributed, complex environment. There are at least ten different data sources that constitute the base platform, so developing field extractions that make sense and correlate data meaningfully for the average Exchange administrator became the key design challenge. Once the app team derived the general searches, field extractions and resulting dashboards, it became a question of UI organization in the form of trouble-shooting workflows and query forms.
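As a rough illustration of what "field extraction" and "correlation" mean here, the Python sketch below pulls named fields out of two differently formatted log sources and joins them on a shared message ID. Both line formats, the field names, and the sample values are hypothetical. Inside Splunk this work is done with field extractions and searches, not with code like this.

```python
import re

# Hypothetical log lines from two subsystems; both formats are assumptions.
GATEWAY_LINES = [
    "Aug 01 09:00:00 gw1 mid=<a1@example.com> verdict=clean",
    "Aug 01 09:05:00 gw1 mid=<b2@example.com> verdict=spam",
]
EXCHANGE_LINES = [
    "2011-08-01T09:00:02Z,DELIVER,<a1@example.com>",
    "2011-08-01T09:05:03Z,FAIL,<b2@example.com>",
]

# Field extractions: named groups turn raw text into key/value pairs.
GATEWAY_RE = re.compile(r"mid=(?P<message_id><[^>]+>)\s+verdict=(?P<verdict>\w+)")
EXCHANGE_RE = re.compile(r"^(?P<time>[^,]+),(?P<event>[^,]+),(?P<message_id><[^>]+>)$")


def extract(pattern, lines):
    """Apply one extraction pattern to every line, skipping non-matches."""
    matches = (pattern.search(line) for line in lines)
    return [m.groupdict() for m in matches if m]


def correlate(gateway_events, exchange_events):
    """Attach the gateway verdict to each Exchange event with the same message ID."""
    verdicts = {g["message_id"]: g["verdict"] for g in gateway_events}
    return [
        {**e, "verdict": verdicts.get(e["message_id"], "unknown")}
        for e in exchange_events
    ]


rows = correlate(extract(GATEWAY_RE, GATEWAY_LINES), extract(EXCHANGE_RE, EXCHANGE_LINES))
for row in rows:
    print(row["message_id"], row["event"], row["verdict"])
```

Once every source exposes the same field name for the same concept (here, `message_id`), correlation reduces to a lookup, which is why agreeing on sensible extractions across ten or more sources is the hard part.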
It is important to note, however, that the actual time to build the entire app, which has over 30 dashboards and covers all of the above scenarios, was less than six months – including the beta program.
Not only was this complex app quick to build, test and support, but it is also very flexible – due to Splunk’s ability to bring in new data sources so quickly. For instance, a beta customer suggested that Splunk collect and visualize Microsoft ForeFront data to augment its operations dashboards. Splunk’s app team was able to turn this request around in a matter of weeks.
The Splunk App for Microsoft Exchange supports Microsoft Exchange 2007 and Microsoft Exchange 2010. You can download it at http://www.splunk.com/goto/exchange. It will be generally available on August 15th on SplunkBase and can be downloaded for free with an Enterprise license. You will need the Windows version of Splunk, which you can find on www.splunk.com/download. If you have questions/comments, reply to this blog or email msexchange@splunk.com.