Splunk for Change Management

The old way: Unauthorized change causes downtime.

Ask any IT professional about the root cause of critical service problems and you're likely to hear the same word in every answer: change. Unauthorized change. Until now, IT management has tried to combat unauthorized change through a combination of change control approaches (CMDB, configuration management and provisioning tools, etc.) and change auditing approaches (server and network change detection and reporting). The change control approaches have been incompletely applied, while the change auditing approaches have resulted in expensive new information silos that are divorced from incident and problem management processes. Unauthorized changes are still going undetected, and it's exceedingly difficult to track down the changes behind new problems.

The new way: IT Search pinpoints unauthorized change.

Splunk for Change Management, a new application built on the Splunk IT Search platform, applies powerful indexing, search, alerting and reporting capabilities to the challenges of change management.

Splunk captures and indexes filesystem changes, database audit logs, and actual configuration files and database records alongside configuration policy, change tickets, error events and other IT data for a contextualized view of change. Its powerful search capabilities let you correlate data from different servers, compare actual configuration to policy, and alert and report on anomalies and unauthorized changes. Better yet, Splunk lets you navigate from errors to changes and configurations within a single search interface to speed root cause identification and resolve problems fast.

Benefits

  • Drives efficiency through all change management processes: audit, detection, reporting and validation
  • Breaks down change control and audit silos by bringing all information into one place
  • Speeds MTTR by quickly pinpointing service-impacting changes during incident response
  • Avoids downtime by detecting changes prior to negative system impact

Use Splunk for:

Change Auditing
Splunk makes the change audit process an effortless daily routine. Use Splunk’s predefined change auditing searches to find and review all configuration file changes, deletions and additions. Predefined searches leverage Splunk’s sophisticated transaction correlation capabilities to retrieve authorized changes made as scheduled, changes made outside the authorized time window, and changes that lack corresponding change tickets. Splunk’s easy-to-navigate results make reviewing these changes quick work. And an audit trail lets you prove that you’re complying with review requirements.
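To make the three audit categories concrete, here is a minimal sketch in plain Python. It is not Splunk's search language or its predefined searches; the `Ticket` and `ChangeEvent` records, their fields, and the category labels are all hypothetical, chosen only to illustrate the idea of matching each change to a ticket and its approved window.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Ticket:
    """Hypothetical change ticket: one file on one host, with an approved window."""
    ticket_id: str
    host: str
    path: str
    window_start: datetime
    window_end: datetime


@dataclass
class ChangeEvent:
    """Hypothetical record of an observed file change."""
    host: str
    path: str
    changed_at: datetime


def classify_change(event: ChangeEvent, tickets: list) -> str:
    """Return 'as_scheduled', 'outside_window', or 'no_ticket' for one change."""
    # Assumption: a ticket covers a change when host and path both match.
    matching = [t for t in tickets if t.host == event.host and t.path == event.path]
    if not matching:
        return "no_ticket"
    for ticket in matching:
        if ticket.window_start <= event.changed_at <= ticket.window_end:
            return "as_scheduled"
    return "outside_window"
```

A reviewer would typically look first at the `no_ticket` and `outside_window` results, since those are the candidates for unauthorized change.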

Change Detection
No matter what controls are in place, real-world systems continually drift from their target configurations. Splunk detects when files on some hosts differ from others, and when files on production hosts differ from master configurations in CMDB or change control systems -- before they cause downtime or performance problems. Splunk alerts you to these variances via email or RSS. Splunk can even automatically remediate change variances by triggering scripts.
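The drift check described above can be sketched as a content-hash comparison. This is an illustrative Python example only, not how Splunk implements detection; `fingerprint`, `find_drift`, and the in-memory dictionaries standing in for a master configuration store and per-host file contents are assumptions made for the sketch.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used to compare file contents."""
    return hashlib.sha256(content).hexdigest()


def find_drift(master: dict, hosts: dict) -> list:
    """List (host, path) pairs whose file is missing or differs from the master copy.

    master: path -> reference file content (bytes)
    hosts:  host name -> {path -> actual file content (bytes)}
    """
    drift = []
    for host in sorted(hosts):
        files = hosts[host]
        for path in sorted(master):
            # A file absent from the host is treated as drift as well.
            if path not in files or fingerprint(files[path]) != fingerprint(master[path]):
                drift.append((host, path))
    return drift
```

Each returned pair is a candidate for an alert or for a remediation script of the kind mentioned above.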

Change Reporting
Keeping tabs on changes throughout the day is the best way to be sure you know what’s happening in your environment. Splunk provides dashboards and reports that show the volume of changes across a variety of dimensions at a glance. Keep tabs on changes by host, by host group, by file and trended over time. Monitor the volume of authorized and unauthorized changes. And if something sticks out, quickly drill down to individual change events, specific configurations, and other activity impacted by the change.

Change Validation
Authorized changes that don’t take place, or don’t have their intended impact, must be tracked as well. Splunk makes it easy to close the loop on every authorized change by configuring searches that routinely validate changes and their intended impact. These searches can be linked from each change ticket and built into the standard change management workflow. For example, for a change intended to cure an intermittent error on an application server, a search for the error message can validate whether the change was effective.
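The application server example can be sketched as a before-and-after count of the error message. This is a hedged illustration in plain Python rather than a Splunk search; `validate_change`, its `(timestamp, message)` input format, and the rule that "effective" means the error was seen before the change and never after it are assumptions.

```python
from datetime import datetime


def validate_change(log_events: list, error_text: str, changed_at: datetime) -> tuple:
    """Count occurrences of error_text before and after a change.

    log_events: iterable of (timestamp, message) pairs.
    Returns (count_before, count_after, effective).
    """
    before = 0
    after = 0
    for timestamp, message in log_events:
        if error_text in message:
            if timestamp < changed_at:
                before += 1
            else:
                after += 1
    # Assumption: the change "worked" if the error existed before and is now gone.
    effective = before > 0 and after == 0
    return before, after, effective
```

For an intermittent error, the observation period after the change needs to be long enough that a zero count is meaningful.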

Incident Response
Best of all, Splunk lets you link change to its impact on system behavior and performance. When an error occurs, Splunk quickly locates the symptom in error logs and then correlates it by time with underlying changes, configurations and administrative events -- all from a single web interface. Instantly identify the latest configuration of every component involved in a failed transaction. Find out what changed last and who changed it. Find the reference configuration and quickly highlight the specific variances. There’s no need to switch contexts to a dedicated change management console.
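Time-based correlation of an error with preceding changes can be sketched as follows. The function name `recent_changes`, the dictionary fields `time`, `path` and `user`, and the one-hour default lookback are hypothetical choices for this Python illustration; it does not reproduce Splunk's own correlation.

```python
from datetime import datetime, timedelta


def recent_changes(error_time: datetime, changes: list,
                   lookback: timedelta = timedelta(hours=1)) -> list:
    """Return changes within `lookback` before the error, most recent first.

    changes: list of dicts with at least a 'time' key (datetime); the sketch
    also assumes 'path' and 'user' keys to answer "what changed and who did it".
    """
    window_start = error_time - lookback
    candidates = [c for c in changes if window_start <= c["time"] <= error_time]
    return sorted(candidates, key=lambda c: c["time"], reverse=True)
```

The first element of the result answers "what changed last and who changed it" for the chosen window.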

Talk to an Availability Expert

Eric Garner Expertise: J2EE monitoring and troubleshooting and messaging infrastructures

Harper Mann Expertise: Network management, change management and virtualization

Robert Ide Expertise: Managing large-scale virtualization and grid computing environments

Mark Bagley Expertise: ITIL, IT Governance, SOX and large-scale Splunk deployments

Vi Ly Expertise: Monitoring and troubleshooting large-scale LAMP and J2EE environments on Linux and Windows

Jeff Blake Expertise: Managing high availability database infrastructures

Dan Goldburt Expertise: Monitoring and service level management for mission critical applications and infrastructure

Steve Hudson Expertise: Troubleshooting failures and monitoring large-scale, N-Tier J2EE and .NET applications

Michael Wilde Expertise: Monitoring and troubleshooting Windows applications and infrastructures
