The Insider's Guide to Splunk Enterprise Upgrades: Before, During, and After

Splunk technical smokejumper David Paper is fired up about Splunk Enterprise upgrades.

In fact, he just published a three-part series on Splunk Answers detailing best practices for what to check before, during, and after an upgrade. David's posts provide the technical detail that follows from the Splunk Answers post, "What is the order of operations for upgrading Splunk Enterprise?"

I caught up with David last week, fresh from another successful upgrade to Splunk Enterprise 7.3.1 at a large customer site where he road-tested his upgrade best practices. I asked him how it went, and how other customers can use his upgrade best practices to plan their own upgrades.

Ripe for Success

"This customer had been struggling with upgrades," David said. Before the Splunk teams got involved, the customer was on many different versions and had implemented piecemeal updates at different times. They standardized on Splunk Enterprise 6.6.3, then upgraded to 7.2.3 in February 2019. They upgraded again to 7.3.1 in August 2019 to position themselves to use data model acceleration (DMA) support for SmartStore in the future. SmartStore is the new Splunk Enterprise feature that decouples compute from storage to improve elasticity and save on storage costs.

Their main deployment has a six-node search head cluster and 126 indexers clustered across three sites. "We planned for a nine-hour maintenance window, from 6pm to 3am," David said. "We were done by 10:30pm!" The environment took an hour to fully recover, and they were turning off the lights at 11:30pm. They're so well organized now that maintaining upgrade levels will be easy and routine going forward.

Operational Best Practices Helped Get Their House in Order

David explained how his upgrade best practices fit into a larger framework of operational best practices to make this upgrade dream a reality.

David also urges people to schedule a roomy upgrade window. "Give yourself enough time to make a mistake and enough runway to do extra validation," he says.

Benchmark and Check System Health Before the Upgrade

David's next advice: benchmark your system and thoroughly check system health before you start the upgrade. In his Splunk Answers post, "How do I benchmark system health before a Splunk Enterprise upgrade?", David published a detailed list of what to check to ensure your system is healthy enough to proceed with the upgrade. He also gives advice about how to benchmark KPIs, which you can use to validate system health and performance after the upgrade.

"The monitoring console provides a good baseline of what things look like from the various points of view," David said. His post includes optimal performance ranges, and suggestions of when you should fix something before you upgrade. "Customers can use our guideline and fill in the bellwethers of their own environment."

Pause and Allow Components to Recover During the Upgrade

David's strongest advice is to pause the upgrade and allow components to recover before starting the next phase. His Splunk Answers post, "How do I monitor system health during a Splunk Enterprise upgrade?", gives details about how to reduce recovery time and keep the system searchable throughout the entire upgrade by taking this phased approach.

"We put the cluster master in maintenance mode, upgraded the first site, then once the nodes came back online, we took the cluster out of maintenance mode and gave it time to get the three green checkmarks: the cluster is searchable, and search factor and replication factor are met," David explained. "Then we rinsed and repeated for the other two sites."

Instead of having 15 million buckets to update at the end, the cluster master had just 5 million, and it chewed through the recovery in only an hour. The cluster master was able to reassign primaries and keep the indexes searchable while the upgrade was in progress. The nodes were busy during the upgrade (but not overwhelmed) and remained online.
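
For readers who script their upgrades, the order of operations David describes can be sketched roughly as follows. This is a minimal illustration of the sequencing only, not David's tooling or a Splunk API: the function name and every callable passed into it (set_maintenance_mode, upgrade_site, cluster_is_healthy, wait) are hypothetical placeholders that you would back with your own automation.

```python
from typing import Callable, Iterable, List


def phased_site_upgrade(
    sites: Iterable[str],
    set_maintenance_mode: Callable[[bool], None],  # hypothetical hook
    upgrade_site: Callable[[str], None],           # hypothetical hook
    cluster_is_healthy: Callable[[], bool],        # hypothetical hook
    wait: Callable[[], None],                      # hypothetical hook
    max_checks: int = 120,
) -> List[str]:
    """Upgrade one site at a time and let the cluster recover in between."""
    completed: List[str] = []
    for site in sites:
        set_maintenance_mode(True)    # enter maintenance mode before touching peers
        upgrade_site(site)            # upgrade the indexers in this site only
        set_maintenance_mode(False)   # leave maintenance mode so recovery can start

        # Pause here until the "three green checkmarks" are back: the cluster is
        # searchable, and search factor and replication factor are met.
        for _ in range(max_checks):
            if cluster_is_healthy():
                break
            wait()
        else:
            # Never roll on to the next site while the cluster is still recovering.
            raise RuntimeError(f"cluster did not recover after upgrading {site}")

        completed.append(site)
    return completed
```

Passing the hooks in as arguments keeps the sequencing logic separate from however you actually drive the upgrade, and makes it easy to dry-run the loop with stand-in functions before the maintenance window.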

Validate Key Health Indicators After the Upgrade

The post-upgrade validation is where all this planning and methodical execution pay off. David's Splunk Answers post, "What do I validate after I upgrade Splunk Enterprise to confirm the upgrade was successful?", provides a detailed checklist of what to verify and validate after the upgrade to ensure that all your components are healed and performing at or better than pre-upgrade benchmarks.

"In previous major code upgrades, we've done things like improve search performance, improve performance of DMAs, and lower skipped searches in the same time period," David says, although specific results depend on the upgrade version and the situation. With his benchmark-before, validate-after methodology, you can confirm that performance metrics quickly return to the levels recorded in the benchmarks taken before the upgrade.
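
One way to picture the benchmark-before, validate-after comparison is a small check that flags any KPI that got noticeably worse. The sketch below is an assumption-laden illustration, not the checklist from David's post: the function name, the 10 percent tolerance, and the KPI names and numbers in the usage lines are all made up, and it assumes that a lower value is better for every KPI you feed it.

```python
from typing import Dict, List


def find_regressions(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return the KPIs that are worse after the upgrade than the baseline allows.

    Assumes lower is better for every KPI (for example, runtimes or skip ratios).
    """
    regressions: List[str] = []
    for name, baseline in before.items():
        if name not in after:
            # A KPI that was benchmarked but not re-measured is unverified.
            regressions.append(name)
            continue
        if after[name] > baseline * (1 + tolerance):
            regressions.append(name)
    return regressions


# Illustrative numbers only; these are not measurements from the article.
baseline = {"median_search_runtime_s": 10.0, "skipped_search_ratio": 0.05}
post_upgrade = {"median_search_runtime_s": 10.5, "skipped_search_ratio": 0.08}
flagged = find_regressions(baseline, post_upgrade)
```

An empty result means every benchmarked KPI is back within tolerance of its pre-upgrade value; anything returned is worth investigating before you call the upgrade done.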

Easy to Stay Current on the Latest Release

After always being a year behind on Splunk Enterprise versions, David's customer is now fully current and poised to make regular Splunk Enterprise upgrades part of their ongoing operations so they can adopt new features and stay ahead of the end-of-support curve.

Learn what's new in the latest Splunk Enterprise release and stay aware of the version support status for your release.

Upgrade Best Practices on Splunk Answers

David published the before, during, and after upgrade best practices on Splunk Answers and tagged them with the validated_best-practice tag and the upgrade tag as part of the program described in "How Crowdsourcing is Shaping the Future of Splunk Best Practices." Follow these tags to get notifications about updates to these posts, and join in the conversation! How will you use these best practices for your Splunk Enterprise upgrade?

----------------------------------------------------
Thanks!
Jane Mulcaster
