At .conf19 a couple of weeks ago, we announced the General Availability of Splunk Cloud 8.0. In addition to highlighting the new capabilities of the 8.0 release, I also want to use this blog to summarize some of the customer feedback I have received: specifically, how Splunk Cloud administrators plan to take advantage of the new 8.0 features to make their lives easier when managing their Splunk Cloud environments at scale.
Splunk Cloud 8.0 brings a host of new features including:
- Workload management for Splunk Cloud: Workload management allows customer admins to prioritize resources by users, roles, or apps. Splunk Cloud provides three pre-configured search pools; customer admins can assign workloads to a higher-resource pool for better performance or to a limited-resource pool to effectively quarantine them.
- Security enhancements: Granular access controls plus a new user interface for roles management.
- Python 3.7 support: Customers can migrate scripts to Python 3.7 compatibility individually over time, or enforce Python 3.7 immediately if compatibility is critical (see the sketch after this list).
- Distributed search: Get up-to-date search results with faster cascading bundle replication.
- Search and metrics performance improvements: Gains in search performance via grouping of alerts, cost savings from optimized metrics data storage, plus wildcard functionality for logs2metrics.
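For the Python 3.7 item above, the smoothest migration path is usually code that runs unchanged on both interpreters. As a rough, non-Splunk-specific illustration, a script written like the following sketch can be validated under Python 2.7 today and switched to Python 3.7 later (the function names are purely illustrative):

```python
# Illustrative only: a pattern for scripts that must run under both
# Python 2.7 and Python 3.7 during a gradual migration. Nothing here is
# part of a Splunk API; the names are hypothetical.
from __future__ import print_function

import io
import sys


def is_python3():
    """Return True when running under a Python 3.x interpreter."""
    return sys.version_info[0] == 3


def read_config(path):
    """Read a UTF-8 text file the same way on Python 2.7 and 3.7."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    print("Running under Python 3:", is_python3())
```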
While there are many worthy features in Splunk Cloud 8.0, I’ve been repeatedly told that one particular feature, Workload Management (WLM), is a game changer for Splunk Cloud admins, which is why I’ll focus on it here.
Currently, one or more of the following scenarios are common but difficult for Splunk Cloud administrators to predict and police:
- Data ingestion lags
- Urgent, business-impacting searches queuing up behind lower-priority searches.
- Newly onboarded users running searches that do not follow team best practices - for example, searching across all available indexes or over all time.
WLM addresses these scenarios by allocating CPU and memory resources into logical resource groups called workload pools. This results in:
- Ingest workload isolation - even when your Splunk Cloud is busy, ingestion continues as a prioritized task.
- Rule-based CPU and memory resource partitioning - by configuring rules that partition search workloads based on apps and roles, you can put guardrails in place against search over-usage.
- Workload rules to ensure that high-priority searches are placed in a workload pool that has sufficient resources, while less important searches are appropriately isolated in a different workload pool.
The Splunk Cloud team has worked hard to ensure WLM is a fully self-service feature that is easy to configure and use. Even before you perform any WLM configuration in Splunk Cloud 8.0, WLM is already working in the background: the ingest workload isolation rule is enabled by default to protect against data ingestion lag.
When you are ready to create your workload rules, we’ve provided a simple framework for you to operate within. First, Splunk Cloud comes with pre-configured workload pools; this takes the guesswork out of WLM and allows you to start using the feature quickly. Three workload pools take effect when there is search contention (summarized in the sketch after this list):
- standard_pool - by default, all searches are placed in this pool and they receive 35% of CPU resource allocation
- high_perf - when searches are placed in this pool, they can take up to 60% of the CPU resource allocation
- limited_perf - this pool has a 5% CPU allocation, so any search placed in it will execute with minimal resources
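Before moving on, it can help to picture those pools side by side. The snippet below is a conceptual sketch in plain Python, not a Splunk configuration file or API; it simply restates the pre-configured pools and their CPU allocations described above:

```python
# Conceptual sketch only: the pre-configured Splunk Cloud workload pools
# and their CPU allocations (in percent), as described above. This is not
# Splunk configuration syntax or an actual Splunk API.
PRECONFIGURED_POOLS = {
    "standard_pool": 35,  # default: all searches land here unless a rule says otherwise
    "high_perf": 60,      # searches here can take up to 60% of CPU
    "limited_perf": 5,    # quarantine pool for resource-hungry, low-value searches
}

# The allocations only matter under search contention; together they cover
# the full CPU budget.
assert sum(PRECONFIGURED_POOLS.values()) == 100
```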
Next, you create workload rules to assign specific searches to the pre-configured pools. You name the rule, specify the search criteria, and set the corresponding action. For example, if your use case is to protect against users running a search across all available indexes or against all time, you can set the action to send these types of searches to the limited_perf pool. This ensures the search continues to run but consumes a much smaller percentage of the available Splunk Cloud resources. And that’s it: you’ve just created your first WLM rule.
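To make the rule just described concrete, here is a small conceptual sketch of the matching logic in Python. This is not how Splunk Cloud implements or configures WLM (rule creation happens through the self-service UI), and the field names `indexes` and `earliest_time` are illustrative assumptions:

```python
# Conceptual sketch of the example rule above: searches that scan all
# indexes or all time are routed to limited_perf. The search fields used
# here are hypothetical, not Splunk's internal schema.

def all_indexes_or_all_time(search):
    """Predicate for the example rule."""
    return search.get("indexes") == "*" or search.get("earliest_time") == 0


def assign_pool(search, default="standard_pool"):
    """Return the workload pool the example rule would place this search in."""
    if all_indexes_or_all_time(search):
        return "limited_perf"
    return default


# An all-time search across every index is quarantined, while a scoped
# search stays in the default pool.
print(assign_pool({"indexes": "*", "earliest_time": 0}))              # limited_perf
print(assign_pool({"indexes": "web_logs", "earliest_time": "-24h"}))  # standard_pool
```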
WLM is built for scale, so you can have up to 100 workload rules in your Splunk Cloud environment. Some customers already have a backlog of scenarios they are prepared to mitigate with rules immediately, while others are still mulling how best to utilize WLM. No matter where you sit on that adoption spectrum, every customer I have spoken to has indicated the same plan: start with a primary scenario, create the appropriate workload rule, and measure the results. There may be multiple iterations on that first use case to fine-tune the parameters. Once they are comfortable with the primary use case, they proceed to the next one. This iterative approach to adopting WLM is a best practice for all customers, since it ensures predictable feature behavior and outcomes that meet the needs of your teams. You can learn more about WLM on the Splunk Cloud documentation site.
----------------------------------------------------
Thanks!
Azmir Mohamed