Get in Command of Splunk Resources with Workload Management - Part 1

Splunk administrators are increasingly facing the unique challenge of providing Splunk service to several users and applications, and at the same time balancing the quality of that service in line with overall business needs. Most times, they have a shared Splunk infrastructure, which makes it complicated to allocate Splunk resources based on workload priority and ensure proper resource isolation to prevent noisy neighbor problems.

In this series of three Splunk blogs, I will share how Splunk Workload Management may be used to solve these challenges. In the first installment below, I will describe how to configure the feature and use it to reserve resources for ingestion and search workloads.

Splunk Workload Management was first released in Splunk Enterprise 7.2 and later improved in Splunk Enterprise 7.3. This blog includes capabilities that are generally available to Splunk Enterprise customers.

How to Get Started with Workload Management

Workload management is a rule-based framework to allocate system resources (compute and memory) to various workloads, and manage workloads through their lifecycle. For resource allocation, workload management uses cgroups functionality in the Linux operating system. Hence, you need to setup cgroup hierarchy on all servers (search heads and indexers) before using workload management for resource allocation. There are various ways to do so depending on whether or not you are using systemd.

The overall system resources that are allocated to Splunk are divided in three predefined categories — Ingest, Search and Miscellaneous. For each category, you may allocate resources and create resource pools. For Ingest and Miscellaneous categories you can only create a default pool. For Search category, you may create several pools and assign resources from Search category. If CPU resources allocated to a pool are not fully utilized, they are automatically shared with workloads running in other pools. However, the memory allocation is the maximum memory that can be used by workloads in a pool.

All key Splunk processes run in the Ingest pool, hence, protection against OOM may be provided by setting appropriate memory limits for each pool. In the example below, even in the worst case scenario when several searches are running, the ingest pool gets a minimum of 30% memory because Search and Miscellaneous pools have a combined memory limit of 70% (65%+5%). This provides memory protection in heavily loaded deployments. Modular and scripted inputs run in the Miscellaneous pool.

For search heads, the resource settings can be defined using the graphical user interface. Once defined, the settings are saved in workload_pools.conf file. For indexer clusters, this file needs to be pushed through Cluster Master. In most scenarios, workload_pools.conf should be the same across search head and indexers but it is not a requirement. If you are using multiple standalone search heads or search head clusters with the same indexer cluster, you may need a different configuration file for each search head (cluster) and indexer cluster. More on this in the second blog!

Note: This is just an example. The allocation will depend on your specific workload and deployment.

Once the workload pools are defined, you may create workload rules to place searches in different search pools. The rules are a logical combination of predicates such as app, user, role and index (e.g. If index=security AND role=security_team, then Place Search in Priority pool). The rules are evaluated in order and once a rule is matched, the corresponding action is taken. Rules listed below the matched rule are not evaluated and hence the ordering of rules is very important. For example, in the scenario below, Rule #2 will be never matched. Generally, more restrictive rules should be ordered on the top.

index=security then Place Search in Standard pool
index=security AND role=security_team then Place Search in Priority pool

Resource Allocation for Ingestion and Search Workloads

Oftentimes you want to ensure that heavy search workload does not result in data ingestion lag or drop, and vice versa search execution is not impacted because of heavy ingestion load. This can be achieved by allocating CPU and Memory resources across ingest and search categories. The example below illustrates the CPU utilization by ingest and search workloads at four periods of time when workload management is enabled.

Period 1: Only a single search is running in Default_Search pool. As there is no CPU contention, it gets as much CPU as required (34%) although the CPU allocation was 14%.
Period 2: Two searches are running, one each in Default_Search and HighPriority_Search pool. Again, there is no CPU contention so both searches get as much CPU as required (33%) although the CPU allocation was much lower.
Period 3: No search is running but a large ingest workload (blue bars) has started. As can be seen at the onset of ingest and also at the tail end when no search is running, the ingest workload gets as much CPU as required, up to 80-90%, although the allocation is 20%.
Period 4: Multiple searches and ingestion workload are running concurrently. There is CPU resource contention and hence workload resource reservation kicks in. The searches running in HighPriority_Search and Standard_Search pools get the allocated CPU resources (~27%) while the ingestion workload gets the remaining CPU cycles.

_{CPU Utilization Monitoring from Splunk Monitoring Console}

Workload management enables you to assign Splunk resources to meet your business and operational requirements. You may use a set of rules to automatically place searches in different pools and also provide granular access controls to certain users to choose their own workload pools. In addition, rich monitoring capabilities allow you to track utilization and fine tune the resource allocations.

Overall, Workload Management puts you in control to allocate Splunk resources more precisely to service your internal business requirements without over provisioning the capacity. In the next blog, I will describe how to configure workload pools in complex deployment scenarios, and how to tame use cases like internal chargeback and high priority search execution. In the meantime, check out the demo video to get started with Splunk Workload Management.

Style

two-column

Announcing the General Availability of Splunk POD: Unlock the Power of Your Data with Ease

Platform

2 Minute Read

Announcing the General Availability of Splunk POD: Unlock the Power of Your Data with Ease

Splunk POD is designed to simplify your on-premises data analytics, so you can focus on what really matters: making smarter, faster decisions that drive your business forward.

Introducing the New Workload Dashboard: Enhanced Visibility, Faster Troubleshooting, and Deeper Insights

Platform

3 Minute Read

Introducing the New Workload Dashboard: Enhanced Visibility, Faster Troubleshooting, and Deeper Insights

Announcing the general availability of the new workload dashboard – a modern and intuitive dashboard experience in the Cloud Monitoring Console app.

Platform

5 Minute Read

Leading the Agentic AI Era: The Splunk Platform at Cisco Live APJ

The heart of our momentum at Cisco Live APJ is our deeper integration with Cisco, culminating in the Splunk POD and new integrations, delivering unified, next-generation data operations for every organization.

Dashboard Studio: Token Eval and Conditional Panel Visibility

Platform

4 Minute Read

Dashboard Studio: Token Eval and Conditional Panel Visibility

Dashboard Studio in Splunk Cloud Platform can address more complex use cases with conditional panel visibility, token eval, and custom visualizations support.

Introducing Resource Metrics: Elevate Your Insights with the New Workload Dashboard

Platform

4 Minute Read

Introducing Resource Metrics: Elevate Your Insights with the New Workload Dashboard

Introducing Resource Metrics in Workload Dashboard (WLD) – a modern and intuitive monitoring experience in the Cloud Monitoring Console (CMC) app.

Powering AI Innovation with Splunk: Meet the Cisco Data Fabric

Platform

3 Minute Read

Powering AI Innovation with Splunk: Meet the Cisco Data Fabric

The Cisco Data Fabric brings AI-centric advancements to the Splunk Platform, seamlessly connecting knowledge, business, and machine data.

Remote Upgrader for Windows Is Here: Simplifying Fleet-Wide Forwarder Upgrades

Platform

3 Minute Read

Remote Upgrader for Windows Is Here: Simplifying Fleet-Wide Forwarder Upgrades

Simplify fleet-wide upgrades of Windows Universal Forwarders with Splunk Remote Upgrader—centralized, signed, secure updates with rollback, config preservation, and audit logs.

Platform

3 Minute Read

Dashboard Studio: Spec-TAB-ular Updates

Splunk Cloud Platform 10.0.2503 includes a number of enhancements related to tabbed dashboards, trellis for more charts, and more!

Introducing Edge Processor for Splunk Enterprise: Data Management on Your Premises

Platform

2 Minute Read

Introducing Edge Processor for Splunk Enterprise: Data Management on Your Premises

Announcing the introduction of Edge Processor for Splunk Enterprise 10.0, designed to help customers achieve greater efficiencies in data transformation and improved visibility into data in motion.

/en_us/blog/fragments/about-splunk

/en_us/blog/fragments/subscribe-footer

Get in Command of Splunk Resources with Workload Management - Part 1

How to Get Started with Workload Management

Resource Allocation for Ingestion and Search Workloads

Related Articles