Monitoring Bucket Health in Splunk Enterprise

Why is understanding small buckets important? Bucket health is worth monitoring because it directly affects Splunk search performance. Unhealthy bucket growth, especially a disproportionate number of small buckets relative to large ones, forces each search to open more index (TSIDX) files and perform more disk I/O. The result is slower or paused searches and, in the worst case, search and indexing services that become unavailable to users. With few resources left, indexing queues can fill up or block, causing data latency that affects alerts and other time-critical searches.

How Do Buckets Work?

Splunk Enterprise stores indexed data in buckets, which are directories containing both the raw data and the index files that point into that data. An index typically consists of many buckets, organized by the age of the data. To learn more about buckets, see the Splunk documentation on buckets. Creating new buckets is a normal part of Splunk's internal operations: as the volume of indexed data grows, so does the number of buckets. New buckets can also be created by routine system tasks such as indexer cluster restarts, instance shutdowns, and least recently used (LRU) cache eviction.

Small buckets, that is, buckets rolled from hot to warm before reaching their maximum configured size, directly impact search performance. The more buckets a search must read, the more resources it needs to complete. A telltale sign of unhealthy bucket growth is therefore a large number of small buckets.
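As a minimal sketch of this definition, the Python function below flags a rolled bucket as small when its size falls below a fixed fraction of the configured maximum bucket size. The Bucket record and the 10% threshold are hypothetical choices for illustration. The 750 MB default is assumed to correspond to maxDataSize = auto in indexes.conf, so check the setting that actually applies to each index.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical bucket record; field names are illustrative, not a Splunk API."""
    index: str
    bucket_id: str
    size_mb: float
    state: str  # "hot", "warm", or "cold"


def is_small(bucket: Bucket, max_data_size_mb: float = 750.0,
             threshold: float = 0.1) -> bool:
    """True if a rolled bucket is smaller than threshold * max_data_size_mb.

    Hot buckets are still growing, so they are never counted as small.
    """
    if bucket.state == "hot":
        return False
    return bucket.size_mb < threshold * max_data_size_mb


buckets = [
    Bucket("web", "db_1", 740.0, "warm"),
    Bucket("web", "db_2", 12.5, "warm"),
    Bucket("web", "hot_1", 5.0, "hot"),
]
print([b.bucket_id for b in buckets if is_small(b)])  # ['db_2']
```

Excluding hot buckets matters: a hot bucket is small simply because it has not finished filling, which says nothing about premature rolling.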

What Causes Unhealthy Buckets?

In most cases, very small buckets indicate data issues, particularly timestamp mismatches. When an incoming event's timestamp falls outside the time span allowed for the open hot buckets, Splunk Enterprise creates a new bucket. For example, the following situations can cause buckets to roll prematurely:

Sourcetypes that mix data from different sources, including historical data generated in different timeframes
Systems with incorrect time settings, or applications that log incorrect timestamps
Event timestamp formats (strftime) that are defined incorrectly or that Splunk Enterprise autodetects incorrectly
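Timestamp format errors are easy to reproduce outside Splunk. Python's datetime.strptime uses the same style of strftime directives (%d, %m, %Y) as Splunk's TIME_FORMAT setting in props.conf. The hypothetical sketch below therefore shows how reading a day/month/year timestamp as month/day/year shifts an event by weeks, while timestamps whose day exceeds 12 fail to parse at all. The sample timestamps are made up.

```python
from datetime import datetime, timedelta


def parse_skew(raw: str, true_fmt: str, configured_fmt: str) -> timedelta:
    """Return how far `configured_fmt` moves `raw` from its true timestamp.

    Raises ValueError if `raw` cannot be parsed with `configured_fmt`.
    """
    intended = datetime.strptime(raw, true_fmt)
    parsed = datetime.strptime(raw, configured_fmt)
    return parsed - intended


DMY = "%d/%m/%Y %H:%M:%S"  # what the (hypothetical) source actually writes
MDY = "%m/%d/%Y %H:%M:%S"  # what the misconfigured format assumes

# 3 April read as 4 March: the event lands 30 days in the past.
print(parse_skew("03/04/2024 10:15:00", DMY, MDY))  # -30 days, 0:00:00

# 25 is not a valid month, so this timestamp fails to parse instead.
try:
    parse_skew("25/04/2024 10:15:00", DMY, MDY)
except ValueError as err:
    print("unparseable:", err)
```

When a format is misapplied like this, events from a single source end up scattered across widely separated time ranges, which is the pattern that forces extra hot buckets to open.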

When timestamps vary widely, buckets capture fewer events before they are rolled. Splunk limits the number of hot buckets that can be open at any point in time (the maxHotBuckets setting in indexes.conf), so timestamp mismatches force new hot buckets to be created and existing ones to be rolled early.
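To make the mechanism concrete, here is a deliberately simplified Python model; it is not Splunk's actual bucket-management logic. It assumes that each hot bucket may span at most max_span seconds, that at most max_hot hot buckets can be open (standing in for maxHotBuckets), and that a bucket counts as full after max_events events (a stand-in for the size limit). All three parameters and their defaults are hypothetical. When an event fits no open hot bucket and the limit has been reached, the model rolls the least recently written bucket.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity equality, so list.remove() drops the exact object
class HotBucket:
    earliest: int
    latest: int
    events: int = 0
    last_write: int = 0  # sequence number of the most recent write


def simulate(timestamps, max_hot=3, max_span=86_400, max_events=1_000):
    """Return the event counts of buckets rolled to warm (simplified model)."""
    hot: list[HotBucket] = []
    rolled: list[int] = []
    for seq, ts in enumerate(timestamps):
        # Find the first open hot bucket whose span stays within max_span.
        target = None
        for b in hot:
            if max(b.latest, ts) - min(b.earliest, ts) <= max_span:
                target = b
                break
        if target is None:
            if len(hot) == max_hot:
                # No room: roll the least recently written hot bucket early.
                lru = min(hot, key=lambda b: b.last_write)
                hot.remove(lru)
                rolled.append(lru.events)
            target = HotBucket(earliest=ts, latest=ts)
            hot.append(target)
        target.earliest = min(target.earliest, ts)
        target.latest = max(target.latest, ts)
        target.events += 1
        target.last_write = seq
        if target.events >= max_events:
            # Full bucket: rolled because it reached its size limit.
            hot.remove(target)
            rolled.append(target.events)
    return rolled


day = 86_400
in_order = [60 * i for i in range(3000)]  # one event per minute
mixed = [offset + i for i in range(750) for offset in (0, 30 * day, 60 * day, 90 * day)]

print(simulate(in_order))  # [1000, 1000, 1000]
r = simulate(mixed)
print(len(r), max(r))  # 2997 1
```

With in-order timestamps, 3,000 events fill three full buckets. Spreading the same number of events round-robin across four time ranges 30 days apart makes every new event evict a hot bucket, so nearly all buckets roll holding a single event.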

What Can I Do To Address the Root Cause(s)?

Keep an eye on Health Report bucket monitoring
Look for and address timestamp mismatch issues
Evaluate automation tools that trigger system events such as indexer cluster restarts and shutdowns
Merge an index’s small buckets into larger ones using the splunk merge-buckets CLI command (see the Splunk documentation on merging buckets)
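The merge command referenced above performs the actual merging. Purely to illustrate the goal, the hypothetical Python sketch below plans merges for one index. It walks the index's small buckets in time order and groups neighbors until a batch would exceed a target size. The (bucket_id, earliest_epoch, size_mb) input format and the 750 MB target are assumptions, and this is not how Splunk's merge command selects buckets.

```python
def plan_merges(buckets, target_mb=750.0):
    """Group time-ordered buckets into batches no larger than target_mb.

    `buckets` is a list of (bucket_id, earliest_epoch, size_mb) tuples for one
    index, assumed to be already filtered down to small buckets. Only batches
    that combine at least two buckets are returned.
    """
    batches, current, current_size = [], [], 0.0
    for bucket_id, _earliest, size_mb in sorted(buckets, key=lambda b: b[1]):
        if current and current_size + size_mb > target_mb:
            if len(current) > 1:
                batches.append(current)
            current, current_size = [], 0.0
        current.append(bucket_id)
        current_size += size_mb
    if len(current) > 1:
        batches.append(current)
    return batches


small = [("b3", 300, 50.0), ("b1", 100, 60.0), ("b2", 200, 30.0), ("b4", 400, 40.0)]
print(plan_merges(small, target_mb=100))  # [['b1', 'b2'], ['b3', 'b4']]
```

Grouping only time-adjacent buckets keeps each merged bucket's time range compact, so searches over a narrow window still touch few buckets.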

The key questions to ask to determine if small buckets are impacting your deployment are:

How many unhealthy (small) buckets exist in my deployment, and what share of all buckets do they represent, overall and per index?
Which indexes are the top contributors to small buckets?
What is the correlation between system events and small bucket creation?
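Once you have a list of rolled buckets with their index and size, the first two questions reduce to simple counting. Such a list could, for example, be exported from Splunk's dbinspect search command (the export step is not shown). The Python sketch below assumes that (index, size_mb) input format and a hypothetical 75 MB cutoff for "small". As a rough approximation of the third question, it also counts how many bucket rolls happen within a few minutes of a known system event such as a restart. The 300-second window and the sample index names are likewise assumptions.

```python
from collections import Counter


def small_bucket_report(buckets, small_mb=75.0):
    """Per-index small-bucket counts for (index, size_mb) pairs of rolled buckets.

    Returns (overall_share, rows); each row is (index, small, total, share),
    sorted so the biggest contributors of small buckets come first.
    """
    totals, smalls = Counter(), Counter()
    for index, size_mb in buckets:
        totals[index] += 1
        if size_mb < small_mb:
            smalls[index] += 1
    rows = [(idx, smalls[idx], n, smalls[idx] / n) for idx, n in totals.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    n_all = sum(totals.values())
    overall = sum(smalls.values()) / n_all if n_all else 0.0
    return overall, rows


def rolls_near_events(roll_times, event_times, window=300):
    """Count bucket rolls (epoch seconds) within `window` seconds of a system event."""
    return sum(1 for r in roll_times if any(abs(r - e) <= window for e in event_times))


sample = [("web", 10.0), ("web", 20.0), ("web", 700.0),
          ("fw", 5.0), ("fw", 740.0), ("os", 720.0)]
overall, rows = small_bucket_report(sample)
print(overall)  # 0.5
for row in rows:
    print(row)
print(rolls_near_events([100, 1000, 5000], [1100]))  # 1
```

A high share of rolls clustered around restarts points toward operational causes rather than timestamp problems.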

Here are some searches you can run to better understand the distribution and presence of small buckets in your deployment:

On each cluster manager, to check whether buckets are distributed evenly across the indexer peers (this REST search reports current bucket counts, so the time range picker does not affect it):

| rest splunk_server=local /services/cluster/master/peers
| rename label AS peer_name
| stats sum(bucket_count) AS bucket_count by peer_name
| sort - bucket_count
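Reading the results by eye works for a handful of peers. For larger clusters, a sketch like the one below flags peers whose bucket_count deviates from the cluster mean by more than a tolerance. The 20% tolerance and the peer names and counts in the example are hypothetical.

```python
from statistics import mean


def uneven_peers(bucket_counts, tolerance=0.2):
    """Return {peer: relative deviation} for peers more than `tolerance` from the mean.

    `bucket_counts` maps peer_name -> bucket_count, e.g. copied from the search results.
    """
    if not bucket_counts:
        return {}
    avg = mean(bucket_counts.values())
    if avg == 0:
        return {}
    return {
        peer: round((count - avg) / avg, 3)
        for peer, count in bucket_counts.items()
        if abs(count - avg) / avg > tolerance
    }


counts = {"idx1": 1000, "idx2": 1050, "idx3": 950, "idx4": 1600}
print(uneven_peers(counts))  # {'idx4': 0.391}
```

A relative deviation, rather than an absolute difference, keeps the same tolerance meaningful whether peers hold hundreds or hundreds of thousands of buckets.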

On each search head, to see why hot buckets are being rolled and whether they roll before reaching full size (recommended time ranges: 1 day and 7 days):

index=_internal source=*/splunkd.log* hotbucketroller
| stats count by caller
| sort - count
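The search above groups HotBucketRoller messages by their caller field, which records why each hot bucket was rolled. As a rough sketch of how to read that breakdown, the Python function below tallies caller= values from raw splunkd.log lines and reports the share of rolls not attributed to size. It assumes that caller=size_exceeded marks a bucket that rolled because it was full. Treat that value, and the made-up sample lines in the example, as assumptions to verify against your own logs.

```python
import re
from collections import Counter

CALLER_RE = re.compile(r"\bcaller=(\S+)")


def roll_reasons(lines, size_callers=("size_exceeded",)):
    """Tally HotBucketRoller caller= values and the share of non-size rolls."""
    counts = Counter()
    for line in lines:
        if "HotBucketRoller" not in line:
            continue  # only hot-to-warm roll messages are relevant
        match = CALLER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    premature = sum(n for caller, n in counts.items() if caller not in size_callers)
    return counts, (premature / total if total else 0.0)


sample = [
    "INFO  HotBucketRoller - finished moving hot to warm caller=size_exceeded",
    "INFO  HotBucketRoller - finished moving hot to warm caller=lru",
    "INFO  HotBucketRoller - finished moving hot to warm caller=lru",
    "INFO  IndexWriter - unrelated message caller=lru",
]
counts, premature_share = roll_reasons(sample)
print(counts.most_common())       # [('lru', 2), ('size_exceeded', 1)]
print(round(premature_share, 2))  # 0.67
```

If most rolls are attributed to something other than reaching full size, the buckets being produced are likely smaller than they should be.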

If these exploratory searches reveal too many small buckets in your deployment, investigate your data ingestion configuration, such as timestamp extraction settings, to keep the problem from recurring. As always, reach out to the Splunk community on Splunk Answers or join an upcoming user group with any additional questions about running a high-performing Splunk Enterprise deployment.
