Disk Space Estimator for Index Replication

By Mustafa Ahamed

One of the first questions customers ask when they start considering index replication is about storage requirements. Index replication keeps additional copies of data for redundancy. The main questions are how that affects storage needs and which factors to consider when designing a scalable storage architecture. I’ll cover the important factors in this blog post.

There are two major dimensions to consider: the replication policy and the data retention period.

Replication Factor (RF) and Search Factor (SF) control the replication policy. RF determines how many copies of the raw data the cluster keeps, while SF determines how many of those copies also carry the time-series index files that make them searchable. For syslog data, the compressed raw data takes roughly 15% of the original data volume and the index files take roughly 35%.

The second dimension is the retention period, which determines how long you keep data in Splunk before the oldest data ages out. Typical retention policies range from 3 to 6 months, although we have seen cases where the retention period is measured in years.
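
Combining the two dimensions, the calculation used in this post can be written as a single expression. This is a sketch that assumes the syslog ratios above (15% raw, 35% index) and an even spread of data across peers. The symbols are introduced here for illustration: V is the daily indexing volume, T the retention period in days, and N the number of peer nodes.

$$
S_{\text{cluster}} = V \times T \times \left(0.15 \cdot RF + 0.35 \cdot SF\right), \qquad S_{\text{peer}} = \frac{S_{\text{cluster}}}{N}
$$

RF multiplies only the raw-data share and SF multiplies only the index share, so the two settings affect the total by different amounts.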

Let’s walk through an example to see these numbers in action. Assume a daily indexing volume of 200 GB, RF and SF both set to 2, a 2-node cluster, and a retention period of 45 days.

Raw data storage = 15% * 200 GB * 2 (RF) * 45 days = 2,700 GB = 2.7 TB

Index file storage = 35% * 200 GB * 2 (SF) * 45 days = 6,300 GB = 6.3 TB

Total space required on the cluster to store 45 days of data = 2.7 + 6.3 = 9 TB

Space required on an individual peer (assuming data is spread evenly across both peers) = 9 / 2 = 4.5 TB
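
The same arithmetic can be wrapped in a small helper for reuse. This is a minimal Python sketch, not a Splunk tool: `estimate_storage` and `StorageEstimate` are hypothetical names, the 15%/35% ratios are the syslog rule of thumb from this post, TB means 1,000 GB, and the per-peer figure assumes data is spread evenly. The two input checks reflect indexer cluster rules: SF may not exceed RF, and a cluster needs at least RF peers.

```python
from dataclasses import dataclass

# Rule-of-thumb ratios for syslog data, as fractions of the original
# (pre-indexing) volume. Other data types may compress differently.
RAW_RATIO = 0.15    # compressed raw data, stored once per RF copy
INDEX_RATIO = 0.35  # index files, stored once per SF (searchable) copy


@dataclass
class StorageEstimate:
    raw_gb: float
    index_gb: float
    total_gb: float
    per_peer_gb: float


def estimate_storage(daily_gb: float, retention_days: int,
                     rf: int, sf: int, peers: int) -> StorageEstimate:
    """Estimate cluster-wide and per-peer disk needs (hypothetical helper)."""
    if sf > rf:
        raise ValueError("SF cannot exceed RF")
    if peers < rf:
        raise ValueError("the cluster needs at least RF peers")
    raw = RAW_RATIO * daily_gb * rf * retention_days
    index = INDEX_RATIO * daily_gb * sf * retention_days
    total = raw + index
    # Assumes buckets are distributed evenly across all peers.
    return StorageEstimate(raw, index, total, total / peers)


est = estimate_storage(daily_gb=200, retention_days=45, rf=2, sf=2, peers=2)
print(f"raw: {est.raw_gb / 1000:.1f} TB, index: {est.index_gb / 1000:.1f} TB")
# raw: 2.7 TB, index: 6.3 TB
print(f"total: {est.total_gb / 1000:.1f} TB, per peer: {est.per_peer_gb / 1000:.1f} TB")
# total: 9.0 TB, per peer: 4.5 TB
```

This reproduces the 9 TB cluster total and 4.5 TB per-peer figures from the example.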

Using this simple formula, we estimate that the cluster needs roughly 9 TB of disk space to store, replicate, and retain 45 days of data. You can adjust the retention period and replication policies to see how they affect your storage needs.
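
To try the adjustments suggested above, the sketch below evaluates a few hypothetical what-if scenarios around the 200 GB/day example. `cluster_tb` is an illustrative name, and the same assumptions apply: syslog ratios of 15% raw and 35% index, with TB meaning 1,000 GB.

```python
RAW_RATIO = 0.15    # raw data share of original volume (syslog rule of thumb)
INDEX_RATIO = 0.35  # index file share of original volume


def cluster_tb(daily_gb: float, retention_days: int, rf: int, sf: int) -> float:
    """Total cluster storage in decimal TB for the given policy (hypothetical helper)."""
    per_day_gb = daily_gb * (RAW_RATIO * rf + INDEX_RATIO * sf)
    return per_day_gb * retention_days / 1000


# Hypothetical scenarios: (retention days, RF, SF)
scenarios = [(45, 2, 2), (90, 2, 2), (180, 2, 2), (90, 3, 2)]
for days, rf, sf in scenarios:
    print(f"{days:>3} days, RF={rf}, SF={sf}: {cluster_tb(200, days, rf, sf):5.1f} TB")
```

Because the estimate is linear in both retention and the replication settings, doubling retention doubles the footprint. Raising RF from 2 to 3 while keeping SF at 2 adds only the raw-data share (15% of the daily volume per extra copy), so 90 days grows from 18 TB to 20.7 TB rather than by half.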

Mustafa Ahamed

Mustafa has been with Splunk for 10 years and leads Product Management for the Splunk Enterprise platform. He's passionate about large-scale deployments and complex systems, and loves to travel and explore new places and food!
