This documentation applies to the following versions of Splunk: 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10
This topic discusses backing up Splunk indexed data. It first gives an overview of how your indexed data moves through Splunk, then describes a basic backup strategy based on common or default Splunk index configurations. Finally, it provides options for setting or changing the retirement policy for your Splunk index data.
The default values and policies described in this topic are set in indexes.conf. If you have a more complex index configuration or unusual data volumes, refer to that file for more detailed information and options. Before modifying any configuration file, read about configuration files.
When Splunk is indexing, the data moves through a series of stages based on policies that you define. At a high level, the default behavior is as follows:
When data is first indexed, it is put into a "hot" database.
The data remains in the hot db until the policy conditions are met for it to be reclassified as "warm" data. This is called "rolling" the data into the warm db. By default, this happens when a particular hot db reaches a specified size or age.
When a hot db is rolled, its directory is renamed to be a "bucket" in the warm db. At this point, it is safe to back up the warm db buckets.
Next, when the number of warm buckets reaches a specified limit (the default value is 300 buckets), Splunk renames warm buckets to cold buckets so that the number of warm buckets stays at 300. (If your cold db is located on another fileshare, the warm buckets are moved to it and then deleted from the warm db directory.)
Finally, when a bucket meets the defined policy requirements, it is "frozen". By default, Splunk deletes frozen buckets. If you need to archive or otherwise preserve that data, you can provide a script that processes the bucket before it is deleted.
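The archive-before-delete script mentioned in the last stage could look something like the following Python sketch. It is only an illustration, under the assumption that Splunk invokes the script with the path of the bucket directory as its first argument. The archive location /mnt/archive/splunk_frozen and the function name archive_bucket are hypothetical.

```python
import os
import shutil
import sys

# Hypothetical archive location; this is not a Splunk default.
ARCHIVE_ROOT = "/mnt/archive/splunk_frozen"


def archive_bucket(bucket_path, archive_root=ARCHIVE_ROOT):
    """Copy a bucket directory into archive_root and return the copy's path."""
    bucket_path = os.path.abspath(bucket_path)
    if not os.path.isdir(bucket_path):
        raise ValueError("not a bucket directory: %s" % bucket_path)
    os.makedirs(archive_root, exist_ok=True)
    destination = os.path.join(archive_root, os.path.basename(bucket_path))
    # copytree raises an error if the destination already exists,
    # so an earlier archive of the same bucket is never overwritten.
    shutil.copytree(bucket_path, destination)
    return destination


# Assumption: the bucket path arrives as the first command-line argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    archive_bucket(sys.argv[1])
```

If the copy fails, the script exits with an error instead of silently continuing, which makes a failed archive easier to notice.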
Summary:
The general recommendation is to schedule backups of your warm db buckets regularly using the incremental backup utility of your choice.
Hot databases can only be backed up using a snapshot of the files, using something like VSS (on Windows/NTFS), ZFS snapshots (on ZFS), or a snapshot facility provided by the storage subsystem. If you do not have such a facility available, the data within the hot databases can only be backed up after it has rolled to a warm db.
Splunk rolls a hot db to a warm db based on the policy you define. By default, the main index is set to roll a hot db whenever it reaches a certain size, or it has not had any data added to it for 86400 seconds (one day), whichever occurs first. (While it is possible to force a roll of a hot db to a warm db, this is not recommended as each forced roll will permanently decrease search performance over the data. In cases where hot data needs to be backed up, a snapshot backup is the preferred method.)
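As a rough illustration of the scheduled warm-bucket backup recommended above, the following Python sketch copies any warm bucket that is not yet present in a backup directory. It assumes that warm bucket directory names begin with db_ and that hot directory names do not. That naming convention, the function name backup_warm_buckets, and the directory arguments are assumptions for illustration. In practice, an existing incremental backup utility is likely the better choice.

```python
import os
import shutil


def backup_warm_buckets(db_dir, backup_dir, warm_prefix="db_"):
    """Copy warm buckets missing from backup_dir; return the names copied."""
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(db_dir)):
        source = os.path.join(db_dir, name)
        # Skip hot directories and stray files: only rolled buckets are copied.
        if not (name.startswith(warm_prefix) and os.path.isdir(source)):
            continue
        target = os.path.join(backup_dir, name)
        if os.path.exists(target):
            continue  # already backed up on an earlier run
        shutil.copytree(source, target)
        copied.append(name)
    return copied
```

Run on a schedule, a script like this copies only the buckets that have rolled to warm since the previous run.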
You can set retirement and archiving policy by controlling the size of indexes or the age of data in the indexes. Splunk indexes go through four stages: hot, warm, cold, and frozen.
Caution: All index locations must be writable.
The size, location and age of these files are controlled by indexes.conf. Before modifying any configuration file, read about configuration files.
If you experience a non-catastrophic disk failure (for example, you still have some of your data, but Splunk won't run), Splunk recommends that you move the index directory aside and restore from a backup rather than restoring on top of a partially corrupted datastore. Splunk will automatically create hot directories on startup as necessary and resume indexing. Monitored files and directories will pick up where they were at the time of the backup.
You can set retirement and archiving policy by controlling the size of indexes or the age of data in the indexes.
Find this entry in indexes.conf and set it to its new value (in megabytes):
maxTotalDataSizeMB = <non-negative number> (500000)
* The maximum size of an index. If an index grows bigger than this, the oldest data is frozen out.
Example:
[main]
maxTotalDataSizeMB = 2500000
You must restart the server for the new setting to take effect. It may take up to 40 minutes for Splunk to move events out of the index to conform to the new policy, during which you may see high CPU usage.
Note: Ensure your values are in the correct units. For a quick calculator, you can do basic unit conversions with Google:
Search Google for "50000 megabytes in gigabytes"
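The same unit check can be done with a few lines of Python. This sketch assumes decimal units (1 gigabyte = 1000 megabytes) by default; pass mb_per_gb=1024 if you count in binary units. The function names are illustrative only.

```python
def gigabytes_to_megabytes(gigabytes, mb_per_gb=1000):
    """Convert gigabytes to the megabyte value that indexes.conf expects."""
    return int(gigabytes * mb_per_gb)


def megabytes_to_gigabytes(megabytes, mb_per_gb=1000):
    """Convert a megabyte setting back to gigabytes as a sanity check."""
    return megabytes / mb_per_gb


print(megabytes_to_gigabytes(50000))   # 50.0
print(gigabytes_to_megabytes(2500))    # 2500000
```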
Splunk ages out data by buckets. Specifically, when the most recent data in a particular bucket reaches the configured age, the entire bucket is rolled. If you are indexing a large volume of events, bucket size is less of a concern for retirement policy because the buckets fill quickly. You can make buckets roll faster by setting maxDataSize in indexes.conf to a smaller value. However, many small buckets take more time to search than fewer, larger buckets. You will have to experiment to find the right size for your data. Due to the structure of the index, there isn't a direct relationship between time and data size.
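The bucket-level aging rule can be illustrated with a short Python sketch. It only models the comparison described above and is not Splunk's implementation; the function name and its arguments are hypothetical.

```python
def bucket_has_aged_out(newest_event_time, now, frozen_time_period_in_secs):
    """Return True once the newest event in a bucket is older than the limit.

    All times are expressed in seconds since the epoch.
    """
    return now - newest_event_time > frozen_time_period_in_secs


DAY = 86400
limit = 180 * DAY
now = 1000 * DAY

# Newest event is 200 days old: the whole bucket is past the limit.
print(bucket_has_aged_out(now - 200 * DAY, now, limit))  # True
# The bucket may hold much older events, but its newest is only 10 days old.
print(bucket_has_aged_out(now - 10 * DAY, now, limit))   # False
```

The second case shows why some events can remain in the index longer than the configured age: they stay until the newest event in their bucket also ages out.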
Set the variable frozenTimePeriodInSecs in indexes.conf to the number of seconds after which indexed data should be erased. The example below configures Splunk to remove events from its index when they are more than 180 days old. The default value is approximately 6 years.
[main]
frozenTimePeriodInSecs = 15552000
You will need to restart the server for the new setting to take effect.
Note: Ensure your values are in the correct units. For a quick calculator, you can do basic unit conversions with Google:
Search Google for "15552000 seconds in days"
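Alternatively, the value can be computed directly. The following Python sketch shows the arithmetic behind the 180-day example; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86400


def days_to_seconds(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


def seconds_to_days(seconds):
    """Convert a setting in seconds back to days as a sanity check."""
    return seconds / SECONDS_PER_DAY


print(days_to_seconds(180))        # 15552000
print(seconds_to_days(15552000))   # 180.0
```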
To roll the buckets of a specified index from hot to warm, use the following command, replacing <index_name> with the name of the index you want to roll:
From the CLI: ./splunk search "| debug cmd=roll index=<index_name>"
From the search bar: