In this post, I’ll cover one strategy to back up your index. Before we go any further:
- Do not do any of this on your production system without testing
- This applies to version 4.2.x only
- You should have a very good understanding of Splunk administration, indexes, and buckets (http://docs.splunk.com/Documentation/Splunk/4.2.4/admin/HowSplunkstoresindexes)
- Read this: http://docs.splunk.com/Documentation/Splunk/4.2.4/Admin/Backupindexeddata
Let’s assume we have a standalone Splunk deployment that indexes 10 GB/day. Our goal is to have a daily backup that extends all the way back to Splunk’s first received event. The strategy is to copy chunks of the index (buckets) to backup storage at set intervals. We accept that we may lose data from the last day, but want to be able to recover completely up to the day before. Here is the general process:
- Roll all indexed data from hot to warm
- Copy the newest bucket(s) to your long term storage
- Repeat steps one and two at a set interval (daily)
While this sounds somewhat simple, the details are the most important part.
1 – Roll your hot buckets to a warm state
By default, Splunk rolls a hot bucket to a warm state once it is full. The default maximum hot bucket size for the main index is ~10 GB on 64-bit installations. This means that when the bucket grows to ~10 GB (total bucket size on disk, not the volume of data indexed), it is automatically rolled to a warm state. If the overall index grows by about 10 GB/day, we can expect the buckets to roll almost daily. Keep in mind that the daily indexed volume (what you are charged for) differs from the actual bucket size because the index/bucket files are compressed. For this exercise we want a roll every day, so we will force it with the roll command.
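As a rough illustration of why indexed volume and bucket size diverge, the sketch below estimates how many days a hot bucket takes to fill. The helper name `days_until_roll` and the on-disk ratio of 0.5 are assumptions made up for this example, not Splunk figures; measure your own buckets to get a real ratio.

```shell
# Estimate how many days it takes a hot bucket to reach its maximum size.
# Arguments: max bucket size (GB), daily indexed volume (GB),
# on-disk ratio (bucket growth on disk / indexed volume; hypothetical value).
days_until_roll() {
  awk -v max_gb="$1" -v daily_gb="$2" -v ratio="$3" \
    'BEGIN { printf "%.1f\n", max_gb / (daily_gb * ratio) }'
}

# 10 GB max bucket, 10 GB/day indexed, assumed on-disk ratio of 0.5
days_until_roll 10 10 0.5   # prints 2.0
```

With a ratio below 1, the automatic roll happens less often than once a day, which is one reason to force the roll if you want a daily backup.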
Execute the Hot to Warm roll command:
./splunk _internal call /data/indexes/<index_name>/roll-hot-buckets -auth <admin_username>:<admin_password>
To roll the hot buckets in the main index, using the admin user with a password of changeme:
./splunk _internal call /data/indexes/main/roll-hot-buckets -auth admin:changeme
Some important notes:
- You will roll all hot buckets with the above command, meaning you may have to copy and move more than one bucket
- For more detail: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Backupindexeddata#Rolling_buckets_manually_from_hot_to_warm
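For scripting, the roll command can be wrapped in a small shell function. This is a minimal sketch: the function name `roll_hot_buckets` and the `SPLUNK_BIN` override are my own inventions, and the `/opt/splunk` default is an assumption based on the paths used in this post.

```shell
# Roll all hot buckets of one index to warm.
# Usage: roll_hot_buckets <index_name> <admin_username>:<admin_password>
# SPLUNK_BIN (hypothetical override) lets you point at a different binary.
roll_hot_buckets() {
  local index="$1" auth="$2"
  local splunk_bin="${SPLUNK_BIN:-${SPLUNK_HOME:-/opt/splunk}/bin/splunk}"

  if [ -z "$index" ] || [ -z "$auth" ]; then
    echo "usage: roll_hot_buckets <index_name> <user>:<password>" >&2
    return 2
  fi

  # Note: -auth uses a plain ASCII hyphen.
  "$splunk_bin" _internal call "/data/indexes/${index}/roll-hot-buckets" -auth "$auth"
}
```

Calling `roll_hot_buckets main admin:changeme` runs the same command as the example above. Passing the password on the command line exposes it in the process list, so you may prefer to read it from a protected file.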
2 – Copy the bucket
Copying a bucket is a single command, but there are a few things to check before and while you copy:
- Check for all new buckets (the highest warm bucket ID is likely the newest, but this is NOT always true)
- Check for a non-zero-length optimize.result file within the bucket
- Copy the bucket to long term storage
- On the first “backup”, you may need to copy ALL warm AND cold buckets to long term storage
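The optimize.result check from the list above can be sketched as a shell function. It assumes warm buckets are directories named `db_*` (as in the example bucket name used in this post) and the function name `list_ready_warm_buckets` is hypothetical.

```shell
# Print every warm bucket directory under <db_dir> that contains a
# non-zero-length optimize.result file.
# Usage: list_ready_warm_buckets <db_dir>
list_ready_warm_buckets() {
  local db_dir="$1" bucket
  for bucket in "$db_dir"/db_*; do
    # Skip anything that is not a directory (including an unmatched glob).
    [ -d "$bucket" ] || continue
    # -s is true only if the file exists and its size is greater than zero.
    [ -s "$bucket/optimize.result" ] || continue
    printf '%s\n' "$bucket"
  done
}
```

For the main index on a default install this might be called as `list_ready_warm_buckets /opt/splunk/var/lib/splunk/defaultdb/db`. Because it lists every matching bucket rather than only the highest ID, it does not rely on the newest bucket having the highest ID.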
You can run a command similar to the one below, substituting your new bucket:
cp -R <new_warm_bucket> <backup_location>
To copy my main/default bucket id=0 to a location on /mnt (your bucket will have a different name and id):
cp -R /opt/splunk/var/lib/splunk/defaultdb/db/db_1314376440_1314307800_0 /mnt/backup/defaultdb/db/
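In a script, it helps to make the copy safe to re-run. The sketch below skips buckets that already exist in the backup location and copies to a temporary `.partial` name first, so an interrupted copy is never mistaken for a complete bucket. The function name and the `.partial` suffix are my own choices, not anything Splunk defines.

```shell
# Copy one bucket directory into <backup_dir> unless it is already there.
# Usage: copy_bucket_if_missing <bucket_dir> <backup_dir>
copy_bucket_if_missing() {
  local bucket="$1" backup_dir="$2"
  local name dest
  name="$(basename "$bucket")"
  dest="$backup_dir/$name"

  if [ -e "$dest" ]; then
    echo "skip $name (already backed up)"
    return 0
  fi

  mkdir -p "$backup_dir" || return 1
  # Remove leftovers from an earlier interrupted run.
  rm -rf "$dest.partial"
  # Copy under a temporary name, then rename once the copy has finished.
  cp -R "$bucket" "$dest.partial" || return 1
  mv "$dest.partial" "$dest" || return 1
  echo "copied $name"
}
```

For example, `copy_bucket_if_missing /opt/splunk/var/lib/splunk/defaultdb/db/db_1314376440_1314307800_0 /mnt/backup/defaultdb/db` does the same job as the `cp -R` above, but does nothing on a second run.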
3 – Repeat
Now that we know how to force a bucket to roll and what to check before copying it, we can create a script that performs all of these steps. I will leave it to you to choose your favorite language for the script.
Things to consider:
- If you run the roll command against all hot buckets, this could create many new buckets per day. You should account for this, or back up based on Splunk’s automated rolling (see next bullet point)
- Since Splunk will automatically roll buckets to warm at its own pace, you can simply check for any new warm buckets and back up only those. This may not keep you backed up to the day, but will keep you backed up to the latest warm bucket.
- If Splunk has been running for a long time, you will likely need to back up all existing cold and warm buckets before starting. This strategy is intended to cover all newly indexed data.
- In your backup script, you might want to set $SPLUNK_HOME and $SPLUNK_DB
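A backup script might start along these lines: set defaults for the two variables, then list the warm buckets that are not yet in the backup location. This is only a sketch. The default paths assume an install under /opt/splunk as in the earlier example, and `new_warm_buckets` is a hypothetical helper name.

```shell
# Use the environment's values if set, otherwise fall back to assumed defaults.
: "${SPLUNK_HOME:=/opt/splunk}"
: "${SPLUNK_DB:=$SPLUNK_HOME/var/lib/splunk}"

# Print the names of warm buckets (db_* directories) that exist in <db_dir>
# but not in <backup_dir>.
# Usage: new_warm_buckets <db_dir> <backup_dir>
new_warm_buckets() {
  local db_dir="$1" backup_dir="$2" bucket name
  for bucket in "$db_dir"/db_*; do
    [ -d "$bucket" ] || continue
    name="${bucket##*/}"          # strip the leading path
    [ -e "$backup_dir/$name" ] || printf '%s\n' "$name"
  done
}
```

For the main index this could be run as `new_warm_buckets "$SPLUNK_DB/defaultdb/db" /mnt/backup/defaultdb/db`. Comparing names against the backup location, rather than tracking the last bucket ID, works whether the buckets were rolled by the forced command or by Splunk on its own schedule.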
So once you have your index backed up, you may want to restore things. That is probably best saved for another post, but here are some general tips for restoring:
- Do NOT copy buckets back into a working Splunk index – consult with support or load into a new index
- Test/load the backup files. Simply copy/load the index as a new and differently named index. This will prevent bucket ID conflicts.
- See the following post for moving an index: http://wiki.splunk.com/Community:MoveIndexes
UPDATE – Here is the follow up post on restore: http://blogs.splunk.com/2012/02/21/restoring-an-index/