In a recent post, I covered some details around a backup strategy. I left a bit of a teaser at the end, stating I would follow up with a post on index restoration. Well, here it is…
There are a few scenarios you may encounter when trying to restore or recover an index. The simplest scenarios, such as moving an index, are covered very well in the moving indexes wiki topic as well as on our answers site. At a high level, you can move indexes across Splunk installations but must consider the following:
Again, more detail is available in the wiki topic and you should consult that document first. The more complex scenarios come into play when you have been backing up data continuously and you experience a hardware failure of some sort. Examples include:
In both of these examples, you will either want to:
In these scenarios, you should consult with support to make sure that all bases are covered. For this topic, let us consider the most difficult scenario: a primary storage failure. Our intent is to recover and restore things to our last known state. To make things more difficult, we have long-term storage that holds the older (cold bucket) data. The high-level recovery process should be as follows:
Additional assumptions:
- our system uses backups and volume-specific settings
- primary storage is mounted as /splunk/db/
- long-term storage (coldPath) is mounted as /splunk/cold/
- backup storage is mounted as /splunk/backup/
Step 1 – This is beyond the scope of this post, but it is important to note that this step must be completed.
Step 2 – Find the buckets that need to be populated on the primary storage. To do this, get a complete listing of the bucket IDs in your long-term storage. Next, find the bucket IDs that have been backed up but do NOT exist in the long-term storage.
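The comparison in Step 2 is a set difference: bucket IDs present in the backup minus bucket IDs present in long-term storage. The sketch below is a minimal illustration, not a Splunk tool. It assumes a standalone (non-clustered) indexer whose bucket directories follow the db_<newestTime>_<oldestTime>_<bucketId> naming pattern, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming for warm/cold buckets on a standalone indexer:
# db_<newestTime>_<oldestTime>_<bucketId>. Anything else (hot buckets,
# in-flight directories, clustered rb_/guid-suffixed names) is ignored.
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def bucket_ids(path):
    """Return {bucket_id: dir_name} for bucket directories directly under path."""
    ids = {}
    for entry in Path(path).iterdir():
        match = BUCKET_RE.match(entry.name)
        if entry.is_dir() and match:
            ids[int(match.group(3))] = entry.name
    return ids


def buckets_to_restore(backup_dir, cold_dir):
    """Sorted IDs of backed-up buckets that are missing from long-term storage."""
    backed_up = bucket_ids(backup_dir)
    in_cold = bucket_ids(cold_dir)
    return sorted(backed_up.keys() - in_cold.keys())
```

For example, buckets_to_restore("/splunk/backup/defaultdb/db", "/splunk/cold/defaultdb/colddb") would list the IDs to restore for the default index, assuming (hypothetically) that each mount mirrors the per-index directory layout.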
Step 3 – What is an in-flight bucket? While Splunk is transitioning a bucket from warm to cold, the bucket is considered to be “in-flight”. A bucket may not have completely transitioned, specifically if the storage crashed during the move. You can find these buckets by looking for “in-flight” within the directory name. Remove the in-flight bucket, and add its bucket ID to the list of buckets that must be copied from the backup location to the primary storage.
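The in-flight check in Step 3 can be sketched as a directory scan. This is an illustration under assumptions: it treats any directory whose name contains "inflight" or "in-flight" (case-insensitive) followed by a db_<newest>_<oldest>_<id> bucket name as in-flight, and the function names are hypothetical. Deletion defaults to a dry run, because removing the wrong directory is not recoverable.

```python
import re
import shutil
from pathlib import Path

# Assumption: an in-flight directory name contains "inflight" or "in-flight"
# followed somewhere by the usual db_<newestTime>_<oldestTime>_<bucketId> name.
INFLIGHT_RE = re.compile(r"in-?flight.*?db_(\d+)_(\d+)_(\d+)", re.IGNORECASE)


def find_inflight(cold_dir):
    """Return {bucket_id: Path} for in-flight bucket directories under cold_dir."""
    found = {}
    for entry in Path(cold_dir).iterdir():
        match = INFLIGHT_RE.search(entry.name)
        if entry.is_dir() and match:
            found[int(match.group(3))] = entry
    return found


def remove_inflight(cold_dir, restore_ids, dry_run=True):
    """Add each in-flight bucket ID to restore_ids and delete the partial copy.

    With dry_run=True (the default) nothing is deleted; the paths are printed.
    """
    for bucket_id, path in find_inflight(cold_dir).items():
        restore_ids.add(bucket_id)
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)
    return restore_ids
```

Passing in the set of IDs from Step 2 keeps a single list of buckets to copy back from the backup location.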
NOTE: If you get to this point, it is best to consult with support as you may have edge conditions that warrant modifications to these instructions.
Step 4 – From your backup location, move/copy the complete buckets to the appropriate location on the primary storage.
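Step 4 can be scripted as a filtered copy. The sketch below copies (rather than moves) each selected bucket directory so the backup stays intact, and skips any bucket that already exists on primary storage. It assumes Splunk is stopped while the copy runs and reuses the same assumed db_<newest>_<oldest>_<id> naming; restore_buckets is a hypothetical helper.

```python
import re
import shutil
from pathlib import Path

# Assumed warm/cold bucket naming on a standalone indexer.
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def restore_buckets(backup_dir, primary_dir, bucket_ids):
    """Copy backed-up buckets whose ID is in bucket_ids into primary_dir.

    Returns the names of the bucket directories that were copied.
    """
    primary = Path(primary_dir)
    primary.mkdir(parents=True, exist_ok=True)
    restored = []
    for entry in sorted(Path(backup_dir).iterdir()):
        match = BUCKET_RE.match(entry.name)
        if not (entry.is_dir() and match and int(match.group(3)) in bucket_ids):
            continue
        target = primary / entry.name
        if target.exists():
            # Never overwrite a bucket that is already on primary storage.
            continue
        # copytree copies the whole bucket (rawdata, tsidx files, metadata).
        shutil.copytree(entry, target)
        restored.append(entry.name)
    return restored
```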
Step 5 – Once you have put everything back into place, Splunk has the intelligence to automatically recreate things on startup. However, you may find the need to force the issue. You can do this by leveraging the recover-metadata command as well as by creating an empty meta.dirty file in the appropriate index location (e.g., for my default index: $SPLUNK_DB/defaultdb/db/meta.dirty).
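Creating the meta.dirty marker from Step 5 is a one-line file operation; the sketch below wraps it so that it fails loudly if the index directory does not exist, rather than silently creating a marker in the wrong place. It follows the $SPLUNK_DB/defaultdb/db example from the text, the helper names are hypothetical, and it does not invoke recover-metadata. Run it before starting Splunk.

```python
import os
from pathlib import Path


def default_index_home():
    """Hot/warm path of the default index, per the $SPLUNK_DB/defaultdb/db example."""
    splunk_db = os.environ.get("SPLUNK_DB")
    if not splunk_db:
        raise RuntimeError("SPLUNK_DB is not set")
    return Path(splunk_db) / "defaultdb" / "db"


def mark_meta_dirty(index_home):
    """Create an empty meta.dirty file in index_home and return its path.

    Raises FileNotFoundError if index_home does not exist, which usually
    means the path is wrong.
    """
    marker = Path(index_home) / "meta.dirty"
    marker.touch(exist_ok=True)
    return marker
```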