In a recent post, I covered some details around a backup strategy. I left a bit of a teaser at the end, stating I would follow up with a post on index restoration. Well, here it is…
There are a few scenarios you may encounter when trying to restore or recover an index. The simplest scenarios, such as moving an index, are covered well in the moving indexes wiki topic and on our Answers site. At a high level, you can move indexes between Splunk installations, but you must consider the following:
- The Splunk instance receiving the index has never been configured with an index of the same name; this prevents bucket ID collisions
- All buckets within the index are in a steady state, meaning you have properly rolled them out of the hot state
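Before moving an index, it helps to confirm that no buckets are still hot. The sketch below assumes that hot bucket directories carry a `hot_` prefix (for example `hot_v1_12`) while rolled warm buckets start with `db_`; verify this naming on your own installation. `list_subdirs` and `hot_buckets` are hypothetical helpers, not part of Splunk.

```python
import os


def list_subdirs(path):
    """Return the sorted names of the immediate subdirectories of path."""
    return sorted(e.name for e in os.scandir(path) if e.is_dir())


def hot_buckets(names):
    """Return the directory names that look like hot buckets.

    Assumption: hot buckets use a "hot_" prefix (e.g. "hot_v1_12"),
    while warm buckets that have been rolled use a "db_" prefix.
    """
    return [n for n in names if n.startswith("hot_")]
```

For example, `hot_buckets(list_subdirs("/splunk/db/defaultdb/db"))` (a hypothetical path) should return an empty list before you move that index; anything it returns still needs to be rolled.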
Again, more detail is available in the wiki topic, and you should consult that document first. The more complex scenarios come into play when you have been backing up data continuously and then experience a hardware failure of some sort. Examples include:
- The hardware has completely failed
- The primary storage location has failed
In both of these examples, you will want to either:
- Completely move and load the index files to a separate instance (detailed in the links above)
- Move and/or restore pieces of the index
In these scenarios, you should consult with support to make sure that all bases are covered. For this topic, let us consider the most difficult scenario: a primary storage failure. Our intent is to recover and restore everything to its last known state. To make things more difficult, we have long-term storage that holds the older (cold bucket) data. The high-level recovery process is as follows:
- Fix or replace the primary storage.
- Determine which buckets need to be repopulated on the primary storage.
- Check for any in-flight buckets (what if Splunk was moving a bucket from warm to cold when the storage failed?).
- Move or copy the missing buckets onto the new or repaired storage.
- Rebuild the metadata and manifests.
Additional assumptions:
- Our system uses backups and volume-specific settings
- Primary storage is mounted as /splunk/db/
- Long-term storage (coldPath) is mounted as /splunk/cold/
- Backup storage is mounted as /splunk/backup/
Step 1 – Repairing or replacing the primary storage is beyond the scope of this post, but it is important to note that it must be completed before you continue.
Step 2 – Find the buckets that need to be repopulated on the primary storage. To do this, get a complete listing of the bucket IDs in your long-term storage. Next, find the bucket IDs that have been “backed up” but do NOT exist in the long-term storage.
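The comparison in Step 2 is a set difference on bucket IDs. The sketch below assumes warm and cold bucket directories are named `db_<newestTime>_<oldestTime>_<localId>`, optionally followed by `_<guid>` on clustered indexers, and treats everything after the two timestamps as the bucket ID. The helper names and any directory paths you pass in are hypothetical; adjust the parsing if your bucket names differ.

```python
import os


def bucket_id(name):
    """Extract the bucket ID from a bucket directory name, or None if it is not a bucket.

    Assumption: names look like db_<newestTime>_<oldestTime>_<localId>[_<guid>].
    Everything after the two timestamps is treated as the ID.
    """
    parts = name.split("_")
    if len(parts) < 4 or parts[0] != "db":
        return None
    return "_".join(parts[3:])


def list_subdirs(path):
    """Return the sorted names of the immediate subdirectories of path."""
    return sorted(e.name for e in os.scandir(path) if e.is_dir())


def buckets_to_restore(backup_names, cold_names):
    """Return backed-up bucket directory names whose IDs are absent from cold storage."""
    cold_ids = {bucket_id(n) for n in cold_names} - {None}
    return sorted(
        n for n in backup_names
        if bucket_id(n) is not None and bucket_id(n) not in cold_ids
    )
```

Usage might look like `buckets_to_restore(list_subdirs("/splunk/backup/defaultdb/db"), list_subdirs("/splunk/cold/defaultdb/colddb"))`, where the subdirectory layout under each mount point is an assumption. Comparing on the parsed ID rather than the full name keeps the check working even if a directory was renamed around the ID.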
Step 3 – What is an in-flight bucket? While Splunk is transitioning a bucket from warm to cold, that bucket is considered to be “in-flight”. If the storage crashed during the move, a bucket may not have completely transitioned. You can find these buckets by looking for “in-flight” in the directory name. Remove each in-flight bucket, and add its bucket ID to the list of buckets that must be copied from the backup location to the primary storage.
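Here is a minimal sketch of the in-flight check. The text above says to look for “in-flight” in the directory name; as an assumption, the sketch also accepts the unhyphenated spelling “inflight”, and it assumes the original bucket name (starting with `db_`) is embedded in the in-flight directory name. All helper names are hypothetical. Confirm the actual naming on your system before removing anything.

```python
import os


def is_in_flight(name):
    """Return True if a directory name marks an in-flight bucket.

    Assumption: the marker is "in-flight" (per the post) or "inflight",
    matched case-insensitively.
    """
    lowered = name.lower()
    return "in-flight" in lowered or "inflight" in lowered


def embedded_bucket_name(name):
    """Return the db_... bucket name embedded in a directory name, or None.

    Assumption: the in-flight directory name contains the original bucket name.
    """
    idx = name.find("db_")
    return name[idx:] if idx != -1 else None


def in_flight_buckets(cold_path):
    """List in-flight bucket directories under cold_path. Read-only: deletes nothing."""
    return sorted(e.name for e in os.scandir(cold_path) if e.is_dir() and is_in_flight(e.name))
```

The sketch deliberately stops at listing; removing the in-flight directories is left as a manual step so you can review each one (and check with support) first.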
NOTE: If you get to this point, it is best to consult with support as you may have edge conditions that warrant modifications to these instructions.
Step 4 – From your backup location, move or copy the complete buckets identified in Steps 2 and 3 to the appropriate index location on the primary storage.
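A minimal sketch of the copy step, assuming you already have the list of bucket directory names to restore and that Splunk is stopped while you copy (a cautious assumption, not something stated above). `restore_buckets` is a hypothetical helper; it copies whole directories with `shutil.copytree` and skips any bucket that already exists at the destination rather than overwriting it.

```python
import os
import shutil


def restore_buckets(names, backup_path, primary_path):
    """Copy whole bucket directories from backup_path into primary_path.

    Buckets already present at the destination are skipped, never overwritten.
    Returns the names of the buckets that were actually copied.
    """
    copied = []
    for name in names:
        src = os.path.join(backup_path, name)
        dst = os.path.join(primary_path, name)
        if os.path.exists(dst):
            continue  # leave an existing bucket untouched
        shutil.copytree(src, dst)  # copies the complete bucket directory tree
        copied.append(name)
    return copied
```

Copying rather than moving leaves the backup intact, so a failed or interrupted restore can simply be rerun.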
Step 5 – Once you have put everything back into place, Splunk should automatically recreate the metadata and manifests on startup. However, if you need to force the issue, you can use the recover-metadata command and create an empty meta.dirty file in the appropriate index location (e.g., for the default index: $SPLUNK_DB/defaultdb/db/meta.dirty).
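Creating the meta.dirty marker is just creating an empty file. A minimal sketch follows, assuming the `SPLUNK_DB` environment variable is set and using the default index location given above; both helper names are hypothetical. It covers only the meta.dirty file, not the recover-metadata command.

```python
import os
from pathlib import Path


def default_index_db_path():
    """Return $SPLUNK_DB/defaultdb/db, the default index location used in this post."""
    splunk_db = os.environ.get("SPLUNK_DB")
    if not splunk_db:
        raise RuntimeError("SPLUNK_DB is not set")
    return Path(splunk_db) / "defaultdb" / "db"


def mark_meta_dirty(index_db_path):
    """Create an empty meta.dirty file in an index's db directory and return its path.

    touch() creates the file if it is missing and leaves an existing file's contents unchanged.
    """
    marker = Path(index_db_path) / "meta.dirty"
    marker.touch(exist_ok=True)
    return marker
```

For example, `mark_meta_dirty(default_index_db_path())` creates the marker for the default index before you restart Splunk.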