Splunk backup of indexes


Hi all,

Can you tell me what you are doing to back up Splunk?
How do you back up your hot indexes?
Can you tell me if the following procedure is "the way to go"?

Backup Splunk indexes



Every day, before the backup, force a roll from hot to warm via crontab with:

/opt/splunk/bin/splunk search "| debug cmd=roll index=*" -auth admin:pwd

Then include every $SPLUNK_HOME/var/lib/splunk/db/db_* directory in the backup.

I'm asking because I've read in many posts that forcing a roll of hot indexes is not good for Splunk's performance.

So I would like to know how other Splunk users back up their Splunk indexes.

Thanks
Kind regards
Pierre.

It is true that more frequent rolls of Splunk indexes will negatively impact Splunk search performance. The information here is correct: http://www.splunk.com/base/Documentation/latest/Admin/Backupindexeddata#Choose_your_backup_strategy

If you happen to be using ZFS, or are storing your indexes on media that provides a snapshot capability, you can back up the entire db directory (including the hot* folders and the rest of the files) from the snapshot, then drop the snapshot when you're done.

You can also stop Splunk, back up all the hot* folders and everything in db/ *except* the db_* folders, start Splunk up again, then continue backing up the db_* folders. The disadvantage is that this causes an outage.

You could also simply back up just the db_* folders without forcing the roll command first. This will mean that your Splunk index data will not be backed up until it rolls over "naturally", i.e., based on the index configuration. The disadvantage is that some of your data will not be backed up for a longer period of time. How much data would remain at risk depends on (a) how much you are indexing, and (b) your index size configurations.

If you use this last method (i.e., only back up db_* without rolling first), it is possible to configure index settings to keep the amount of data at risk below 24 hours' worth (which is what you'd have using daily backups anyway). This does impact search performance as well, but not as much as using the "roll" command would.
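
A minimal sketch of the "back up only db_*" approach, in Python. The function name archive_warm_buckets and the paths in the usage note are hypothetical, not anything shipped with Splunk. It simply archives every directory whose name starts with db_ and leaves the hot* folders and all other files alone.

```python
import tarfile
from pathlib import Path


def archive_warm_buckets(db_dir, archive_path):
    """Tar every db_* directory under db_dir; return the bucket names archived.

    hot* folders and any other files in db_dir are deliberately skipped,
    matching the "only back up db_*" method described above.
    """
    db_dir = Path(db_dir)
    buckets = sorted(
        p for p in db_dir.iterdir()
        if p.is_dir() and p.name.startswith("db_")
    )
    with tarfile.open(archive_path, "w:gz") as tar:
        for bucket in buckets:
            # arcname keeps archive paths relative to db_dir
            tar.add(bucket, arcname=bucket.name)
    return [b.name for b in buckets]
```

For example, archive_warm_buckets("/path/to/index/db", "/backup/index-warm.tar.gz") would be run from the daily backup job, with both paths adjusted to your own layout. Anything still in a hot* folder at that moment is not covered until it rolls.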

Thanks a lot for your answer. But can you tell me which settings I have to use to get a natural roll every 24 hours?

I've checked the docs and they are not very clear to me. I've found:

frozenTimePeriodInSecs = 15552000 for erasing indexes;
maxTotalDataSizeMB = 2500000 max size of an index - go to frozen

Which setting can I use to trigger a natural roll from hot to warm?

Thanks.
Best regards,
Pierre.

Splunk naturally rolls hot to warm when it's ready. It's generally a bad idea to mess with the defaults, because if you don't understand how they work (and how they work together, and how the time distribution and input rate of your data work) you can very easily cause big problems and make your searches run *extremely* slowly. It's complicated, which is why there isn't a part of the docs that just says "set this".

If you are sure that your data is coming in approximately in real time, you can set this:

maxHotSpanSecs = 86400
maxHotBuckets = 20

for your index in question. However, I will caution you that if you suddenly index a whole lot of historical data (e.g., add a new server that has a few weeks of old data, even if it's just a little bit of data from that far back), these settings will likely cause the data to be broken up into too many sections and impair search performance when you search over that time period.
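
To make the caution above concrete, here is a rough Python sketch. It assumes only that a single bucket covers at most maxHotSpanSecs of event time; real bucket counts also depend on size limits, maxHotBuckets, and the order in which data arrives. The function name is hypothetical and the 90-day span is just a comparison value.

```python
import math


def min_buckets_for_span(data_span_secs, max_hot_span_secs=86400):
    """Lower bound on the buckets needed for events spanning data_span_secs.

    Assumes each bucket covers at most max_hot_span_secs of event time.
    """
    return math.ceil(data_span_secs / max_hot_span_secs)


three_weeks = 21 * 24 * 3600
print(min_buckets_for_span(three_weeks))              # 21 with a 24-hour span
print(min_buckets_for_span(three_weeks, 90 * 86400))  # 1 with a 90-day span
```

So a server with three weeks of old data needs at least 21 buckets under the 24-hour setting, however small the data is.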

Thanks for your answer. So, we can conclude that it's better to let Splunk roll naturally from hot to warm.

But could you please still answer these questions? I have configured two separate disks for hot/warm and cold:

homePath = $SPLUNK_DB/defaultdb/db
coldPath = /data2/splunk/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb

How can I control the roll from warm to cold? Do I have to use maxWarmDBCount?

"Buckets roll from warm to cold when the number of warm buckets exceeds the configured maximum count (maxWarmDBCount)"

If yes, how can I work out this number so that the hot/warm disk does not become full?

The same question applies to cold: how can I force Splunk to roll cold to frozen (deleted)?

Sorry for these questions and thanks a lot for your help.

Kind regards,
Pierre.

Yes, warm to cold is set by maxWarmDBCount. The amount of space per "DB" is set by maxDataSize. It's probably not in the docs, but "auto" is 100MB, and auto_high_volume is 750MB on 32-bit Splunk and 10000MB on 64-bit Splunk. Yes, it would be better if it were done by space, but do the multiplication and it should leave you with a bit of headroom. The max space that will be used on the volume will therefore be (maxHotBuckets + maxWarmDBCount) * maxDataSize.

The total size of hot+warm+cold is controlled by maxTotalDataSizeMB. Once you get to that point, cold buckets are frozen out.
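
The multiplication can be sketched in Python as follows. The function names, the 500 GB disk, and the 10% headroom are hypothetical. The 750MB bucket size and maxHotBuckets = 20 are taken from this thread purely as example inputs.

```python
def max_home_volume_mb(max_hot_buckets, max_warm_db_count, max_data_size_mb):
    """Upper bound on hot+warm usage: (maxHotBuckets + maxWarmDBCount) * maxDataSize."""
    return (max_hot_buckets + max_warm_db_count) * max_data_size_mb


def largest_warm_count(disk_mb, max_hot_buckets, max_data_size_mb, headroom_pct=10):
    """Largest maxWarmDBCount that keeps usage within disk_mb minus the headroom."""
    usable_mb = disk_mb * (100 - headroom_pct) // 100
    total_buckets = usable_mb // max_data_size_mb
    # Hot buckets live on the same disk, so they come out of the budget first
    return max(total_buckets - max_hot_buckets, 0)


# Hypothetical 500 GB hot/warm disk, 20 hot buckets, 750 MB per bucket
print(largest_warm_count(500_000, 20, 750))  # 580
print(max_home_volume_mb(20, 580, 750))      # 450000
```

Working backwards from the disk size like this gives a maxWarmDBCount to try. Check the bucket size your own version and platform actually use before relying on the result.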

Hi,

It's now clear. Thanks a lot for your help.

Kind regards.
Pierre.