This documentation does not apply to the most recent version of Splunk.
This documentation applies to the following versions of Splunk: 3.2, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6, 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13
Yes. Splunk stores a compressed copy of the log data along with its index. Once Splunk has indexed a piece of data, it does not matter if you rotate out your log files or destroy the original data in any other way.
Splunk stores its data using its own highly efficient search index, a technology closer to that of most search engines than to SQL relational databases. A relational database, which can only index a few columns, cannot deliver Splunk's instantaneous search results on anything in the original data. The search index approach is also far more flexible: it works with any kind of data without adapters or parsers.
Splunk has a concept of hot, warm, cold and frozen 'slices' or 'buckets' of data. A slice is considered hot if Splunk is actively writing to and reading from it; the hot slice is the $SPLUNK_HOME/var/lib/splunk/defaultdb/db/hot-db/ directory. As the hot slice approaches a configurable size limit, it is rolled to a warm slice. Warm slices can be written to but usually aren't. They are located in $SPLUNK_HOME/var/lib/splunk/defaultdb/db/ and have directory names of the form db_timestamp1_timestamp2_sequence_number, where timestamp1 is the timestamp of the latest event in the slice, timestamp2 is the timestamp of the earliest event in the slice, and the sequence number reflects the order in which the slices were generated. Depending on how many warm slices you have (again configurable), data is then moved into the colddb ($SPLUNK_HOME/var/lib/splunk/defaultdb/colddb). No new events are indexed in the colddb; its data is only searchable. From there, depending on your configuration, data is moved out of the index completely, based on either age or total index size. You have the option of saving the data in a frozen state (neither searchable nor writable) before it is removed from the index. If frozen data ever needs to be searched again, you can drop its db_*_*_* directories into the $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb directory.
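The warm-slice naming scheme described above can be unpacked with a few lines of Python. This is only a sketch: it assumes the two timestamps are Unix epoch seconds (the text does not say so explicitly), and the `Bucket` type, the `parse_bucket_name` function and the sample directory name are hypothetical names made up for illustration.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class Bucket(NamedTuple):
    latest: datetime    # timestamp1: latest event in the slice
    earliest: datetime  # timestamp2: earliest event in the slice
    sequence: int       # order in which the slice was generated


# db_timestamp1_timestamp2_sequence_number
_BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_bucket_name(name: str) -> Bucket:
    """Split a warm/cold slice directory name into its three parts.

    Assumption: both timestamps are Unix epoch seconds.
    """
    match = _BUCKET_RE.match(name)
    if match is None:
        raise ValueError(f"not a slice directory name: {name!r}")
    latest, earliest, sequence = (int(g) for g in match.groups())
    return Bucket(
        latest=datetime.fromtimestamp(latest, tz=timezone.utc),
        earliest=datetime.fromtimestamp(earliest, tz=timezone.utc),
        sequence=sequence,
    )


if __name__ == "__main__":
    # Hypothetical directory name, not taken from a real index.
    bucket = parse_bucket_name("db_1200009600_1200000000_7")
    print(bucket.earliest, "to", bucket.latest, "sequence", bucket.sequence)
```

A helper like this can be handy when deciding which frozen db_*_*_* directories cover the time range you want to make searchable again.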
Yes. Splunk compresses the original data within its datastore, then adds its indexes and metadata.
With default processing, Splunk uses about 40% of the uncompressed raw log volume for standard syslog data and up to 100% for many other common log formats. Some data sources and configurations (such as heavy use of meta-events) may cause Splunk to use more, while lowering the density of indexing can reduce utilization to as little as 12%. In general, Splunk offers the highest search performance at the lowest storage cost relative to any other technology for log data retention.
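For rough capacity planning, the ratios above can be turned into a back-of-the-envelope estimate. The sketch below simply multiplies daily raw volume, retention period and a storage ratio; the function name `estimate_index_size_gb` and its parameters are hypothetical, and the ratio you should use depends on your own data and configuration.

```python
def estimate_index_size_gb(raw_gb_per_day: float,
                           retention_days: int,
                           ratio: float = 0.40) -> float:
    """Estimate disk used by the datastore for a given retention period.

    ratio is the fraction of uncompressed raw volume that ends up on disk:
    about 0.40 for standard syslog, up to 1.00 for many other formats,
    and as little as 0.12 with a lower density of indexing.
    """
    if raw_gb_per_day < 0 or retention_days < 0 or ratio <= 0:
        raise ValueError("volumes and retention must be non-negative, ratio positive")
    return raw_gb_per_day * retention_days * ratio


if __name__ == "__main__":
    # Illustrative inputs only: 10 GB/day of raw data kept for 90 days.
    for label, ratio in [("syslog", 0.40), ("other formats", 1.00), ("low density", 0.12)]:
        size = estimate_index_size_gb(10, 90, ratio)
        print(f"{label}: about {size:.0f} GB")
```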
As much as you want. You control how much data Splunk stores online by setting its data retirement policy. Splunk's search performance when looking across a day of data is the same whether the data store contains a day or years of data.
Yes, Splunk has settings to retire the oldest data based on age and disk usage. It also has a setting for the minimum disk space to keep free. Read the Admin Manual for more information.
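To make the age-and-size retirement rule concrete, here is a minimal simulation of it. It is not Splunk's implementation and does not use Splunk's real setting names: `BucketInfo`, `buckets_to_retire`, `max_age_secs` and `max_total_mb` are all hypothetical, and the minimum-free-disk-space setting is left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BucketInfo:
    name: str
    latest_event: int  # epoch seconds of the newest event in the bucket
    size_mb: int


def buckets_to_retire(buckets: Iterable[BucketInfo],
                      now: int,
                      max_age_secs: int,
                      max_total_mb: int) -> List[BucketInfo]:
    """Pick the oldest buckets to retire until both limits are satisfied."""
    ordered = sorted(buckets, key=lambda b: b.latest_event)  # oldest first
    total_mb = sum(b.size_mb for b in ordered)
    retired = []
    for bucket in ordered:
        too_old = now - bucket.latest_event > max_age_secs
        too_big = total_mb > max_total_mb
        if not (too_old or too_big):
            break  # every remaining bucket is newer, and the size limit holds
        retired.append(bucket)
        total_mb -= bucket.size_mb
    return retired


if __name__ == "__main__":
    sample = [
        BucketInfo("a", latest_event=100, size_mb=500),
        BucketInfo("b", latest_event=200, size_mb=500),
        BucketInfo("c", latest_event=300, size_mb=500),
    ]
    retired = buckets_to_retire(sample, now=400, max_age_secs=250, max_total_mb=1000)
    print([b.name for b in retired])
```

Retiring strictly oldest-first means a tight size limit can remove data that is still within the age limit, which is why both settings need to be chosen together.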
No. Splunk never stops indexing data because of license violations. It only blocks search if there are repeat violations. If your Splunk server has stopped indexing, there is another explanation. Contact support@splunk.com for help.
Splunk's software architecture is designed to be extremely scalable. It can be deployed in minutes to index a few hundred megabytes a day on a server shared with other applications like monitoring, or it can be deployed across dozens of dedicated indexing servers and thousands of source hosts to index terabytes a day in real time.
That depends on how much segmentation is done on the data. For example, if we segment 1.2.3.4 on the . (period) character, we have to store 1, 1.2, 1.2.3 and so on in the index, which bloats the index considerably. All of this is configurable; however, changing the default is not recommended.
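The growth in the 1.2.3.4 example can be illustrated with a tiny function that lists every prefix produced by splitting on a breaker character. This is only an illustration of why more segmentation means more index entries; it is not Splunk's actual segmenter, and `prefix_segments` is a hypothetical name.

```python
from typing import List


def prefix_segments(token: str, breaker: str = ".") -> List[str]:
    """Return every prefix of token that ends at a breaker boundary."""
    parts = token.split(breaker)
    return [breaker.join(parts[:i]) for i in range(1, len(parts) + 1)]


if __name__ == "__main__":
    print(prefix_segments("1.2.3.4"))
    # prints ['1', '1.2', '1.2.3', '1.2.3.4']
```

A single four-part token yields four index terms here, so data that is rich in such tokens grows the index accordingly.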