Data Management

This documentation does not apply to the most recent version of Splunk.

This documentation applies to the following versions of Splunk: 3.2 , 3.2.1 , 3.2.2 , 3.2.3 , 3.2.4 , 3.2.5 , 3.2.6 , 3.3 , 3.3.1 , 3.3.2 , 3.3.3 , 3.3.4 , 3.4 , 3.4.1 , 3.4.2 , 3.4.3 , 3.4.5 , 3.4.6 , 3.4.8 , 3.4.9 , 3.4.10 , 3.4.11 , 3.4.12 , 3.4.13

Does Splunk store copies of my log data?

Yes. Splunk stores a compressed copy of the log data along with its index. Once Splunk has indexed a piece of data, it no longer matters whether you rotate out your log files or destroy the original data in any other way.


How does Splunk store its data? Does it use a relational database? What database does it use?

Splunk stores its data in its own highly efficient search index, a technology closer to that of most search engines than to SQL relational databases. A relational database, which typically indexes only a few columns, cannot match Splunk's instantaneous search results on any part of the original data. The search index approach is also far more flexible: it works with any kind of data without adapters or parsers.


How is the index structured?

Splunk organizes its index into hot, warm, cold and frozen 'slices' (also called 'buckets') of data.

A slice is hot while Splunk is actively writing to and reading from it. The hot slice is the $SPLUNK_HOME/var/lib/splunk/defaultdb/db/hot-db/ directory. As the hot slice approaches a configurable limit, it is rolled to a warm slice.

Warm slices can be written to but usually aren't. They are located in $SPLUNK_HOME/var/lib/splunk/defaultdb/db/ and have directory names of the form db_timestamp1_timestamp2_sequence_number. Timestamp1 is the timestamp of the latest event in the slice, timestamp2 is the timestamp of the earliest event in the slice, and the sequence number reflects the order in which the slices were generated.

Depending on how many warm slices you have (again configurable), data is then moved into the colddb ($SPLUNK_HOME/var/lib/splunk/defaultdb/colddb). No new events are indexed into the colddb; its slices are only searchable.

From there, depending on your configuration, data is moved out of the index completely, based on its age or on the total index size. You have the option of saving the data in a frozen state (not searchable or writable) before it is removed from the index. If this data ever needs to be searched again, you can drop the db_*_*_* directories into the $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb directory.
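
The warm slice naming scheme described above can be illustrated with a short Python sketch. It is not part of Splunk; the function name and the example directory name are made up, and it assumes both timestamps are Unix epoch seconds, which the text does not state.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BucketName(NamedTuple):
    latest: datetime    # timestamp1: latest event in the slice
    earliest: datetime  # timestamp2: earliest event in the slice
    sequence: int       # order in which the slice was generated


def parse_bucket_dir(name: str) -> BucketName:
    """Split a db_timestamp1_timestamp2_sequence_number name into its parts.

    Assumption: both timestamps are Unix epoch seconds.
    """
    prefix, ts1, ts2, seq = name.split("_")
    if prefix != "db":
        raise ValueError(f"not a slice directory: {name!r}")
    return BucketName(
        latest=datetime.fromtimestamp(int(ts1), tz=timezone.utc),
        earliest=datetime.fromtimestamp(int(ts2), tz=timezone.utc),
        sequence=int(seq),
    )


# Made-up directory name, used only to exercise the parser.
bucket = parse_bucket_dir("db_1230768000_1230681600_7")
```

Note that the latest timestamp comes first in the name, so a slice's time range reads from newest to oldest.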


Does Splunk compress the data it stores?

Yes. Splunk compresses the original data within its datastore, then adds its indexes and metadata.


What are Splunk's storage requirements?

With default processing, Splunk uses about 40% of the uncompressed raw log volume for standard syslog data and up to 100% for many other common log formats. Some data sources and configurations (such as heavy use of meta-events) may cause Splunk to use more, while lowering the density of indexing can reduce utilization to as little as 12%. In general, Splunk offers the highest search performance at the lowest storage cost relative to any other technology for log data retention.
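
As a rough worked example, the ratios quoted above can be turned into a disk estimate. The sketch below is only arithmetic: the workload (10 GB/day kept for 90 days) and all names are hypothetical, and real usage depends on your data and configuration.

```python
# Ratios of on-disk size to uncompressed raw volume, taken from the text above.
STORAGE_RATIOS = {
    "syslog_default": 0.40,     # standard syslog data, default processing
    "other_formats_max": 1.00,  # upper figure for many other common formats
    "reduced_density": 0.12,    # lowest figure with reduced indexing density
}


def estimate_disk_gb(raw_gb_per_day: float, retention_days: int, ratio: float) -> float:
    """Rough disk usage for a daily raw volume kept for retention_days."""
    return raw_gb_per_day * retention_days * ratio


# Hypothetical workload: 10 GB/day of raw logs kept online for 90 days.
estimates = {
    name: estimate_disk_gb(10, 90, ratio) for name, ratio in STORAGE_RATIOS.items()
}
```

Under these assumptions the same 900 GB of raw logs would need roughly 108 GB to 900 GB of disk, depending on the data type and indexing density.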


How much data can Splunk store online? How long can Splunk keep data online?

As much as you want. You control how much data Splunk stores online by setting its data retirement policy. Splunk's search performance when looking across a day of data is the same whether the data store contains a day or years of data.


Can Splunk automatically retire older data? How do I avoid running out of disk space?

Yes, Splunk has settings to retire the oldest data based on age and disk usage. It also has a setting for the minimum disk space to keep free. Read the Admin Manual for more information.
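
To make the two retirement criteria concrete, here is a minimal Python sketch of a policy that retires the oldest slices by age and by total size. It is an illustration only, not Splunk's implementation: the class, function and parameter names are hypothetical and do not correspond to actual Splunk settings (see the Admin Manual for those).

```python
from dataclasses import dataclass


@dataclass
class Slice:
    name: str
    latest_event: int  # epoch seconds of the newest event in the slice
    size_mb: int


def slices_to_retire(slices, now, max_age_secs, max_total_mb):
    """Return the names of slices to retire, oldest first.

    A slice is retired if its newest event is older than max_age_secs,
    or if the index as a whole is still larger than max_total_mb.
    """
    ordered = sorted(slices, key=lambda s: s.latest_event)  # oldest first
    total = sum(s.size_mb for s in ordered)
    retired = []
    for s in ordered:
        too_old = now - s.latest_event > max_age_secs
        too_big = total > max_total_mb
        if not (too_old or too_big):
            break  # every remaining slice is newer and the index fits
        retired.append(s.name)
        total -= s.size_mb
    return retired


# Hypothetical index: three 500 MB slices, limits of 250 seconds and 800 MB.
index = [Slice("a", 100, 500), Slice("b", 200, 500), Slice("c", 300, 500)]
retired = slices_to_retire(index, now=400, max_age_secs=250, max_total_mb=800)
```

In this example slice a is retired because of its age and slice b because the index is still over the size limit; slice c stays.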


Splunk's stopped indexing my data. Is that because I exceeded my license limit?

No. Splunk never stops indexing data because of license violations. It only blocks search if there are repeated violations. If your Splunk server has stopped indexing, there is another explanation. Contact support@splunk.com for help.


How scalable is Splunk? How does Splunk scale?

Splunk's software architecture is designed to be extremely scalable. It can be deployed in minutes to index a few hundred megabytes a day on a server shared with other applications like monitoring, or it can be deployed across dozens of dedicated indexing servers and thousands of source hosts to index terabytes a day in real time.


How dense is the index?

That depends on how much segmentation is done on the data. For example, if we segment 1.2.3.4 on the . (period) character, we have to store 1, 1.2, 1.2.3 and so on in the index, which would bloat the index considerably. All of this is configurable; however, changing the default is not recommended.
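
The 1.2.3.4 example can be sketched in a few lines of Python. This only illustrates why segmenting on a breaker character multiplies the number of stored terms; the function name is hypothetical and this is not Splunk's actual segmentation code.

```python
def prefix_segments(token: str, breaker: str = ".") -> list[str]:
    """Return every prefix of token that ends at a breaker, plus the full token."""
    parts = token.split(breaker)
    return [breaker.join(parts[: i + 1]) for i in range(len(parts))]


segments = prefix_segments("1.2.3.4")
# segments is ["1", "1.2", "1.2.3", "1.2.3.4"]: four index terms for one value
```

A value with n breaker-separated parts yields n terms under this scheme, which is the index growth the answer above describes.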


http://www.splunk.com/doc/2.2.6/admin/adminreducedensity

Revision: 207. Community content licensed under Creative Commons.