What is Dark Data? We Answer the Five Biggest Questions

Earlier this year, Splunk released the first-ever global research about the problem and value of dark data. We surveyed more than 1,300 business and IT decision-makers across seven leading economies. In short, they all agreed that dark data is crucial to business success and the vast majority of businesses aren’t prepared to take advantage of it. I’ve had a lot of conversations since then about the report and about the nature of dark data itself. Let’s talk about some of the fundamentals of dark data and why it’s so important for businesses to navigate.

1. What is dark data?

Dark data is all the unknown and untapped data across your organization, generated by systems, devices and interactions. Dark data can be any kind of data. It can be a log file from a website, server, network or application. It can be data collected by sensors on industrial machines or the data generated by connected devices that are part of the Internet of Things. The sole defining characteristic of dark data is that you don’t know you have it and therefore aren’t using it.

2. Is dark data the same thing as unstructured data?

Dark data is not the same as unstructured data (data that doesn’t fit into a conventional data model), although the two can sound very similar. Again, the defining characteristic of dark data is that you don’t know you have it. A lot of dark data is unstructured, but not all unstructured data is dark, and structured data can go dark too if nobody knows it’s there. You won’t know which it is until you find it.

3. How do you know if you have dark data?

The short answer is yes, you have dark data. Every organization does. Take a look at our dark data report to find out how we know. The sheer amount of data being generated by any organization of any size in the 21st century guarantees that you have data slipping through the cracks. It may be a dataset with a structure you don’t understand, data whose meaning you can’t identify, or data sitting in a corner of the organization that you don’t often look into. But you have it.

4. How do you find dark data?

Finding dark data in your organization is the biggest challenge. How do you find something if you don’t know it exists? You could compare it to finding a needle in a haystack, but if you find the needle, at least you know it’s a needle. Trying to find dark data is more like exploring a subterranean cave in total darkness. Maybe the cave is empty, maybe there’s a chest of pirate gold. You can crawl around on your hands and knees for days without finding anything. Even if you bump into something, you won’t know what it is.

Conventional data analysis tools won’t work. Analytics and business intelligence tools rely on structured data. So do relational databases. You need a platform that supports all your data, in all of the dark corners of your organization, and gets you answers. You’ll be able to search that cave a lot more effectively if you bring a flashlight.

5. What can you use to find and use dark data?

The ideal platform for finding dark data is built to use unstructured data. But there’s more to it than that. You need a platform that automatically detects what type of data it’s looking at, ingests it and prepares it for analysis. Query languages like SQL require you to structure your queries based on the structure of the data, which you can’t do if you don’t know the structure of the data.
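To make the contrast with SQL concrete, here is a minimal sketch in Python of the "detect first, ask questions later" idea. It is not how Splunk works internally; the sample lines, the field names and the helpers detect_and_extract and search are all made up for illustration. The point is only that no schema is declared up front: each line is inspected when it is read, and whatever fields it exposes become searchable.

```python
import json
import re

# Hypothetical sample: three lines in three formats, none declared up front.
RAW_LINES = [
    '{"host": "web01", "status": 500, "path": "/checkout"}',
    'ts=2019-06-01T12:00:00Z host=db02 status=200 latency_ms=41',
    'Jun  1 12:00:03 sensor7 temperature reading out of range',
]

# Matches simple key=value pairs with no spaces inside the value.
KV_PATTERN = re.compile(r'(\w+)=(\S+)')


def detect_and_extract(line):
    """Guess a line's format and pull out whatever fields it exposes."""
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            # Store values as strings so every format compares the same way.
            return "json", {k: str(v) for k, v in parsed.items()}
    except json.JSONDecodeError:
        pass
    pairs = KV_PATTERN.findall(line)
    if pairs:
        return "key_value", dict(pairs)
    # Unknown format: no fields extracted, but the line itself is kept.
    return "raw", {}


def search(lines, **criteria):
    """Return the lines whose extracted fields match every criterion."""
    hits = []
    for line in lines:
        _, fields = detect_and_extract(line)
        if all(fields.get(k) == str(v) for k, v in criteria.items()):
            hits.append(line)
    return hits


if __name__ == "__main__":
    for line in RAW_LINES:
        print(detect_and_extract(line))
    print(search(RAW_LINES, status=500))
```

Calling search(RAW_LINES, status=500) asks a question of all three lines without knowing in advance which of them even has a status field. A SQL query would need a table and column definition before the first row could be loaded.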

At this point, you probably won’t be surprised to hear me say that Splunk is the ideal platform for finding and identifying dark data and using it to drive business outcomes.

Going back to the cave metaphor, Splunk is a flashlight. Splunk lets you go into any situation and start asking questions right away. Splunk doesn’t care if your data is structured or unstructured; it was built to be an investigative platform. It’s built to allow you to ask questions against data that has almost no format at all. You can poke around and start to understand the data and ask questions before you understand the full story.

Dark data may be the biggest untapped resource in business today. To learn more, download the full dark data report.

Posted by Tim Tully

Tim has served as Chief Technology Officer since 2017. Prior to joining Splunk, he spent 14 years at Yahoo in various roles, including leading engineering for the Media Organization, where his teams developed leading brands such as Yahoo.com, Yahoo Finance, Yahoo Sports, Yahoo Fantasy, Tumblr, Huffington Post and Flurry. He previously served as Yahoo’s Chief Data Architect, where he led architecture across all data teams and developed much of that stack as well. Before that, Tim served as a Member of Technical Staff at Sun Microsystems in the JavaSoft team. Tim holds an M.S. from Carnegie Mellon University and a B.S. from the University of California, Davis.
