Try going one day without navigating today’s data landscape: accepting or declining cookie pop-ups, deciding whether and how a company can use your information, and generating data simply by browsing the web.
Yes, we live in the Data Age. We know we generate mind-boggling amounts of data. The amount we generate in a single day is unfathomable (2.5 quintillion bytes, if you can do that math).
More formally, we say that data has been democratized. Whether all this data is good, bad, or something else altogether, we don’t quite know yet.
What is data democratization?
At its most basic, the democratization of data means simply that more people have access to data than ever before. Data democratization refers to all types of data: structured, semi-structured, unstructured, and even dark data.
So, there’s data everywhere, but what does that really mean?
We can expand the definition of data democratization to the ability for you — an average person, not a data or computing whiz — to access information digitally with the goal of gathering and analyzing your own data, without needing any help from specialists and experts like data scientists, data analysts and more.
(Looking for ways to manage data? Learn about data pipelines, data platforms and data normalization.)
State of data today
Up until very recently, data was not easy to generate. Data that we did have was likely gathered through some structured method, like a survey, that required people, time and manual effort. Computers might have logged some of what we did, but we didn’t have web browsers and cell phones tracking our every movement.
Of course, today’s data deluge couldn’t happen without the digital world as we now know it: working from home on laptops and elaborate computer setups, smartphones that are, by any measure, computers in their own right, and yes, wearable devices that purport to track all dimensions of “health”, all connected by the internet. Not surprisingly, the vast majority of data today is created digitally.
Benefits of data democratization
The power of data is real, even if that “power” is loosely defined for a variety of purposes. But let’s look seriously at what this influx of data could help individuals and organizations do:
With data available to seemingly everyone, more tools, techniques, and ways to “experience” data are popping up. Go-to tools like Microsoft Excel and SaaS solutions like Tableau can help average folks organize, slice, and visualize data in ways that deliver meaning: do more of this, do less of that.
With more options, the associated costs go down. A decade ago, manipulating data likely cost significantly more, due to limited datasets, limited skillsets and expensive toolsets. In 2022, however, that cost has come down considerably, as more people have familiarized themselves with affordable data-crunching tools.
This influx of data is advancing some technologies that needed a whole lot more data than was previously attainable.
Machine learning in particular has advanced significantly in the last decade. Previously held back by a lack of data, the field now has plenty of large datasets from which machines can learn in a more balanced, insightful way. These advances have helped companies make better decisions and design better products.
Challenges of democratized data
All the positives, however, are balanced out by some challenges that this data presents.
Security professionals know that the more of something there is, the more there is to secure. With data seemingly everywhere, these experts have to focus on the countless places data actually lives and find ways to keep it as secure as possible. And that’s before you heap on any industry-specific or geographically specific data regulations, like GDPR in the EU and HIPAA in the U.S.
Of course, you can deploy various data encryption methods and techniques to offer more security layers for your data, but there is no point where you can reliably say "We're OK, all of our data is definitely secure."
More data brings more concerns about how that data is being used. Which companies sell your data, and which use it only for internal shipping and marketing purposes? What happens when one of those companies is hacked or suffers some other data breach?
Websites like Have I Been Pwned have been helping individuals track their exposed data for years, well before companies publicly disclosed such data loss. But knowing someone accessed your data rarely means there’s something you can do about it. Either way, the topic of data privacy is clearly on the rise:
Google Trends shows how people worldwide have searched for “data privacy” over the last decade. We see a steady interest for several years until a significant high point in May 2018, which came shortly after the March 2018 disclosure of the Facebook-Cambridge Analytica scandal and coincided with GDPR taking effect in the EU. Considered the watershed moment for data privacy, the scandal centered on a consulting firm collecting personal data from millions of Facebook users without their consent, to be used primarily for political advertising.
And this year, we see the highest interest ever in people searching online for the concept of data privacy.
Determining access from siloed data
Data access is one challenge that might not seem like too much work…at first. If you’re sharing data across a family or a small business, there’s not so much data to share and it’s not so spread out — maybe everything is stored in Google Docs or one cloud workload. But that does not scale.
Consider a medium-sized business in one city, like a real estate firm that serves a hyperlocal region. With relatively few employees, data is more easily accessed and the attack surface is smaller. Now play that out for an international or global organization that is dealing with internal access decisions in addition to varying geographic requirements.
Many companies of all sizes now use multiple data warehouses and data lakes. Questions to consider:
- Who gets access to data?
- Does everyone get access to all data?
- Is all data necessary for all people doing their jobs?
- How does this access play out for security and privacy concerns?
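One way to make these questions concrete is to write the answers down as an explicit policy. The following is a minimal Python sketch, assuming a hypothetical real estate firm with invented roles (agent, analyst, admin) and invented dataset names. It denies access by default and grants each role only the datasets it needs. Treat it as an illustration of the least-privilege idea, not as a production access-control system.

```python
# Hypothetical roles and datasets, invented for illustration only.
ROLE_DATASETS = {
    "agent": {"listings"},
    "analyst": {"listings", "sales_history"},
    "admin": {"listings", "sales_history", "client_pii"},
}


def can_access(role: str, dataset: str) -> bool:
    """Return True only if the role was explicitly granted the dataset."""
    # Unknown roles get an empty grant set, so access is denied by default.
    return dataset in ROLE_DATASETS.get(role, set())


def audit(role_datasets: dict, sensitive: set) -> dict:
    """List, for each sensitive dataset, the roles that can reach it."""
    return {
        ds: sorted(r for r, granted in role_datasets.items() if ds in granted)
        for ds in sorted(sensitive)
    }


if __name__ == "__main__":
    print(can_access("agent", "listings"))      # explicitly granted
    print(can_access("agent", "client_pii"))    # never granted, so denied
    print(audit(ROLE_DATASETS, {"client_pii"}))
```

The audit helper speaks to the security and privacy question: it shows at a glance which roles can reach sensitive data, which is the kind of list that tends to grow unnoticed as an organization scales.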
Trustworthiness (A healthy skepticism?)
The biggest question might not be about any of these things — but whether you can actually trust that data. (This is sometimes referred to as data quality.)
Just because you’re collecting data by the reams doesn’t mean that data is useful. In fact, it often is not useful. For example, website analytics are wildly prone to error and misinterpretation, as Gerry McGovern explains. Questions to ask yourself when looking at data:
- Do you have a team of analysts that understand the data holistically?
- Do they understand what it can and cannot do?
- Do you understand what the data says, as it’s presented to you?
- Perhaps most importantly, do you get a useful picture from that data?
Or does that data, at the end of the day, basically indicate not all that much?
Having access to data does not guarantee that you have useful information. In the pyramid of knowledge, data is merely the building block that comes before information. Only as we process and truly understand that information does it become knowledge. Being data-driven doesn’t mean we have to let logic go by the wayside.
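A first pass at the trust questions in this section can be automated. The Python sketch below assumes a hypothetical list of page-view records with invented field names (page, visitor_id, seconds_on_page) and counts missing values, exact duplicates, and implausible values. The checks and the one-hour threshold are assumptions chosen for illustration. They can show that data has problems, but not what the data means.

```python
# Hypothetical page-view records; field names and values are invented.
SAMPLE_ROWS = [
    {"page": "/home", "visitor_id": "a1", "seconds_on_page": 42},
    {"page": "/home", "visitor_id": "a1", "seconds_on_page": 42},      # exact duplicate
    {"page": "/pricing", "visitor_id": None, "seconds_on_page": 15},   # missing visitor
    {"page": "/blog", "visitor_id": "b2", "seconds_on_page": 90000},   # implausibly long
]


def quality_report(rows, required, max_seconds=3600):
    """Count basic quality problems in a list of dict records."""
    seen = set()
    report = {"rows": len(rows), "missing": 0, "duplicates": 0, "implausible": 0}
    for row in rows:
        # A row is incomplete if any required field is absent or None.
        if any(row.get(field) is None for field in required):
            report["missing"] += 1
        # Exact repeats may come from, for example, a tracker firing twice.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        # Negative or very long visits are flagged (assumed threshold).
        seconds = row.get("seconds_on_page")
        if seconds is not None and not 0 <= seconds <= max_seconds:
            report["implausible"] += 1
    return report


if __name__ == "__main__":
    print(quality_report(SAMPLE_ROWS, required=["page", "visitor_id"]))
```

A clean report does not make the data useful. It only rules out some of the more obvious reasons to distrust it, which leaves the interpretive questions to people who understand the data holistically.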
Data democratization: It’s still complicated!
As the digital data piles up in data centers worldwide, the only sure answer we have is that we are collecting data — but we’re generally not doing too much with it. The question is probably well past “Do we democratize our data?” Instead, organizations should be asking whether the data we are relying on is making us smarter and more efficient.
I suggest asking: “If we didn’t have this data, what would we do?” Sometimes decisions are simply logical. Making something more visible that helps the user might not need data to prove its worth. With all this data, what are you gaining — and what are you losing? As Patricia Marx writes in The New Yorker:
“Craigslist has not gone public and has made only a small profit since its beginnings, compared with Facebook, which made eighty-six billion dollars in 2020, the vast majority in targeted ads. When you get rid of a couch on Craigslist, you are getting rid of a couch; when you get rid of a couch on Facebook, you may be saying goodbye to your data, too.”
This posting does not necessarily represent Splunk's position, strategies or opinion.