What Is a Data Mesh?

A data mesh can be a concept and practice for managing large amounts of data across a distributed network, the platform that performs this function, or both. As organizations grow more dependent on storing large amounts of data, channeling it through data pipelines and rapidly putting it to use, it becomes increasingly important to have an effective and efficient approach to managing and using that data. A data mesh attempts to provide one by creating a replicable way of managing the various data sources across an enterprise's ecosystem and making them more discoverable, while giving data consumers faster, more efficient and more secure access to the data they need in a particular category.

In the following article, we’ll discuss various use cases for data mesh, its value for businesses in strategic decision making, and how to move beyond traditional centralized data designs and start thinking about implementing this framework in your organization.

What are some use cases of a data mesh?

As the volume of data in need of collection continues to increase, organizations face a situation where they need to store it without being able to organize, cleanse and categorize it first.

The use cases for data mesh center on democratization: providing data access so that data consumers can use the data they need more quickly and effectively. Some common scenarios involve:

  • Real-time analytics/Business intelligence
  • IoT analytics
  • Customer intelligence
  • Fraud detection
  • Observability
  • Logistics

What value does a data mesh architecture bring to an organization?

The primary value of a data mesh is that it allows organizations to use their data with a specific business output in mind. A data mesh allows an organization to find data gathered from multiple, similar use cases anywhere in the network and combine it to deliver insights or outcomes on a specific topic. These use-case-focused combinations of data are often called “data products.”

In other words, a data mesh is designed to find all the data needed to address a specific use case. If a data lake is broad and deep and must be searched carefully, a data mesh is stretched out across all data and can quickly identify the data necessary to support a particular outcome.

With the growth of hybrid data environments, the challenge of data democratization (enabling equitable access to data while keeping it secure and under access control) has grown significantly more complex. A data mesh provides an overall framework that allows for consistent data ownership while ensuring scalability and making data available quickly when it is needed.

Some added benefits of a data mesh include:

  1. Greater autonomy for data stakeholders to make decisions on data management and use
  2. Enforcement via policy, using shared data sets as opposed to replicated data sets
  3. Transparency and visibility of data use across departmental silos
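The second benefit, enforcing policy on a shared data set instead of handing each team its own replica, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `SHARED_CUSTOMERS` rows, the role names in `POLICIES` and the `read_shared` helper are assumptions made for illustration, not part of any particular data mesh product.

```python
# One shared data set; nothing is copied per consumer.
SHARED_CUSTOMERS = [
    {"id": 1, "region": "EU", "email": "a@example.com", "lifetime_value": 1200},
    {"id": 2, "region": "US", "email": "b@example.com", "lifetime_value": 450},
]

# Hypothetical policies: the columns each role is allowed to read.
POLICIES = {
    "analyst": {"id", "region", "lifetime_value"},
    "support": {"id", "email"},
}


def read_shared(role, rows=SHARED_CUSTOMERS, policies=POLICIES):
    """Return a policy-filtered view of the shared rows for the given role."""
    allowed = policies.get(role)
    if allowed is None:
        # Roles without a policy get no access at all.
        raise PermissionError(f"no policy grants access to role {role!r}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


analyst_view = read_shared("analyst")
print(analyst_view[0])  # {'id': 1, 'region': 'EU', 'lifetime_value': 1200}
```

Because every role reads from the same rows, a change to the policy takes effect for all consumers at once, and there are no stale copies to reconcile.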

 

Zhamak Dehghani, credited with creating the concept of data mesh in 2019, maintains that a data mesh is a data platform encompassing not just the concepts and protocols, but also the equipment used. Dehghani sees a data mesh architecture as the next iteration of data storage beyond a data lake, with the added advantages of speed, efficiency and specificity. In a May 2019 blog post, Dehghani wrote,

“Be open to the possibility of moving beyond the monolithic and centralized data lakes to an intentionally distributed data mesh architecture; Embrace the reality of ever present, ubiquitous and distributed nature of data.”

What are the components of a data mesh architecture?

Definitions vary, but one school of thought holds that a data mesh architecture, or data mesh framework, centers on data products that are organized by domain and owned by independent domain teams within the organization. Those teams include embedded data engineers and data product owners, who use common data infrastructure as a platform to host their data assets.

The data mesh platform is essentially a data architecture that’s intentionally designed for data integration, providing interoperability and scalability. The platform is enabled by a larger standardized data infrastructure that ensures data quality, often with machine learning and other advanced technologies, as opposed to an environment of fragmented silos that impede data mobility and access.

Other data mesh proponents see it more as a concept and a set of protocols than as a physical platform for storing and using data. At its core, a data mesh architecture is a network of distributed data processing nodes that link the entities that “hold” data (such as data lakes) with the application domains that “act upon” that data, in an accessible, highly available, and secure manner.
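To make the domain-oriented view more concrete, here is a minimal Python sketch of domain teams publishing data products to a shared catalog that consumers can search. The `DataProduct` and `MeshCatalog` classes, along with the domain, owner and tag names, are hypothetical and chosen only for illustration; real platforms differ considerably in how they model ownership and discovery.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned data set published for others to consume (hypothetical model)."""
    name: str
    domain: str   # business domain that owns the product
    owner: str    # accountable data product owner or team
    tags: frozenset = field(default_factory=frozenset)


class MeshCatalog:
    """Shared, self-serve catalog: domains publish, consumers discover."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}/{product.name} is already published")
        self._products[key] = product

    def discover(self, tag):
        """Return every product, from any domain, that carries the tag."""
        return sorted(
            (p for p in self._products.values() if tag in p.tags),
            key=lambda p: (p.domain, p.name),
        )


catalog = MeshCatalog()
catalog.publish(DataProduct("orders", "sales", "sales-data-team", frozenset({"customer", "revenue"})))
catalog.publish(DataProduct("shipments", "logistics", "logistics-data-team", frozenset({"customer"})))
catalog.publish(DataProduct("sensor_readings", "iot", "iot-data-team", frozenset({"telemetry"})))

customer_products = [f"{p.domain}/{p.name}" for p in catalog.discover("customer")]
print(customer_products)  # ['logistics/shipments', 'sales/orders']
```

Each product stays under the control of its owning domain, while the catalog gives consumers one place to find data relevant to a use case regardless of which team produced it.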

What are the steps for implementing a data mesh framework?

The concept of a data mesh infrastructure is still being defined, which presents challenges to data teams when laying out a step-by-step plan for implementation. Some vendors offer an end-to-end data management solution that meets many of the characteristics of a data mesh. More and more organizations are making a data mesh approach a part of their service offerings.

What is the difference between data fabric and a data mesh?

Data fabric and data mesh are similar in that they are concepts and methodologies for data governance, laying the foundation for how organizations deal with large amounts of stored data. It’s generally agreed that a data fabric methodology attempts to manage analytical data by building a single management layer on top of all data, wherever it’s stored. The data mesh approach is different in that aspects of certain types of data management are left under the control of the teams or groups in the organization who use that data.

Whether viewed as a concept for data organization, or a data platform architecture to put it to use, the term “data fabric” essentially defines a methodology for connectivity that integrates all of an organization’s data, across all storage and use environments, and applies a common set of protocols, procedures, organization and security. The data fabric concept is inextricably linked to other big data concepts like data lake, data warehouse, and even data lakehouse.

As with a data mesh, the core principle of data fabric architecture is that it is applied across all data structures and business domains in a hybrid multicloud environment, from on-premises to cloud to edge.

Data fabric and data mesh are two connected concepts that thus far are not clearly delineated. The primary discussions about the difference between them are being held among vendors and data scientists tasked with defining the space. While some vendors promote their own idea of a data mesh, the concept itself is not clearly defined.

The concepts of data mesh, data lake and data fabric, while different, all lay the foundation for how organizations address large amounts of stored data.

What is the difference between a data lake and a data mesh?

A data lake is a repository of data in its raw format, not sorted or indexed in any way. The data can be anything from a simple file to a binary large object (BLOB) like a video, audio, image or multimedia file. Any manipulation of the data to make it usable — discovery, extraction, cleansing and integration — is done when the data is extracted.
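The idea that manipulation happens at extraction time rather than at storage time can be shown with a small Python sketch. The `raw_lake` records and the `extract_readings` helper are hypothetical; the sketch assumes the raw data is stored as JSON strings and that cleansing only means trimming and lowercasing device names and converting temperatures to floats.

```python
import json

# Raw records stored as-is; note the inconsistent formatting and a malformed row.
raw_lake = [
    '{"device": " Sensor-A ", "temp_c": "21.5"}',
    '{"device": "sensor-b", "temp_c": 19}',
    'not valid json',
]


def extract_readings(raw_records):
    """Parse, cleanse and normalize records at read time; count what cannot be parsed."""
    readings, rejected = [], 0
    for line in raw_records:
        try:
            record = json.loads(line)
            readings.append({
                "device": record["device"].strip().lower(),
                "temp_c": float(record["temp_c"]),
            })
        except (ValueError, KeyError, TypeError, AttributeError):
            # json.JSONDecodeError is a ValueError; the others cover missing or odd fields.
            rejected += 1
    return readings, rejected


readings, rejected = extract_readings(raw_lake)
print(readings, rejected)
```

Nothing is validated when the records land in `raw_lake`, which keeps ingestion cheap; the cost of discovery and cleansing is paid only by the consumer who extracts the data.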

A data mesh can be used to provide structure and organization to enterprise data stored in a data lake. The data mesh protocol manages extraction functions and the data mesh architecture performs them when extraction takes place. In fact, many experts in the field credit the rise of the data mesh to the necessity of managing data in a data lake.

The advantage of a data mesh architecture to an organization is similar to that of a data fabric in that it makes data ingestion, storage and extraction as efficient and effective as possible, allowing the organization to filter and curate it before it is needed for a business purpose.

A data mesh also provides consistent protocols for safe and effective storage and security. In regulated environments, a data mesh can be configured to help the organization comply with privacy and security mandates.

The Bottom Line: Data owners need to keep an eye on the evolution of data mesh

Much of what has been discussed in this article is forward-looking. Even among experts, there isn’t universal agreement on whether data mesh is a concept or an actual infrastructure. That said, there is great value in allowing organizations to store data quickly, easily and cheaply and perform the necessary manipulations when the data is needed for use. It also makes sense to give users with domain expertise the most control over how their data is extracted while minimizing wait times for consumers. Data mesh architectures could help enterprises achieve the long-term goal of a single version of truth by eliminating data duplication and control gaps between producers and consumers of data.

The future of data mesh architecture is not certain, and there are multiple ways to achieve the same goals around storing, accessing and querying data. Ultimately, organizations aim to store siloed data cheaply, understand what it is and how it relates to data in other silos, and make it available when needed to provide insights and business value. That’s where a data mesh architecture comes in. If your organization depends on making use of large amounts of data, it is worth keeping track of how the data mesh evolves.
