CDMs for Enterprise Data: Canonical Data Model Explained

On their own, enterprise applications and systems are not always straightforward. Taken together, they form complex, integrated environments with many different data formats and structures. Teams spend a great deal of time and effort defining and maintaining the diverse data models used across these integrated components.

A Canonical Data Model helps reduce that burden significantly — by promoting a standard and consistent data model between connecting components. This article describes a few things to get you started:

  • What a Canonical Data Model is
  • How CDMs work
  • Advantages to organizations
  • How you can implement one

Defining Canonical Data Model

The Canonical Data Model (CDM) is a data model with a standard and common set of definitions, including data types, data structures, relationships and rules — all independent of any specific application.

Applications must create and consume messages in this common format when exchanging data between them. A canonical data model is not an amalgamation of all data models. Instead, it is a single universal data model between integrations. CDMs aim to:

  • Reduce the dependencies associated with integrations.
  • Allow smoother integration between applications and systems.

The term ‘canonical’ refers to anything that follows a general rule or accepted procedure, that is, anything that is part of the canon. So, a canonical data model is one that follows the general rules you lay out for it.

Drivers that lead to CDMs

It is essential to know the complexities associated with integrations to understand why CDMs were introduced. Current application architectures consist of several integrations with sub-systems and applications that use different technology stacks or programming languages. Microservices, service-oriented architectures (SOA) and distributed systems are some examples of highly integrated architectures.

Each integrated component may represent data in a different format, which complicates data exchange, data governance and interoperability across applications and systems. A CDM gives all integrations a common understanding of the data that passes between them. It minimizes dependencies between integrations, improving data consistency and data governance.

How does the CDM work? 

Suppose an online learning application integrates with several other sub-systems, like student registration, course enrollment, and a payment system. Each sub-system may maintain client data (student and instructor) in different data types, formats and structures.

For example, a student registration system in Node.js may store information in MongoDB. In the meantime, the main learning application is in Java and stores data in relational databases. 

A company can create a CDM with standard data types, formats and structures to integrate this client data across the above-mentioned systems. The CDM can be defined in an agreed-upon format like Plain Old XML (POX), SOAP or JSON. It can include data fields like student name, ID, email address and phone number.

The systems should agree on a common name for each data field. For instance, if one system uses "Student ID" as a field name and another system uses "Student No," both can be mapped to the "Student No" field in the CDM.
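As a rough sketch of that mapping, each system can keep a small lookup table from its own field names to the canonical ones. Apart from "Student ID" and "Student No", every system name, field name and value below is a made-up assumption for illustration:

```python
# Hypothetical field-name maps: source field name -> canonical (CDM) field name.
# These names are illustrative assumptions, not taken from a real schema.
REGISTRATION_TO_CDM = {
    "Student ID": "Student No",
    "Full Name": "Student Name",
    "E-mail": "Email Address",
}
ENROLLMENT_TO_CDM = {
    "Student No": "Student No",
    "Name": "Student Name",
    "Email": "Email Address",
}


def to_canonical_names(record: dict, field_map: dict) -> dict:
    """Rename a record's fields to the canonical names; fail on unmapped fields."""
    unknown = set(record) - set(field_map)
    if unknown:
        raise KeyError(f"No canonical mapping for fields: {sorted(unknown)}")
    return {field_map[name]: value for name, value in record.items()}


registration_record = {
    "Student ID": "S-1001",
    "Full Name": "Ada Perera",
    "E-mail": "ada@example.com",
}
enrollment_record = {
    "Student No": "S-1001",
    "Name": "Ada Perera",
    "Email": "ada@example.com",
}

# Both records end up with identical canonical field names.
canonical_a = to_canonical_names(registration_record, REGISTRATION_TO_CDM)
canonical_b = to_canonical_names(enrollment_record, ENROLLMENT_TO_CDM)
```

Raising an error on an unmapped field is one possible design choice; it surfaces schema drift early instead of silently dropping data.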

The student registration system transforms the student data into the standard format of CDM before sending it to the main application. After receiving data from the registration system, the main application will transform the data into its own format.
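The exchange described above might look roughly like this, assuming JSON as the agreed canonical format. The document shape, the canonical field names and the `students` table layout are hypothetical, chosen only to show the two translation steps:

```python
import json

# Hypothetical document as the registration system might hold it (MongoDB-style).
registration_doc = {
    "_id": "S-1001",
    "name": {"first": "Ada", "last": "Perera"},
    "contact": {"email": "ada@example.com", "phone": "+94 11 555 0100"},
}


def registration_to_cdm(doc: dict) -> str:
    """Sender side: translate the local document into a canonical JSON message."""
    canonical = {
        "studentNo": doc["_id"],
        "studentName": f'{doc["name"]["first"]} {doc["name"]["last"]}',
        "emailAddress": doc["contact"]["email"],
        "phoneNumber": doc["contact"]["phone"],
    }
    return json.dumps(canonical, sort_keys=True)


def cdm_to_main_app_row(message: str) -> tuple:
    """Receiver side: translate the canonical message into the main app's own
    shape, here a tuple for a hypothetical relational table
    students(student_no, full_name, email, phone)."""
    canonical = json.loads(message)
    return (
        canonical["studentNo"],
        canonical["studentName"],
        canonical["emailAddress"],
        canonical["phoneNumber"],
    )


message = registration_to_cdm(registration_doc)  # what travels between systems
row = cdm_to_main_app_row(message)               # what the main app stores
```

Neither function knows anything about the other system's internal format; both depend only on the canonical message.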

Benefits of CDMs: Why you need one

A CDM brings many benefits for current enterprise applications integrated with different systems and third-party applications. The following are some key benefits of a CDM:

Reduce the number of data translations between multiple systems

Suppose your company has three different systems (X, Y and Z) that need to connect with each other. Connecting them directly requires a maximum of six data translations: X to Y, Y to Z, X to Z, and the reverse of each. If you use a CDM, the maximum is also six, because each system needs one translation to the CDM and one from it.

The difference appears as the number of connected systems grows. Without a CDM, every new system needs translations to and from each existing system. With a CDM, it needs only two. Four systems, for example, need up to 12 translations without a CDM but only 8 with one. So, using a CDM reduces both the number of data translations and the burden of maintaining them.
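The arithmetic can be sketched as follows. With n systems that all exchange data with each other, direct integration needs up to n(n-1) translations, while a CDM needs 2n. The function names are illustrative:

```python
def point_to_point_translations(n: int) -> int:
    """Every ordered pair of systems needs its own translation: n * (n - 1)."""
    return n * (n - 1)


def cdm_translations(n: int) -> int:
    """Each system needs one translation to the CDM and one from it: 2 * n."""
    return 2 * n


for n in (3, 4, 5, 10):
    print(n, point_to_point_translations(n), cdm_translations(n))
```

For three systems both counts are 6. For ten systems the counts are 90 and 20.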

Improve data consistency and communication across systems

A CDM provides a standard data model across different systems, regardless of their data models. This standardization encourages organizations to maintain consistent:

  • Data formats
  • Definitions
  • Structures

Furthermore, it results in high-quality data, which helps them make better business decisions. Additionally, the communication between systems will be consistent regardless of the number of integrations that are added in the future.

Improve flexibility and business agility

The CDM is independent of integrated applications and systems, allowing you to implement new integrations easily. It allows organizations to expand their operations with fewer complexities and integration costs. In addition, this flexibility helps them respond to changing business needs faster. It leads to improved responsiveness, resilience and agility of the company.

Maintain translations easily

Another important benefit a CDM brings is the reduced effort required to maintain translations. Suppose you need to replace, delete or update one integrating system. Without a CDM, you will have to check the data translations of every system that connects to it, which is costly and time-consuming.

In contrast, you only have to check the data translations to and from the CDM when there is a CDM between connecting systems. It allows for easy maintenance of integrations. 

Maintain business logic easily

A CDM also makes it easier to maintain the logic between integrated systems. Without a CDM, integration logic is written against each system's own data model. When an integrated system changes, you must check the dependencies between its data model and the logic, then update the logic accordingly. With a CDM, the logic is written against the canonical model, so a change to a system is absorbed by its translator and does not require changes to the business logic in the integration layer.
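To illustrate, here is a minimal sketch in which an integration-layer rule is written only against canonical field names. The rule itself, the field names and both payment-system formats are assumptions invented for this example:

```python
def is_eligible_for_enrollment(canonical_student: dict) -> bool:
    """Integration-layer rule that reads canonical fields only."""
    return (
        canonical_student["paymentStatus"] == "PAID"
        and bool(canonical_student["emailAddress"])
    )


def old_payment_to_cdm(record: dict) -> dict:
    """Translator for the old (hypothetical) payment system format."""
    return {
        "studentNo": record["stud_id"],
        "emailAddress": record["mail"],
        "paymentStatus": "PAID" if record["paid_flag"] == "Y" else "UNPAID",
    }


def new_payment_to_cdm(record: dict) -> dict:
    """Translator for a replacement system with a different (hypothetical) format."""
    return {
        "studentNo": record["student"]["number"],
        "emailAddress": record["student"]["email"],
        "paymentStatus": "PAID" if record["balanceDue"] == 0 else "UNPAID",
    }


old_record = {"stud_id": "S-1001", "mail": "ada@example.com", "paid_flag": "Y"}
new_record = {
    "student": {"number": "S-1001", "email": "ada@example.com"},
    "balanceDue": 0,
}

# The same rule runs unchanged on data from either system.
eligible_old = is_eligible_for_enrollment(old_payment_to_cdm(old_record))
eligible_new = is_eligible_for_enrollment(new_payment_to_cdm(new_record))
```

Replacing the payment system here means writing `new_payment_to_cdm`; `is_eligible_for_enrollment` is not touched.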

Examples of CDMs

Different industries use CDMs to set a standard between data and communications within their diverse applications and systems. The following are some common examples of such CDMs that are in use today. 

Clinical Document Architecture (CDA) of Health Level Seven (HL7)

Healthcare providers, like hospitals and medical laboratories, have systems like patient registration, patient tracking, clinical histories and payment systems. HL7 is a set of standards that defines a common message format to exchange electronic health records between different healthcare applications. Its latest standards include protocols such as HL7 V2, V3, FHIR and CDA.

Clinical Document Architecture (CDA) is one of the primary standards based on XML, specifying the encoding, structure, and semantics of clinical documents. It can include clinical information like medical history, discharge information and special medical reports of patients. 

Microsoft Common Data Model (CDM)

Prior to the CDM, the communication between Microsoft apps was done app by app and integration was hard to maintain, expensive and overall challenging. The Microsoft CDM was introduced to reduce these complexities and enable the integration of different MS apps.

For example, different versions of MS Dynamics 365 store and process the same data differently. The CDM gives those versions a shared definition to match their data against, so they can exchange information with each other more easily.

Data models of OpenTravel Alliance (OTA)

OTA defines a common message format to exchange data between travel, tourism and hospitality systems that belong to hotels, airlines, railways, cruise lines and distribution/logistics companies. These companies can use it to enhance the interoperability between their electronic systems.

For example, the industry-standard XML schema of OTA allows an airline to automatically transfer e-tickets to another airline's system. Its CDM is an XML schema with a standard format for exchanging data like ticket pricing and reservations. OpenTravel 2.0, released in 2016, enables exchanging JSON messages alongside existing XML messages.

Data models of the Open Geospatial Consortium (OGC)

OGC defines standards for exchanging geospatial data between geographic information system (GIS) applications. Its CDM allows the exchange of geographic information as geometries such as points, lines and polygons. Other industries, such as energy and utilities, aviation, and emergency response and disaster management, use it to improve system interoperability.
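One concrete text encoding for such geometries is Well-Known Text (WKT), which is defined in the OGC Simple Feature Access standard. The helper functions below are a hypothetical sketch for two-dimensional coordinates only, not part of any OGC library:

```python
def _coords(points) -> str:
    """Format (x, y) pairs as WKT coordinate text: 'x y, x y, ...'."""
    return ", ".join(f"{x} {y}" for x, y in points)


def point_wkt(x, y) -> str:
    return f"POINT ({x} {y})"


def linestring_wkt(points) -> str:
    return f"LINESTRING ({_coords(points)})"


def polygon_wkt(ring) -> str:
    """Build a polygon from one outer ring, closing the ring if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT polygon rings start and end at the same point
    return f"POLYGON (({_coords(ring)}))"


print(point_wkt(30, 10))
print(linestring_wkt([(30, 10), (10, 30), (40, 40)]))
print(polygon_wkt([(30, 10), (40, 40), (20, 40), (10, 20)]))
```

Any system that can read and write this text form can exchange geometries without knowing how the other side stores them internally.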

How to implement a canonical data model

Building a CDM involves several steps, from understanding your domain to implementing the CDM by mapping data. The following steps illustrate how to build a CDM for an organization from scratch, using a generic example. 

1. Understand the connecting systems

Start by identifying the connections. (A CMDB might be helpful here.) For example, if your domain is a retail business and you want to build a CDM to exchange customer orders, you must know the connecting systems or applications that store and process customer data.

Additionally, identify the workflows within those systems.

2. Identify the data sources for connecting systems

You must know how each system stores the customer order data, what data types and relationships exist, and in what structure and format they store the information. This step helps you identify the common data maintained across different systems. 
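One lightweight way to start this inventory is to take a sample record from each system and list its fields and value types. The three systems and their records below are hypothetical, and a real inventory would look at many records and at the schema itself:

```python
# Hypothetical sample records pulled from each connecting system.
samples = {
    "web_store": {
        "order_id": "W-17",
        "cust_email": "kim@example.com",
        "total": 49.90,
        "placed": "2024-03-01",
    },
    "pos": {"OrderNo": 5512, "Email": "kim@example.com", "AmountCents": 4990},
    "warehouse": {"order_ref": "W-17", "items": [{"sku": "A1", "qty": 2}]},
}


def field_inventory(samples: dict) -> dict:
    """For each system, list every field with the Python type of its sample value."""
    return {
        system: {field: type(value).__name__ for field, value in record.items()}
        for system, record in samples.items()
    }


inventory = field_inventory(samples)
for system, fields in inventory.items():
    print(system, fields)
```

Even this small sample shows the kind of differences to look for: the order total is a float in one system and an integer number of cents in another, and the order identifier is a string in two systems and an integer in the third.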

3. Define the CDM

Once you have completed the second step, the next step is defining your CDM by introducing the standard data types, structures and relationships that will serve all the connecting systems. 
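A definition of this kind could be sketched with Python dataclasses, which capture types, structure, relationships (an order has lines) and simple rules in one place. The field names and validation rules are assumptions for a generic retail order, not a recommended schema:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class CanonicalOrderLine:
    sku: str
    quantity: int

    def __post_init__(self):
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")


@dataclass(frozen=True)
class CanonicalOrder:
    order_no: str
    customer_email: str
    order_date: date
    total: Decimal   # one agreed type for money, regardless of source system
    currency: str    # assumed rule: three-letter currency code
    lines: tuple = ()  # tuple of CanonicalOrderLine

    def __post_init__(self):
        if "@" not in self.customer_email:
            raise ValueError("customer_email must look like an email address")
        if self.total < 0:
            raise ValueError("total must not be negative")
        if len(self.currency) != 3:
            raise ValueError("currency must be a three-letter code")


order = CanonicalOrder(
    order_no="W-17",
    customer_email="kim@example.com",
    order_date=date(2024, 3, 1),
    total=Decimal("49.90"),
    currency="USD",
    lines=(CanonicalOrderLine(sku="A1", quantity=2),),
)
```

In practice the same definition is often expressed as an XML or JSON schema so that systems in other languages can share it.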

4. Map the data

Next, map all the data of the connecting systems and their relationships to the CDM. 
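The mapping can be kept as a plain table and checked for gaps before any translator is written. The canonical fields, system names and source fields here are hypothetical:

```python
CDM_FIELDS = {"order_no", "customer_email", "total"}

# Hypothetical mapping table: for each system, canonical field -> source field.
FIELD_MAPPINGS = {
    "web_store": {
        "order_no": "order_id",
        "customer_email": "cust_email",
        "total": "total",
    },
    "pos": {
        "order_no": "OrderNo",
        "customer_email": "Email",
        "total": "AmountCents",
    },
    "warehouse": {"order_no": "order_ref"},
}


def unmapped_fields(mappings: dict, cdm_fields: set) -> dict:
    """Report, per system, the canonical fields that have no source field yet."""
    return {
        system: sorted(cdm_fields - set(mapping))
        for system, mapping in mappings.items()
        if cdm_fields - set(mapping)
    }


gaps = unmapped_fields(FIELD_MAPPINGS, CDM_FIELDS)
print(gaps)  # {'warehouse': ['customer_email', 'total']}
```

A reported gap is not necessarily an error. Here the warehouse system may simply not hold customer or pricing data, which is worth recording as a decision in the mapping.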

5. Build the CDM and translators

Finally, build the CDM and data translators that help translate the data model of each system into CDM and vice versa. 
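A translator pair for one system might look like the sketch below, where a point-of-sale record that stores amounts in cents is converted to a canonical order and back. All record shapes and field names are assumptions:

```python
from decimal import Decimal


def pos_to_cdm(record: dict) -> dict:
    """Translate a (hypothetical) point-of-sale record into the canonical order."""
    return {
        "order_no": str(record["OrderNo"]),
        "customer_email": record["Email"].lower(),
        "total": Decimal(record["AmountCents"]) / 100,  # cents -> currency units
    }


def cdm_to_pos(order: dict) -> dict:
    """Translate a canonical order back into the point-of-sale format."""
    return {
        "OrderNo": int(order["order_no"]),
        "Email": order["customer_email"],
        "AmountCents": int(order["total"] * 100),
    }


def cdm_to_web_store(order: dict) -> dict:
    """Translate a canonical order into a (hypothetical) web store format."""
    return {
        "order_id": order["order_no"],
        "cust_email": order["customer_email"],
        "total": float(order["total"]),
    }


pos_record = {"OrderNo": 5512, "Email": "kim@example.com", "AmountCents": 4990}

canonical = pos_to_cdm(pos_record)
round_trip = cdm_to_pos(canonical)          # back to the POS format
web_record = cdm_to_web_store(canonical)    # on to a different system
```

A round-trip check such as comparing `round_trip` with `pos_record` is a cheap way to catch translators that lose information. Note that the point-of-sale and web store translators never reference each other, only the canonical order.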

Data models that are canon

A Canonical Data Model defines a standard set of data types, structures, relationships and rules that is independent of any specific application. It enables easy data exchange between integrating applications and systems, allowing interoperability between them regardless of their technological differences.

Integrating components create messages in this common format and translate the messages they receive into their own format. A CDM brings many advantages for enterprises, such as fewer data translations, improved data consistency, integration flexibility and business agility. Ultimately, it reduces the cost of maintaining data translations and business logic.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Shanika Wickramasinghe is a software engineer by profession and a graduate in Information Technology. Her specialties are Web and Mobile Development. Shanika considers writing the best medium to learn and share her knowledge. She is passionate about everything she does, loves to travel and enjoys nature whenever she takes a break from her busy work schedule. She also writes for her Medium blog sometimes. You can connect with her on LinkedIn.