Unshackling Integration: Part 1 - Moving Beyond Canonical Data Models

As webMethods celebrates its 25th anniversary, it marks not only a milestone but also an opportune moment for retrospection.

This occasion invites us to delve into the annals of integration history, to re-evaluate the fundamental approaches that have defined the landscape for decades. With the rapid evolution of modern integration architectures and the surge of innovative approaches, it’s imperative to scrutinize the role and relevance of these approaches in today’s dynamic integration environments.

This post is Part 1 of the Unshackling Integration series, which journeys through integration history, challenges existing notions, and champions modern integration approaches and strategies.

Part 1 takes a look at Canonical Data Models (CDMs) and discusses moving beyond them towards more flexible and adaptive integration.

Understanding Canonical Data Models in Integration

Canonical Data Models, hereafter referred to as CDM or CDMs (sometimes termed common data canonicals, although the acronym CDC means something different today), are structures organizations create to standardize data formats, definitions, and schemas across systems within the organization. They serve as the “Rosetta Stone” of data transformation, enabling seamless communication by establishing a common ground for data interpretation and exchange. These canonical models typically encapsulate core business entities, such as a Customer or an Order, standardizing their representation irrespective of the underlying systems.

This has been an integration pattern since the early 2000s, documented here (Canonical Data Model - Enterprise Integration Patterns) and present since at least March 2003 (see the Wayback Machine link here: Enterprise Integration Patterns - Canonical Data Model (archive.org)).

These canonicals are an interim step in any data transformation.

Let’s walk through a relatively simple example to explain this.

Canonical Data Model - A worked example

I have 3 systems:

  • Customer Relationship Management (CRM), e.g. Dynamics 365
  • Marketing Platform, e.g. Marketo
  • ERP System, e.g. IFS

Each one of these systems touches a customer.

  • The CRM stores information acquired from various touchpoints, such as interactions, leads and customer support data
  • The Marketing system collects customer preferences data, data around campaigns, and interactions, customer segmentation, …
  • The ERP would contain customer transaction history, maybe their billing address, payment terms, …

A canonical data model aims to unify all these diverse data sources into a standard entity that can be used across the three (or more) systems.

For example, we might have a customer canonical modelled that, in simple terms, looks like the following:

* Customer
  - Identifier
  - First Name 
  - Last Name
  * Contact Details
    * Contact [..]
      - Type
      - Address Line 1
      - Address Line 2
      - Address Line 3
      - Address Line 4
      - Zip Code
      - Country
      - Telephone
      - Cellphone
      - Email Address
  - Preferred Communication Channel
  - Opted Out

This would also contain constraints, such as lengths, types, and mandatory/nullable flags, to enforce the data structures.
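As a rough illustration only, here is a hedged sketch of how that canonical might be expressed as a TypeScript interface, with the constraints noted in comments. The field names, lengths, and enumerations are assumptions for this example, not a real product schema:

```typescript
// Hypothetical sketch of the Customer canonical; names and constraints are illustrative.
type ContactType = "Billing" | "Shipping" | "Home" | "Work";

interface Contact {
  type: ContactType;
  addressLine1: string;        // mandatory, max 60 chars
  addressLine2?: string;       // optional, max 60 chars
  addressLine3?: string;
  addressLine4?: string;
  zipCode: string;             // mandatory, max 10 chars
  country: string;             // e.g. an ISO 3166-1 alpha-2 code such as "GB"
  telephone?: string;
  cellphone?: string;
  emailAddress?: string;       // expected to match a basic email pattern
}

interface CanonicalCustomer {
  identifier: string;          // mandatory, unique across systems
  firstName: string;           // mandatory, max 40 chars
  lastName: string;            // mandatory, max 40 chars
  contactDetails: Contact[];   // one or more contacts
  preferredCommunicationChannel?: "Email" | "Phone" | "Post";
  optedOut: boolean;
}
```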

Now if you think about a customer flow in simple terms, they typically start as a lead in marketing, then become a customer and buy something.
So, someone looks at the website and registers interest, and they become a lead in the marketing system. At some point later, they actually decide to buy something and become a customer, so now you want to integrate the marketing system with the CRM and ERP.

A data canonical approach, defined as above, was intended to ‘simplify’ this, and would look something like the following.

An integration takes the relevant lead from the marketing system, then maps this to the Canonical Data Model for the customer.
This canonical data model is then passed to the CRM integration, which uses it to extract the data and insert the customer into the CRM system.
The same applies for the ERP.

We end up with something that looks like this.

Marketing System Customer > Integration Map > Canonical > Integration Map > CRM Customer
                                                        > Integration Map > ERP Customer

and then maybe we update the customer in the CRM and want to replicate this into other systems, so we do something like:

                      CRM > Integration Map > Canonical > Integration Map > Marketing System Customer
                                                        > Integration Map > ERP Customer

and so on.

So we end up creating the following integration maps to transform the data structure to/from the CDM (two of these maps are sketched in code after the list).

  1. CRM Map to Canonical
  2. ERP Map to Canonical
  3. Marketing Map to Canonical
  4. Canonical Map to ERP
  5. Canonical Map to CRM
  6. Canonical Map to Marketing
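To make the mechanics concrete, here is a hedged sketch of maps 3 and 5 in TypeScript, reusing the CanonicalCustomer interface sketched earlier. The Marketo-style lead shape and the CRM customer shape are invented for illustration, not real system schemas:

```typescript
// Hypothetical source and target shapes; real system schemas will differ.
interface MarketingLead {
  leadId: string;
  givenName: string;
  familyName: string;
  email: string;
  unsubscribed: boolean;
}

interface CrmCustomer {
  customerNumber: string;
  name: { first: string; last: string };
  primaryEmail?: string;
  marketingOptOut: boolean;
}

// Map 3: Marketing -> Canonical
function leadToCanonical(lead: MarketingLead): CanonicalCustomer {
  return {
    identifier: lead.leadId,
    firstName: lead.givenName,
    lastName: lead.familyName,
    contactDetails: [{
      type: "Home",
      addressLine1: "",   // not known to marketing, but mandatory in the canonical
      zipCode: "",        // padded to satisfy the canonical's constraints
      country: "",
      emailAddress: lead.email,
    }],
    optedOut: lead.unsubscribed,
  };
}

// Map 5: Canonical -> CRM
function canonicalToCrm(customer: CanonicalCustomer): CrmCustomer {
  return {
    customerNumber: customer.identifier,
    name: { first: customer.firstName, last: customer.lastName },
    primaryEmail: customer.contactDetails[0]?.emailAddress,
    marketingOptOut: customer.optedOut,
  };
}
```

Notice how the lead has to be padded with empty values just to satisfy canonical fields it knows nothing about; this foreshadows the transformation overheads discussed below.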

The benefits of data canonicals are typically described as follows:

  1. When you add a new application, you only need to define maps to/from the data canonical rather than mapping to each system
    This equates to two maps per system; with, say, six systems interchanging data, this becomes much more advantageous than point-to-point maps.
  2. Helps to standardize and make data consistent, and enforce governance policies
    Canonicals define not just the structure, but also the constraints. Sizes, types, mandatory, nullable aspects are all defined to ensure the data in the canonical is correctly structured and used consistently.
  3. Common intermediary to facilitate interoperability
    Canonicals are used as an intermediary, often with a pub/sub type process, and become the de-facto ‘contract’ of the pub/sub process.
  4. Aids maintainability and helps to reduce technical debt
    It provides a better structure for your integration project as you have a defined process for every system that interacts with the canonical, and makes this easier to understand and maintain.

Sounds great 🙂.

Challenges and Limitations of Canonical Data Models

It’s not all a bed of roses with canonicals, so let’s understand some of the challenges and limitations:

  1. Rigid vs Flexible
    It was always very difficult to strike the correct balance between standardization and flexibility. CDMs aim to standardize data structures, however you can often struggle to accommodate diverse system-specific requirements in the CDM: star schemas, systems with poor record structures, a race to the bottom on data sizes and maximum constraints, and so on. Adhering to a canonical model leads to rigidity, and can hinder the adaptability required for agile and fast-evolving business needs, as well as technological changes. These canonicals become a ‘choke’ point in implementation when they need to be adapted quickly, to ensure changes do not compromise the integration landscape.

  2. Transformation Overheads
    Transforming and mapping data from source systems to fit into a singular canonical structure brings significant processing and data transfer overhead. Data mapped into the canonical will contain much redundant information not required for the particular use case, but the whole set is included in the canonical. This means you create overhead mapping these fields from the source system into the canonical when they’re not needed, plus network bandwidth and further mapping overhead as you shift this around and map to the target system. As you’re also working with a minimum set of constraints, this can also cause a loss of data fidelity and accuracy when trying to apply a one-size-fits-all model in the canonical.

  3. Maintenance & Governance
    CDMs necessitate governance and control. As systems evolve, the canonical model needs to evolve too, ensuring backwards compatibility where possible. Making changes across the various endpoints can become very complex, and as business requirements diverge this creates further governance challenges.

  4. System Compatibility
    Systems, and particularly legacy ones, often have deeply embedded structures and formats, making it difficult to align with a canonical without heavy transformations. Equally, more modern systems have more flexible and open data constructs which also don’t easily fit with a rigid and controlled canonical.

  5. Scalability and Performance
    As the volume and complexity of data increase, canonical approaches can give rise to scalability and performance issues due to the mapping overhead.

The Evolution of Integration Architectures

Integration architectures have changed hugely in recent years, with workloads moving from fixed on-premises networks with physical clusters managing state, to iPaaS, multi-cloud, containerized, stateless deployments that scale horizontally both up and down. Let’s touch on a few of the evolutions that have happened to integration architectures in a little more detail.

  • Shift Towards Distributed and Agile Architectures: Integration architectures have moved from rigid centralized models to more distributed and agile approaches. Organizations have recognized the limitations of strict models and approaches and embraced more flexible and agile integration patterns. This transition allows for greater decentralization, accommodates diverse system-specific requirements, and aids an agile approach.

  • The Rise of API-First and Microservices Architectures: These have revolutionized integration paradigms. APIs serve as a cornerstone for communication, offering a more granular approach to integration. Microservices, with their decentralized nature, focus on individual services that promote autonomy and scalability, challenging the dominance of centrally created and maintained integrations.

  • The ‘Connector’ explosion: This pivotal change marks a departure from traditional integration platforms reliant on standardized approaches toward the advent of Integration Platform as a Service (iPaaS). Formerly, integration platforms and strategies relied on adherence to standards such as SOAP, REST, JMS, and MQTT to connect disparate systems. The growth of iPaaS has brought with it platforms offering an extensive array of pre-built, often vendor-supported, out-of-the-box connectors. These simplify integration and liberate organizations from the constraints of standards-based integration, allowing them to seamlessly link various systems and expedite integration development, while also enhancing flexibility and ease of use, enabling organizations to adapt to evolving requirements and technology ecosystems.

  • Hybrid Integration Strategies: Emerging from modern integration landscapes is a strategic blend of self-hosted runtimes, cloud-based runtimes, and the utilization of iPaaS platforms. This hybrid integration approach represents a deliberate fusion of diverse strengths, amalgamating the robustness of existing systems with the agility inherent in API-driven and microservices architectures. By embracing this hybrid model, organizations navigate the delicate balance between the reliability of legacy infrastructure and the transformative potential of modern integration paradigms. This strategic transition allows for the gradual modernization of integration platforms, seamlessly integrating new architectural patterns while safeguarding and optimizing critical legacy integrations. It empowers businesses to evolve from conventional methods to more responsive, adaptable, and future-ready integration frameworks without disrupting their established operational workflows.

  • Event-Driven Architectures (EDA): Event-Driven Architectures (EDA) advocate for loosely coupled systems designed to respond to events in real-time. This allows for a more dynamic exchange of data, enabling integrations to adapt swiftly to changing circumstances and business needs. By prioritizing responsiveness and flexibility, EDA facilitates seamless interactions between various systems, enhancing agility and enabling organizations to harness the power of real-time data exchange for more informed decision-making, compared to more rigid approaches used historically.

  • Embracing Flexibility and Scalability: The evolution of integration architectures emphasizes flexibility and scalability. Modern approaches prioritize adaptability to changing business needs, seamless integration of diverse systems, and the ability to scale efficiently without compromising performance.

  • Adaptation and Coexistence: Rather than completely discarding an existing integration platform and its implemented integrations, canonical models, etc., modern integration architectures tend to adapt and coexist with existing integration deployments, utilizing hybrid channels. Organizations carefully modernize their integration platform, moving from old ways to new ways, leveraging newer architectural patterns for agility and responsiveness without compromising the critical integrations that already exist.

  • Domain-Driven Design: This is influencing the evolution of integration architectures, providing a structured approach to integration development aligned with specific business domains and, unlike the heavy, rigid, centrally controlled governance approaches used historically, a lighter-touch governance. Breaking and structuring integration projects into distinct business domains or subdomains helps to ensure alignment of the integrations with the specific business needs and requirements of that domain. This encourages a more modular approach, with responsibility and separation defined by the domain, which facilitates flexibility and controlled governance. This shift away from the centralized control that typically led to rigidity now enables decentralized control within the domain area, whilst retaining the governance elements.

Canonical Data Models in Modern Integration?

Agility is key to success in business environments today. The rigidity of CDMs can often create huge challenges in adapting quickly to new business requirements and technology leaps. Any change to a CDM is complex because of the dependencies all other integrations have on it. Changes have to be carefully version controlled so as not to affect the existing integrations. As such, CDMs are typically centrally controlled and governed, which creates an anti-pattern to agility: you become reliant on a central implementation and, commonly, a central controlling team carrying a lot of governance and maintenance overhead. This creates a bottleneck and results in slower integration delivery.

Integrations historically were less time sensitive, sometimes even wholly batch oriented. With the drive towards real-time responses and event-driven systems, the demand is typically for instantaneous or near-real-time responses, which can be hindered by the structural constraints imposed by large CDMs. This is best illustrated when you consider API- and microservice-centric integrations. A good API only takes in the data it needs and returns the subset of data associated with it. Creating an API that uses canonicals typically goes against good practice, as both APIs and especially microservices advocate for smaller, self-contained services, making a CDM impractical.

Taking these thoughts further, as integration architectures have changed and moved towards a hybrid approach, with integrations running in geographically distributed clouds and self-hosted environments, and integrating with other geographically distributed SaaS and self-hosted systems, transmitting a large canonical data set is neither desired nor recommended. Data transferred should be kept to a minimum, to avoid the pitfalls of global internet communication links such as bandwidth, data transfer speeds and latency. Information security practice also says to transfer only the information needed, as this reduces risk.

Lastly, many organizations are doing modern integration in the context of digital transformation and/or customer experience improvements. These demand solutions that can swiftly adapt to changes in technology stacks, cloud migrations, and constantly evolving business landscapes. CDMs, whilst offering standardization benefits, are likely to slow down the pace of transformation due to their rigidity.

Embracing Change: Moving Beyond Canonical Data Models

So now you understand what a Canonical Data model is, you know the benefits and challenges, and you understand how integration is changing, and why CDMs in a new integration architecture can cause issues. How do you embrace this change and move forward?

Here are some points I’d like to explain, to make you think long and hard about approaches of the past:

1. Contextual Standardization.
Modern integration strategies should be focused on contextual standardization. Don’t standardize because you think you should; only standardize if, and where, it makes sense. If you can identify critical data entities that require standardization, then do this, but this doesn’t mean you have to standardize everything. You should allow for flexibility in other areas. When you identify data entities you think should be standardized, consider where this data resides, in what systems, in what locations, and how many other systems and locations this data might need to be pushed to. Understand the pitfalls of a canonical approach, and ensure that these are acceptable in the context of that entity and its integration needs. If the number of systems to transmit the data between is small, a canonical approach is not needed and would simply create more work.

Number of Systems | Number of Maps (To/From Canonical) | Number of Point-to-Point Maps
------------------|------------------------------------|------------------------------
1                 | 2                                  | 0
2                 | 4                                  | 2
3                 | 6                                  | 6
4                 | 8                                  | 12
5                 | 10                                 | 20

This isn’t the only criterion to consider though, so don’t be fooled into making a decision based wholly on this one metric!
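If it helps to see the arithmetic behind the table, here is a small hedged sketch. It counts directional maps and assumes the worst case where every system needs to exchange the entity with every other system:

```typescript
// Directional map counts: 2 per system via a canonical, n*(n-1) point-to-point
// when every system exchanges data with every other (a worst-case assumption).
function canonicalMaps(systems: number): number {
  return 2 * systems;
}

function pointToPointMaps(systems: number): number {
  return systems * (systems - 1);
}

for (let n = 1; n <= 5; n++) {
  console.log(`${n} systems: canonical=${canonicalMaps(n)}, point-to-point=${pointToPointMaps(n)}`);
}
// The counts only favour the canonical from roughly four systems upwards,
// and only if those systems genuinely all need to exchange the same entity.
```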

2. Embrace API-First and Contract-Driven approaches to Integration
Embracing API- or contract-based integration enables a more adaptable exchange. webMethods Flow Services utilize a well-defined contract, which can also be exposed as an API, and when these are well defined, it becomes easier to map and manage data efficiently without being constrained by a ‘one size fits all’ canonical model. Designing contracts is now often a domain-driven initiative, giving the owners of each domain the decision-making rights, while balancing reuse with velocity of development.

Note: don’t fall into the trap of having APIs at every level either. An integration service contract is as good as an API contract. Invoking one integration from another will be considerably more performant than routing out and back in via an API, which will make HTTP calls involving serialization and deserialization, and will likely cross authorization boundaries. Use APIs when you’re going outside of boundaries, not when you’re already inside!
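As a hedged illustration of a contract-driven shape (the types and field names below are assumptions for this example, not webMethods document types), the contract carries only the fields that this specific interaction needs, rather than the whole canonical:

```typescript
// Hypothetical contract for one narrow operation: registering interest from the website.
interface RegisterInterestRequest {
  firstName: string;
  lastName: string;
  email: string;
  consentToContact: boolean;
}

interface RegisterInterestResponse {
  leadId: string;
  status: "created" | "already-known";
}

// The contract could back a Flow service, an API operation, or both; the point
// is that the shape is owned by the domain and sized for the interaction,
// not inherited from an organization-wide canonical.
async function registerInterest(req: RegisterInterestRequest): Promise<RegisterInterestResponse> {
  // Placeholder implementation for the sketch.
  return { leadId: "lead-" + req.email.toLowerCase(), status: "created" };
}
```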

3. Microservice Architectures in Integration
Microservices promote autonomy among services, allowing each service to have its own data representation (a contract!). Maintaining communications through a contract, and across boundaries using APIs, facilitates flexibility in execution and independence of deployment, deviating from the constraints of a shared and rigid CDM which would otherwise become a choke point and potential single point of failure.
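A minimal sketch of what that autonomy can look like, using hypothetical shapes: each service keeps its own customer representation, and translation happens only at the boundary where the two services meet:

```typescript
// Each service owns its own customer representation.
// Billing service's view:
interface BillingAccount {
  accountId: string;
  legalName: string;
  paymentTermsDays: number;
}

// Marketing service's view:
interface MarketingContact {
  contactId: string;
  displayName: string;
  optedOut: boolean;
}

// Translation lives at the boundary between the two services (akin to an
// anti-corruption layer), not in a shared organization-wide model.
function toMarketingContact(account: BillingAccount): MarketingContact {
  return {
    contactId: account.accountId,
    displayName: account.legalName,
    optedOut: false, // marketing-specific default; billing has no opinion on this
  };
}
```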

4. Hybrid Integration
When hybrid integration is used, you can consider a hybrid call as equivalent to a call within the same boundary, so service contracts still apply. The important point to remember is that hybrid runtimes are remote, which means, for performance, you want to transmit only the needed data backwards and forwards across the link. A CDM payload would carry every piece of data in the canonical, and this would be reflected in the transfer times.

5. Continuous Improvement
Transitioning away from CDMs is an evolutionary process, much like transitioning to integration in the cloud. When you embark on this journey, identify areas where flexibility is crucial and continuously refine integration strategies based on the evolving business landscape.

The Road Ahead: Strategies for Flexible and Future-Proof Integration

So finally, with all this, let’s sum up the strategies you can apply to enable flexible and future-proof integrations without CDMs.

  1. Create a context-driven approach to determine where a more flexible and adaptable/agile approach is needed. If you think you need standardization and veer towards a CDM approach, question whether it is truly needed, remembering that you trade agility and flexibility for rigidity and a more centralized, controlled integration approach.
  2. Consider usage of APIs and/or Events (EDA) to provide effective contract management and standardized communication across boundaries without enforcing a rigid and centralized approach.
  3. Leverage microservice approaches to integration for flexible and hybrid integration requirements, fostering adaptability without compromising on interoperability, providing autonomy and avoiding dependency on centralized and shared assets which not only slow integration development, but also result in deployment complexity.
  4. Use a combination of iPaaS and hybrid integration techniques to invoke remote integrations where available to avoid multi-level API complexities which often result in serialization, deserialization and repeated authentication and ingress points, let alone implementing APIs at levels where they’ll never or rarely be reused.
  5. Don’t ever assume your integration strategy is done. Continuously evolve and embrace iterative improvement as the business landscape changes, continuously refining and accommodating new technologies and approaches over time.
  6. Focus on data quality and governance practices to ensure consistency, accuracy and reliability in data exchange regardless of your integration approaches
  7. Invest in talent development and skill enhancement for integration teams. Integration has changed over the years, and teams should be encouraged to learn and adopt modern integration approaches, and to foster innovative thinking to evolve the strategy.
  8. Collaborate with technology partners to leverage their expertise in integration architectures and approaches.

References

Canonical Data Model - Enterprise Integration Patterns: https://www.enterpriseintegrationpatterns.com/patterns/messaging/CanonicalDataModel.html
Canonical Model - Wikipedia: https://en.wikipedia.org/wiki/Canonical_model

Thanks @Dave_Pemberton for this, the rigidity and centralisation problem of canonicals is very real. I remember a project in which the developer had to include an arbitrary attachment in an integration, because the canonical was missing data that was required between the two systems. They were not allowed to customise the canonical and had a waiting list of 6 months to get any change approved!! This vastly overcomplicated his integration for no good reason.


The sad and true part is that it was not just “his integration”; rather, it was a business requirement which was rejected/delayed.

Thanks @Dave_Pemberton, I was nodding my head all the way through your article. As @John_Carter4 mentioned, it’s indeed a real problem (in the name of Standardisation & Governance). I recently faced this in Pharma, one of the regulated industries, where CDFs are very common and enforced.

Very nice write-up and it brings back a lot of memories (mostly fond) from 20 years ago. Thanks a lot, @Dave_Pemberton.

It is really interesting how the discussion has shifted from the technical perspective to business agility. What I saw just a few years ago is that something like a department-level (as opposed to the entire organization) “canonical” can work extremely well. It could also be looked at, in terms of Domain-Driven Design, as a bit like an Anti-Corruption Layer.

At the end of the day it comes down to what allows an organization to move forward today(!) at a reasonable pace, while at the same time trying to anticipate future needs. The latter is crucial so that you don’t shoot yourself in the foot.

For those who like books, I can recommend “Just Enough Software Architecture: A Risk-Driven Approach” by George Fairbanks (2010).
