How to Design the Canonical Document

We plan to implement the canonical model in our next-generation B2B system.
We'd like to build an ESB (IS + Broker) for internal integration. Any external document will go through the ESB before being posted to the back-end application. We have run into a problem designing the canonical document: we receive
different formats from multiple partners. For a purchase order, for example,
there are the following options:
850(X12)
ORDERS(EDIFACT)
PIP3A4

We'd like to create a canonical document for these different formats.
But if we pick an industry standard as our internal standard, it will impact our system's performance. If we create one ourselves, I can't be sure it will meet future requirements. Any suggestions? Is there a basic framework for designing a canonical document? If you can provide some samples, that would be great.

Creating canonical documents is a very challenging task. It can be very difficult to create one that handles all the variations needed and is readily modifiable to accommodate new needs (yet stable enough that existing facilities don’t need to be reworked). As evidence of this challenge, just look at how many standards exist, how long they took to create and evolve, and how many people don’t pick one because “it’s more complex than we need.” (Which might really be “it’s more complex than we want to understand.”)

Perhaps these rules of thumb can help:

  • If you adopt a standard, stay with it. Don’t modify it at all, otherwise you’re not following the standard anymore and lose the benefit of having someone else evolve the standard for you.

  • On projects that I’ve been involved with, or have heard information about, those that use an industry standard as a start to their own format have invariably abandoned that effort. Based on this, my advice is to not start down this path at all. You’d be better off creating a new format for yourself.

  • Quite often, it doesn’t matter what the standard format is but that you have one. Once integrations start down the path of using unique formats it is very hard to change course.

  • I wouldn’t worry too much about “system performance” and how the format may impact it. The payoff of a common format will generally far outweigh any possible performance hit. Design the format to best reflect the object it represents; performance doesn’t really need to be a consideration. Follow the 3 rules of optimization: 1) Don’t optimize yet; 2) Don’t optimize yet; and 3) Don’t optimize yet.

  • The major applications within the company have significant influence over how the company does business and how it defines business objects. For example, a lot of work and effort go into defining how a PO, an invoice, a shipment notice, etc. are managed within SAP R/3. It is not unreasonable to use the most significant application as a guide for the format of your canonicals.

  • When designing your own canonical format, provide fields/structures that can hold data you never thought of. This gives the format some adaptive range, allowing integration designers to pass data that is of interest only to one other application and isn’t directly supported by the format. Many industry standards do something along these lines, called extended fields, extrinsics, custom fields, etc. An occasional review of the use of these extensions can identify what changes need to be made to the canonical definition. The caution here is that design/code reviews must enforce the use of defined fields when they exist so that integration developers don’t lazily use the extended fields because it’s “easier.”

  • Lastly, you might consider not using canonicals at all. I know this suggestion is tantamount to heresy, but designing good canonicals takes time, effort and commitment from business users as well as IT people. Often, this isn’t a commitment that the enterprise is willing to make. If the number of applications that handle a PO (or an invoice or whatever) is just 2-3 (it usually is), then it’s probably okay to forego canonicals; the payback probably isn’t there. Please don’t misunderstand me: the use of canonicals is a powerful construct in the right environment. This last bullet point is offered just in case the environment isn’t right.
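To make the extension-field rule of thumb above a bit more concrete, here is a minimal sketch in Python (not webMethods code). The class name, field names and sample keys are all made up for illustration. It only shows the shape of the idea: a few defined fields, an open area for everything else, and a small report that supports the occasional review of how the extensions are being used.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CanonicalPurchaseOrder:
    """Hypothetical canonical PO: a few defined fields plus an open extension area."""
    order_number: str
    buyer_id: str
    order_date: str  # kept as text, e.g. "2005-03-14"
    # Data the format does not model yet; keys are chosen by integration designers.
    extensions: dict = field(default_factory=dict)


def extension_usage(orders):
    """Count how often each extension key appears across a batch of documents."""
    usage = Counter()
    for order in orders:
        usage.update(order.extensions.keys())
    return usage


# Sample documents (invented data).
orders = [
    CanonicalPurchaseOrder("PO-1", "ACME", "2005-03-14", {"dockDoor": "7"}),
    CanonicalPurchaseOrder("PO-2", "ACME", "2005-03-15", {"dockDoor": "2", "promoCode": "X"}),
    CanonicalPurchaseOrder("PO-3", "GLOBEX", "2005-03-15"),
]

# Keys that show up often are candidates for promotion to defined fields.
for key, count in extension_usage(orders).most_common():
    print(key, count)
```

An extension key that turns up in most documents is a hint that the canonical definition is missing a field. A key that duplicates an existing defined field is the "lazy" usage that reviews should catch.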

Be sure to review the “Canonical Strategy” whitepaper in GEAR 6. It has information that will be helpful. The paper makes some points that I would refine:

From page 9
"Perhaps the most common approach is to adopt a hybrid of the two approache

Very nice answer, Rob. I would like to second the business validation comment. Since one of the major goals of a common format is to promote loose coupling of interfaces, leave your business validation to your target business adapters. Your common document or canonical format is designed to protect you from changes in the sources and targets.

markg
http://darth.homelinux.net

Hello,
I would like to third the business validation comment. Also the standardization of handling and creating dates is very helpful.
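As a rough illustration of what standardized date handling can look like, here is a small Python sketch. The two source formats and their labels are assumptions, not taken from any particular partner. Every inbound date is parsed once at the edge and carried in the canonical as an ISO 8601 string.

```python
from datetime import datetime

# Hypothetical source formats; real partner formats would need to be confirmed.
SOURCE_DATE_FORMATS = {
    "compact": "%Y%m%d",      # e.g. 20050314
    "us_slash": "%m/%d/%Y",   # e.g. 03/14/2005
}


def to_canonical_date(value, source):
    """Parse a source-specific date string and return it as ISO 8601 (YYYY-MM-DD)."""
    parsed = datetime.strptime(value, SOURCE_DATE_FORMATS[source])
    return parsed.date().isoformat()


print(to_canonical_date("20050314", "compact"))
print(to_canonical_date("03/14/2005", "us_slash"))
```

Both calls yield the same canonical string, so downstream services never need to know which partner format a date arrived in.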

The part about leaving the standard alone is another point that I like, looking at it now from a distance. I have ideas that maybe it would be good to have the various standards wrapped in a canonical that acts like a meta-layer. This layer would house the new locations for data that are not available in the source document. You would then only need a native data extractor for each document (you would make one anyway for a full canonical) and a final extractor on just the meta-layer.

That way, you can have fields that may not exist yet accessible to your new apps, and migrate the extractors down to native layers as they become available. I like pictures, so here is what I see:

A[meta]{source1}
B[meta]{source2}
A -> (extractor1 {meta-extractor}) -> PO{from source}, ACCT{from source}
B -> (extractor2 {meta-extractor}) -> PO{from source}, ACCT{from meta}

This way, you will have less preprocessing and validation. You can also easily deposit the information into a native (legacy) application that would want only the source. All without you un-processing your document.
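For those who prefer code to pictures, the same idea can be sketched in Python. Everything here is hypothetical (the `Envelope` class, the extractor functions, the field names); it is only meant to show the lookup order: native extractor first, meta layer as the fallback.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Native document left untouched, plus a meta layer for data the source lacks."""
    source_format: str
    native: dict                               # stand-in for the parsed source document
    meta: dict = field(default_factory=dict)   # new locations for missing data


def extract_source1(doc, name):
    """Native extractor for source1 (hypothetical field names)."""
    mapping = {"PO": "po_no", "ACCT": "acct"}
    return doc.get(mapping[name]) if name in mapping else None


def extract_source2(doc, name):
    """Native extractor for source2; this format has no account field."""
    mapping = {"PO": "OrderId"}
    return doc.get(mapping[name]) if name in mapping else None


NATIVE_EXTRACTORS = {"source1": extract_source1, "source2": extract_source2}


def extract(envelope, name):
    """Try the native extractor first, then fall back to the meta layer."""
    value = NATIVE_EXTRACTORS[envelope.source_format](envelope.native, name)
    return value if value is not None else envelope.meta.get(name)


a = Envelope("source1", {"po_no": "4500001", "acct": "A-17"})
b = Envelope("source2", {"OrderId": "PO-88"}, meta={"ACCT": "B-42"})

print(extract(a, "PO"), extract(a, "ACCT"))  # both values come from the source
print(extract(b, "PO"), extract(b, "ACCT"))  # PO from the source, ACCT from meta
```

If source2 later gains an account field, only `extract_source2` changes and the meta entry can be retired. The native document is never rewritten, so it can still be handed to a legacy application as-is.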

So you may have a way to get some of the goodness out of a canonical while respecting all the advice of Rob and Mark. Good day.

Yemi Bedu

Rob,

First, great post, thanks for taking the time to provide such a detailed answer! Your work here represents what is best about WM Users. Others would do well to follow your lead.

Do you usually define canonical documents as Integration Server document types or as XML schemas first? What if the canonical will be used by a variety of web services-based interfaces? Are you suggesting that an XML schema only be defined with xs:string data types?

While I agree that having a dominant application such as SAP R/3 in the mix will rightly influence canonical design, I would suggest that designing a canonical for such an application increases the likelihood that it will need a major overhaul should that application be replaced, be acquired by a rival vendor or be otherwise removed from its position of dominance.

I would also offer that it is always best to use logical field names in a canonical and not field or element names that are application-specific such as “fldCustCCExpDte”.
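A tiny Python sketch of the point, with the caveat that the mapping is invented: I am guessing that “fldCustCCExpDte” means a customer's credit card expiration date, and the second field is made up. The source adapter renames application-specific fields to logical names before anything reaches the canonical.

```python
# Hypothetical mapping from one application's field names to logical canonical names.
# The meaning assigned to "fldCustCCExpDte" is an assumption.
APP_TO_CANONICAL = {
    "fldCustCCExpDte": "creditCardExpirationDate",
    "fldCustNm": "customerName",
}


def to_canonical(app_record):
    """Rename application-specific fields; refuse fields that have no logical name."""
    unmapped = sorted(name for name in app_record if name not in APP_TO_CANONICAL)
    if unmapped:
        raise KeyError(f"no canonical name for: {unmapped}")
    return {APP_TO_CANONICAL[name]: value for name, value in app_record.items()}


print(to_canonical({"fldCustNm": "ACME", "fldCustCCExpDte": "2006-01"}))
```

Failing on unmapped fields is one possible design choice; it keeps application-specific names from leaking into the canonical unnoticed.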

Mark

Mark wrote:
“Do you usually define canonical documents as Integration Server document types or as XML schemas first?”

In the projects I’ve been involved with, XML schemas have only been used when they come from external sources (e.g. cXML). The remainder have always been defined as Broker and IS document types.

Mark wrote:
“What if the canonical will be used by a variety of web services-based interfaces?”
Where web services interfaces have been used the interface has typically been application specific. Thus the target adapter/service translates the canonical format to whatever format the application requires. I have never seen a case where an end-point accepts or generates a canonical format directly.

Mark wrote:
“Are you suggesting that an XML schema only be defined with xs:string data types?”
If that schema is being used to define the canonical, yes. There are definitely pros and cons to such an approach (e.g. timestamps can be better handled using different types).

Mark wrote:
“I would suggest that designing a canonical for such an application increases the likelihood that it will need a major overhaul should that application be replaced.”
Agreed. This is indeed a potential drawback of the approach. An app-independent format addresses this, but as mentioned can be quite difficult to do and many times is not committed to (“we really need this done this quarter and we don’t have time to analyze this to death” which is business-speak for I don’t want to do this cuz I don’t see the value). I should have elaborated on what I meant by “a guide for the format.” I didn’t mean to imply that simply adopting the format of a major application is the way to go. Rather, that it may be a better starting point in some cases for defining a canonical than would starting with an industry standard.

Mark wrote:
“…it is always best to use logical field names in a canonical…”
Absolutely!

One of the interesting debates about Enterprise Service Bus architectures is the degree to which transformation services are provided by the bus.

If web services exposed by the ESB can accept canonical formats then the ESB can provide limited transformation. On the other hand, if web services will accept native formats, the ESB must provide more robust transformation services. Failure to properly address transformation in ESB implementations will lead to ESB-enabled point-to-point integrations.

As Mark G. recently pointed out on his blog, some supporters of the Java Business Integration standard argue that the creation of canonical documents is an old-school, proprietary approach forced upon the unsuspecting masses by evil EAI vendors. While this argument is pretty lame on several fronts, we do need to consider how canonical documents will be used with emerging web services- and ESB-based integration projects.

Mark

Mark wrote:

“If web services exposed by the ESB can accept canonical formats then the ESB can provide limited transformation. On the other hand, if web services will accept native formats, the ESB must provide more robust transformation services.”

It would certainly seem to provide some level of efficiencies if web services work with canonical formats. Invariably there will be applications that cannot be interfaced using web services nor a canonical. Something somewhere will need to do the transformation. The integration platform will usually need to provide this service in some way.
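To illustrate where the transformation ends up living, here is a rough Python sketch. The format names come from the original question, but every field name and both targets are hypothetical, and the inputs are assumed to be already parsed into dictionaries. Each source gets one mapper into the canonical and each target gets one mapper out of it.

```python
# Inbound mappers: one per partner format, each producing the canonical shape.
# All field names below are placeholders, not real segment or element names.
INBOUND = {
    "850": lambda d: {"orderNumber": d["po_number"], "buyer": d["buyer"]},
    "ORDERS": lambda d: {"orderNumber": d["document_id"], "buyer": d["buyer_party"]},
    "PIP3A4": lambda d: {"orderNumber": d["order_id"], "buyer": d["from_partner"]},
}

# Outbound mappers: one per target application, each consuming the canonical shape.
OUTBOUND = {
    "erp": lambda c: {"po": c["orderNumber"], "customer": c["buyer"]},
    "warehouse": lambda c: {"order_ref": c["orderNumber"]},
}


def route(source_format, document, target):
    """Transform source -> canonical -> target; no mapper knows about the other side."""
    canonical = INBOUND[source_format](document)
    return OUTBOUND[target](canonical)


print(route("850", {"po_number": "PO-1", "buyer": "ACME"}, "erp"))
print(route("ORDERS", {"document_id": "PO-2", "buyer_party": "GLOBEX"}, "warehouse"))

# Mappers needed with a canonical vs. direct point-to-point mappings.
print(len(INBOUND) + len(OUTBOUND), "vs", len(INBOUND) * len(OUTBOUND))
```

With three sources and two targets the difference is small (5 mappers versus 6), but the first number grows additively and the second multiplicatively as formats and applications are added.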

IMO, web services don’t change the landscape that much. Decoupling document formats can be useful in many situations, including in ESBs.

I think the only ones arguing for not having transformation services in ESB implementations are vendors (or open source projects) who can’t provide the capability.

Mark

Agreed. And I would argue that effective transformation services require a certain amount of tooling which most open-source vendors do not and probably will not have. Wiring this stuff together by hand can be tedious as well as difficult to maintain. I believe it also increases the learning curve in an already difficult landscape.

markg
http://darth.homelinux.net

A few more thoughts on Canonical design (great discussion going on here!)
1) Many members have rightly pointed out that validation should be kept out of the canonical. Sometimes even field formatting (especially mandatory-field restrictions) can make the canonical too restrictive!
2) Try not to set a milestone (especially early or in the middle of the project) called “Canonical completion” or similar! Canonical design is probably one of the most iterative components of the integration solution. It is best to move on once the initial structure of the canonical is done and improve it as the project moves along.
3) If the canonical has to be viewed, shared and built collaboratively with the owners of source/target ESS or external systems, it may be best to define the canonical in an XML schema. The “outsiders” may not have the webMethods tools to view a wM document!

HTH
Raman Rangaswamy
Satyam Computer Services