Difference between a flow step and a transformer

Hi,
Can I know the difference between a flow step and a transformer?

What has made you curious about these two things? What did you find in the Developer’s Guide on this topic?

M

It is interesting that this topic should come up. I was having an argument with a colleague yesterday about what to use: a transformer or a direct invoke. I did some research, and here are my findings.

I have mixed feelings about using transformers…but what I found out suggests that Transformers are actually good to use rather than direct invokes.

What do you guys think?
Transformer VS Invoke.doc (25 KB)

“Transformers run on a separate thread.”

Do you have a reference for this? I’ve looked everywhere over the years and have never found solid information about whether or not this is true. I suspect that transformers are not run in their own thread. The docs state “developer’s should assume concurrent execution” but assuming and actually doing are different things. I’d love to know for sure one way or the other.

The behavior of debugging leads me to believe transformers are executed serially. It doesn’t make sense to have a run-time behavior and a debug-time behavior that are different, but I may be wrong.

The information about copying the inputs by value is outdated. Integration Server 6.x changed the behavior of transformers and their interaction with the pipeline. According to information in “GEAR_6_Performance_Tuning_White_Paper.pdf”, inputs to a transformer are copied by reference from the pipeline. Previously items were cloned, which could have significant performance implications. Thus, the comments about the transformer not impacting parent variables aren’t accurate anymore. The only exception is for string vars, since those are always copied by value.

IMO, use of transformers is best left to what they were intended–perform value transformations when mapping from one document to another. I sometimes use them as a lazy-man’s pipeline management technique but that’s probably not a good argument for or against them.

This is an interesting topic.

If the bottom-line question is which one is better in terms of performance and best practice, my answer is neither; they are about equal. Most developers (at least I do) prefer a flow step over transformers for readability reasons. The only exception is when I need to transform multiple document elements in one step.

However, “copy by reference” is always more memory- and time-efficient than “copy by value”, at least at run time. Unfortunately, we are all guilty of using “value” more often (maybe I should speak for myself).

The concept of a new thread is bogus. You can prove this by creating two services, one with a flow step and the other with a transformer, and adding some delay. You won’t see the service threads doubling on the statistics page.

When mapping vars, including from a pipeline var to a service input var and a service output var to a pipeline var, strings are copy by value and everything else is copy by reference. If one wants to copy by value, the var must be cloned (using the Java method exposed by the IS Java API).
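The copy semantics described above can be illustrated in plain Java. This is only a sketch of the idea: it models the pipeline and documents with `java.util.Map` instead of the IS `IData` API, and the names `CopySemanticsSketch` and `deepCopy` are made up for the example, not taken from the product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a "document" is a Map and the "pipeline" is a Map of vars.
// This is NOT the Integration Server API, only an analogy for the copy semantics.
class CopySemanticsSketch {

    // Recursively copies nested maps and lists so the result shares no mutable state.
    static Object deepCopy(Object value) {
        if (value instanceof Map) {
            Map<String, Object> copy = new HashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                copy.put(String.valueOf(e.getKey()), deepCopy(e.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        return value; // Strings and other immutable values can be shared safely.
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("status", "NEW");

        Map<String, Object> pipeline = new HashMap<>();
        pipeline.put("doc", doc);
        pipeline.put("docByRef", pipeline.get("doc"));            // mapped: same object
        pipeline.put("docCloned", deepCopy(pipeline.get("doc"))); // cloned: independent

        doc.put("status", "CHANGED"); // modify the original document

        System.out.println(((Map<?, ?>) pipeline.get("docByRef")).get("status"));  // CHANGED
        System.out.println(((Map<?, ?>) pipeline.get("docCloned")).get("status")); // NEW
    }
}
```

The mapped var follows the change because it points at the same object, while the cloned var keeps the old value.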

Transformers are less “readable”. To see the input and output mappings, one must expand the transformer. This can make troubleshooting and maintenance (slightly) more difficult.

Transformers are more convenient for pipeline management. The transformer uses its own pipeline, which relieves the need to drop any unwanted output and protects the caller from pipeline litter. On the other hand, inputs and outputs must be explicitly mapped–no implicit variable mapping is possible.

Transformers can be more difficult to work with when they run into errors. getLastError will not work as one might expect if the transformer throws an exception.

Multiple transformers can be used in a single map step. This makes mapping one document to another with relatively minor transforms easier. Depending on the transformers, readability is actually improved in this case.

Whether calling a service with a FLOW step or in a transformer makes a performance difference should be (mostly) immaterial to the choice of how the service is invoked. Deciding to use one or the other ONLY because of performance is ill-advised. One should factor in other characteristics (readability, maintainability, etc.) as well. Performance considerations should be evaluated in conjunction with profiling measurements. It is not safe to assume that one construct will be faster than another–too often intuition is proven wrong.

Agree

Excellent observation.

I don’t understand, and here is why: I created two flow services, and the pipeline structure is almost identical. What am I missing here?

<?xml version="1.0" encoding="UTF-8"?> transformer.xml 10 20 30 50 5 30 -20 600
-----------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?> flowStep.xml -10 80 10 20 30 50

I’m not sure of how it is managed precisely. My guess is that a new pipeline var is created, and references are copied from the “main” pipeline to the transformer pipeline. Since the references point to the same underlying object, they resolve exactly the same way. Thus, Doc1 looks exactly the same in the main pipeline and the transformer pipeline.

The main difference is that if you have 100 other vars in the main pipeline, the transformer doesn’t see them.

Consider when you invoke a service using Java. You create a pipeline object, populate it with vars and pass the pipeline object to the service (more or less). This is probably what the run-time is doing under the covers. Create a pipeline object. Copy vars from the main pipeline as indicated by the explicit maps. Run the service passing the smaller/restricted pipeline. On return, copy the transform pipeline vars to the main pipeline using the explicit output maps.
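The guess above can be sketched in plain Java. Everything here is an assumption made for illustration: the pipeline is modeled as a `java.util.Map`, and `TransformerPipelineSketch` and `invokeAsTransformer` are hypothetical names, not part of the IS run-time or its Java API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of how a transformer MIGHT be invoked; not the real run-time.
class TransformerPipelineSketch {

    // inputMap:  main pipeline var name -> service input name
    // outputMap: service output name    -> main pipeline var name
    static void invokeAsTransformer(Map<String, Object> mainPipeline,
                                    Map<String, String> inputMap,
                                    Map<String, String> outputMap,
                                    Consumer<Map<String, Object>> service) {
        // 1. Create a new, restricted pipeline.
        Map<String, Object> local = new HashMap<>();
        // 2. Copy references for the explicitly mapped inputs only.
        for (Map.Entry<String, String> e : inputMap.entrySet()) {
            local.put(e.getValue(), mainPipeline.get(e.getKey()));
        }
        // 3. Run the service against the restricted pipeline.
        service.accept(local);
        // 4. Copy the explicitly mapped outputs back to the main pipeline.
        for (Map.Entry<String, String> e : outputMap.entrySet()) {
            mainPipeline.put(e.getValue(), local.get(e.getKey()));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> main = new HashMap<>();
        main.put("a", 10);
        main.put("b", 20);
        main.put("unrelated", "the service never sees this");

        Map<String, String> in = new HashMap<>();
        in.put("a", "num1");
        in.put("b", "num2");
        Map<String, String> out = new HashMap<>();
        out.put("value", "sum");

        invokeAsTransformer(main, in, out, p -> {
            p.put("value", (Integer) p.get("num1") + (Integer) p.get("num2"));
            p.put("scratch", "temp"); // unmapped output: stays in the local pipeline
        });

        System.out.println(main.get("sum"));             // 30
        System.out.println(main.containsKey("scratch")); // false
    }
}
```

In this model the unmapped `scratch` var never reaches the main pipeline, which matches the "no pipeline litter" behavior mentioned earlier in the thread.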

Not sure if this makes things any clearer but hopefully so!