IDataUtil.put method performance

Hi there,

Has anyone experienced performance problems when setting the pipeline from a custom Java service's output, especially with IDataUtil.put(cursor, key, value) or the IDataCursor.insertAfter method?

I do, when value is a large Object array. The weird thing is that filling that array (a simple loop over the Object array) in the same Java service is fast, whereas putting that Object array into the pipeline takes hours.

Any clue (even pure java advice) would help …


Can you post your code for review?

I think this is what is happening:

At each step when running a Flow a shallow clone of the pipeline is performed in order to support a shallow rollback on service exceptions.

This seemed like a good idea 6 years ago when Flow was designed, but in my experience it has two bad side effects (if only we knew then…).

  1. Clones happen all the time, slowing down Flow execution in a way that is linear in the size of the top level of the pipeline. Successful execution is penalized to prepare for failures that happen infrequently.

  2. The amount of recovery that you get from a shallow pipeline rollback is not useful and is actually sometimes a pain to work around.

It would sure be nice if there were a property to turn off this clone/rollback behavior for all new Flows. This feature exists in the Flow 2.0 engine. Ask your sales rep for it by name and someday it will get into a release.

What I think is happening in this case is that the next step in the Flow is doing a pipeline shallow clone of the Byte[150000], which makes the JVM garbage collect and collect and collect.

To test this theory, try putting the output into a nested IData and keep the top level elements in the pipeline to a minimum.
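
A minimal sketch of why nesting should help under this theory. It uses a plain java.util.Map as a stand-in for the pipeline, which is an assumption for illustration only and not the real IData API; the class and method names are hypothetical. A shallow clone copies one reference per top-level entry (not the contents behind each reference), so moving many outputs under a single key shrinks what each clone has to walk.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a pipeline: a plain Map, NOT the real IData API (assumption).
class ShallowCloneSketch {

    // A shallow clone copies one reference per top-level entry;
    // the values themselves are shared, not duplicated.
    static Map<String, Object> shallowClone(Map<String, Object> pipeline) {
        return new HashMap<>(pipeline);
    }

    // Hypothetical helper: wrap many outputs under a single top-level key.
    static Map<String, Object> nestUnder(String key, Map<String, Object> outputs) {
        Map<String, Object> pipeline = new HashMap<>();
        pipeline.put(key, outputs);
        return pipeline;
    }

    public static void main(String[] args) {
        Map<String, Object> outputs = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            outputs.put("field" + i, new byte[16]);
        }
        // Flat layout: the clone holds 1000 top-level entries.
        Map<String, Object> flat = shallowClone(outputs);
        // Nested layout: the clone holds a single top-level entry.
        Map<String, Object> nested = shallowClone(nestUnder("results", outputs));
        System.out.println(flat.size() + " vs " + nested.size());
    }
}
```

Note that in this model a single large array is one top-level entry either way, so nesting mainly pays off when the top level holds many elements.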



I think this has less to do with Flow or Java and more to do with the communication with the client or Developer. Returning really big objects has more overhead. In Developer’s case, the tool needs to marshal the pipeline into a viewable tree. This takes a lot longer when there is a significant amount of data.

I tried your code in my environment and noted the long wait for Developer. However, when I added logging to the code, I noted that the service exited long before I got a result back in Developer.

I tested my theory by pushing the output of your service into another Java service that simply calls hashCode() on every object in the input array, adds it to an output int, and outputs the result as a string. In the controlling flow, I carefully dropped the object array after its use as input to ensure that Developer wouldn’t try to return the large array object. This worked and returned a result much faster, even though the IS had to do more work.
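
The core of such a test service could look roughly like the sketch below. This is plain Java with no pipeline handling; the class and method names are hypothetical, and the null check is an added assumption so that empty array slots do not throw.

```java
// Sketch of the "touch every element" logic, without any IS pipeline code.
class HashSumSketch {

    // Calls hashCode() on every non-null element and returns the sum as a
    // String, so only a tiny value needs to travel back to the client.
    static String sumHashCodes(Object[] items) {
        int total = 0; // int overflow simply wraps, which is fine for a smoke test
        for (Object item : items) {
            if (item != null) {
                total += item.hashCode();
            }
        }
        return String.valueOf(total);
    }

    public static void main(String[] args) {
        Object[] sample = { Integer.valueOf(1), Integer.valueOf(2), Integer.valueOf(3) };
        System.out.println(sumHashCodes(sample));
    }
}
```

The point of the design is that the server still walks the whole array, while the client only receives a short string.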

Make sure that you are managing your pipeline well, and place some logging in the Java service, especially on entry and exit, to see when it’s really finished.

– Ted

Fred - I’m sure we are mapping in excess of 150k without such huge performance hits. From your explanation of the shallow cloning behavior, it sounds like this would be the case independent of the IDataUtil.put method, in other words, for any flow map step as well?

We have cases mapping 1-2 MB strings without requiring excessive gc.

Perhaps the server needs more memory allocated on start-up?


Thanks a lot for all your responses.
First of all, I could not find a way to make it really faster, either by inserting the Byte array into an IData or by casting it to java.lang.Object in the hope that IS would not see it as an array. I did notice, though, that removing input byte from the pipeline in that service made the performance less “horrible”.

Then I tried to measure how the processing time varies with the array size:

long measure = System.currentTimeMillis();
IDataUtil.put(idcPipeline, "bytesArray", Content);
measure = System.currentTimeMillis() - measure;
IDataUtil.put(idcPipeline, "measure", "" + measure);

But my pipeline showed measure=0 each time. At that point I must admit that for a few seconds I started doubting the System.currentTimeMillis method … but as a matter of fact the IDataUtil.put method itself executes in less than a millisecond.
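
As pure Java advice: System.currentTimeMillis only has millisecond granularity, so measure=0 just says the call took under a millisecond. A sketch with System.nanoTime gives a finer reading. The Map here is only a stand-in for the pipeline (an assumption, not the IData API), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Times a single put with nanosecond resolution; the Map stands in for the pipeline.
class PutTimingSketch {

    // Returns the elapsed time of the put in nanoseconds.
    static long timePutNanos(Map<String, Object> pipeline, String key, Object value) {
        long start = System.nanoTime();
        pipeline.put(key, value); // stores a reference only, the array is not copied
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Map<String, Object> pipeline = new HashMap<>();
        long nanos = timePutNanos(pipeline, "bytesArray", new byte[150000]);
        System.out.println("put took " + nanos + " ns");
    }
}
```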

From there I assumed that the time is lost when exiting my Java service, probably due to internal pipeline management or something similar. So I tried to measure it differently, calling a service before and after my Java service. Same story: date2-date1 was less than a second, but my flow still took very long to terminate.

Obviously, Ted is right in saying “the service exited long before I got a result back in Developer” (same when calling the service via the IS web console). But unfortunately, I do need that array as output for a further flow step. I may improve performance by removing that huge array ASAP in my flow, but that’s it.

Thanks again. Still searching …


P.S :
BTW, for those who need to publish documents from IS 4.6 to ES 5 via the B2B bridge, here are some tips:

Map IS record fields to ES document fields:
Byte -> byte
Byte -> byte
Double -> double
java.util.Date -> BrokerDate

Weird one: if you call an IS service from the B2B adapter with a document containing a byte field, you will get a String in the IS pipeline. To retrieve your original byte array, you will need to call String.getBytes …
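
A hedged sketch of that round trip. Which encoding the adapter uses when it turns the bytes into a String is not stated here, so ISO-8859-1 below is purely an assumption, chosen because it maps every byte value to exactly one character; check what your setup really does. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Round-trips bytes through a String with an explicit charset (assumed ISO-8859-1).
class ByteStringRoundTrip {

    // What the bridge might do on the way in (assumption).
    static String toText(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Recovers the bytes; naming the charset avoids depending on the platform default.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = { 0, 1, 65, (byte) 200, (byte) 255 };
        byte[] recovered = toBytes(toText(original));
        System.out.println(Arrays.equals(original, recovered));
    }
}
```

Calling getBytes() with no argument uses the platform default charset, which may not give back the original bytes for values above 127.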