Large Document Handling

I am implementing large document handling and can’t acquire the document data. I’m sending documents to a service that calls wm.tn.doc:recognize. I can see the BizDocEnvelope data, but content is null, which I might(?) expect if the document is written to disk. I then pass the BizDocEnvelope to the bizdoc parameter of getContentPartData (as outlined in the wM documentation) with partName=xmldata and getAs=stream. However, I get an EXMLException at this point. How do I get a reference to this content so that I can extract it from disk?

A couple of checkpoints:

  1. Have you changed the TN config parameters as mentioned in the TN large document handling guide?

  2. Using the recognize and routeBizdoc services will persist the TN document.

  3. In TN Console, open the persisted document by double-clicking it; you should see the ‘Storage Type’ and ‘Storage Reference’ parameters filled in on the Content tab, indicating it is a large document.

  4. Once this is confirmed, use the getContentPartData service and provide the partName exactly as you see it listed in the TN Console -> Content window of the document. By the way, the content variable in the BizDocEnvelope will be null for a large document. You have mentioned the part name ‘xmldata’; please verify that by looking in TN Console, as it may be different. The ‘stream’ value is correct. (A minimal sketch of this call follows the list.)
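For what it’s worth, here is a minimal Java-service sketch of the call in step 4, using the Service.doInvoke API. The part name "xmldata" and the pipeline variable names are taken from the posts above and are assumptions to verify in TN Console, not a definitive implementation.

import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.ServiceException;
import com.wm.data.*;
import com.wm.lang.ns.NSName;

// Sketch only: call wm.tn.doc:getContentPartData from a Java service.
public static final void getLargeDocPart(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    Object bizdoc = IDataUtil.get(pc, "bizdoc"); // BizDocEnvelope from recognize/route

    IData in = IDataFactory.create();
    IDataCursor ic = in.getCursor();
    IDataUtil.put(ic, "bizdoc", bizdoc);
    IDataUtil.put(ic, "partName", "xmldata"); // verify in TN Console; may differ
    IDataUtil.put(ic, "getAs", "stream");     // stream, not bytes, for large docs
    ic.destroy();

    try {
        IData out = Service.doInvoke(NSName.create("wm.tn.doc", "getContentPartData"), in);
        IDataCursor oc = out.getCursor();
        Object partContent = IDataUtil.get(oc, "partContent"); // a java.io.InputStream
        oc.destroy();
        IDataUtil.put(pc, "partContent", partContent);
    } catch (Exception e) {
        throw new ServiceException(e.toString());
    } finally {
        pc.destroy();
    }
}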

After all this, if it still doesn’t work, please post the error message so that we may get more hints.

  1. Yes.
  2. Check.
  3. Check.
  4. Check. getContentPartData still throws this error:

com.wm.app.tn.err.EXMLException: <exmlexception>
<errorcode></errorcode>
<info>wm.tn.doc:getContentPartData</info>
<originalexception>
<javaclass>com.wm.app.tn.err.EXMLException</javaclass>
<message><exmlexception>
<errorcode>TRNSERV.000026.000003</errorcode>

It looks like the ContentParts field, which holds all the pointers to the file, is missing. Look at the bizdoc in the pipeline and check whether the field bizdoc/ContentParts is empty or null. If it is populated, getContentPartData will work. (A quick check is sketched below.)
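If you want to check this programmatically, here is a small hedged sketch. It assumes the BizDocEnvelope is in the pipeline as ‘bizdoc’ and relies on the fact that it can be read as IData with a plain cursor.

import com.wm.app.b2b.server.ServiceException;
import com.wm.data.*;

// Sketch: fail fast if bizdoc/ContentParts is empty, before calling
// getContentPartData. Assumes "bizdoc" is already in the pipeline
// (e.g. after wm.tn.doc:recognize).
public static final void assertContentParts(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    IData bizdoc = IDataUtil.getIData(pc, "bizdoc");
    pc.destroy();

    Object contentParts = null;
    if (bizdoc != null) {
        IDataCursor bc = bizdoc.getCursor();
        contentParts = IDataUtil.get(bc, "ContentParts");
        bc.destroy();
    }
    if (contentParts == null) {
        // No pointers to the on-disk content: getContentPartData will throw.
        throw new ServiceException("bizdoc/ContentParts is null or missing");
    }
}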

Thanks. I went back over the documentation and there is a typo in the large document handling PDF. The directions reference both tn.tspace.location and tn.space.location (no ‘t’). I had used the latter initially. Made the change and it works now.

One additional question: does anybody know what garbage collection service is used, if any? The documentation says that document content remains on disk until it’s no longer being referenced and a garbage collection routine removes it. Just wondering what interval (if there is one) this garbage collection service runs on.

Found the answer on Advantage, if anybody’s interested.
[url=“http://advantage.webmethods.com/article?id=1610979279”]http://advantage.webmethods.com/article?id=1610979279[/url]

Of course we are interested!
Thanks for the information.

Hi Brian/Other webM Gurus
What do you do once you retrieve the large doc from the “tspace” using the
wm.tn.doc:getContentPartData service?

I mean do you do the following?
getContentPartData (which returns partContent)
stringToDocument (which returns a node object)
getNodeIterator (map the node object to the input of getNodeIterator)

We are doing the above steps and we get a java.lang.OutOfMemoryError.

Any suggestions are appreciated.

Thanks
Sathish.

Supply ‘stream’ for the getAs parameter of getContentPartData. I think I’ve seen in other wmusers posts that folks will loop through 5 MB chunks if docs are extremely large, to avoid an OutOfMemoryError, but I haven’t made it that far. (That pattern is sketched after the link below.)

[url=“http://www.wmusers.com/wmusers/messages/1825/890.shtml”]This wmusers.com thread mentions it.[/url]
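Here is a rough sketch of that chunked-read pattern, assuming partContent comes back as a plain java.io.InputStream. The 5 MB buffer is just the figure mentioned above, not a magic value; tune it to your heap.

import java.io.IOException;
import java.io.InputStream;

// Sketch: process the part content a fixed-size buffer at a time instead
// of materializing the whole document in memory.
public static void processInChunks(InputStream partContent) throws IOException {
    byte[] buf = new byte[5 * 1024 * 1024]; // 5 MB chunks
    int n;
    while ((n = partContent.read(buf)) != -1) {
        // process buf[0..n) here: parse, write, or forward this slice
    }
    partContent.close();
}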

You use the getContentPartData service to retrieve the large document as a stream object.
Once you have this, you can use any node-iterating service, such as convertToValues. That service, for example, will step through your file based on the top-level nodes you have defined in your flat file schema. This way you avoid reading the entire stream and getting an OutOfMemoryError. (The loop is sketched below.)
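To make the loop concrete, here is a hedged Java sketch against the 6.x WmFlatFile service pub.flatFile:convertToValues. The parameter names (ffData, ffSchema, iterate, ffIterator, ffValues) and the termination condition are from memory of the flat-file docs, and the schema name is hypothetical; verify all of them against your release.

import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.ServiceException;
import com.wm.data.*;
import com.wm.lang.ns.NSName;

// Sketch of the iterate pattern: one top-level record per call,
// resuming via ffIterator, until the iterator comes back null.
public static final void iterateFlatFile(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    Object partContent = IDataUtil.get(pc, "partContent"); // stream from getContentPartData
    pc.destroy();

    Object ffIterator = null;
    boolean done = false;
    try {
        while (!done) {
            IData in = IDataFactory.create();
            IDataCursor ic = in.getCursor();
            if (ffIterator == null) {
                IDataUtil.put(ic, "ffData", partContent); // first call: hand over the stream
            } else {
                IDataUtil.put(ic, "ffIterator", ffIterator); // later calls: resume the iterator
            }
            IDataUtil.put(ic, "ffSchema", "myfolder:myFlatFileSchema"); // hypothetical schema name
            IDataUtil.put(ic, "iterate", "true"); // return one top-level record per call
            ic.destroy();

            IData out = Service.doInvoke(NSName.create("pub.flatFile", "convertToValues"), in);
            IDataCursor oc = out.getCursor();
            IData ffValues = IDataUtil.getIData(oc, "ffValues"); // one record's worth of data
            ffIterator = IDataUtil.get(oc, "ffIterator");
            oc.destroy();

            // process ffValues here (map, route, write out, ...)
            done = (ffIterator == null); // iterator is null after the last record
        }
    } catch (Exception e) {
        throw new ServiceException(e.toString());
    }
}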

Hi Bryan
What are you doing after the getContentPartData service?
It returns a ‘partContent’ stream object.
How are you processing it?

Thanks
S.

Manohar-
We have a scenario where we need to write the content to file. So I’ve written a Java service that accepts partContent (stream) and uses it to create a DataInputStream. I then feed this to a file output stream and loop over it in 1 MB increments. I think this is analogous to what you are saying; our end objective is different, so our means are unique to it. One thing I did notice was that the file balloons in size, though the content appears to be the same. I think it’s because of the carriage returns. I may be using the wrong IO object to write the data; still working on this. Thanks.
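For anyone following along, here is a sketch of that kind of copy service, assuming partContent is a java.io.InputStream. One guess about the ballooning: char-oriented writes (e.g. DataOutputStream.writeChars) emit two bytes per character, which roughly doubles a file, so writing raw bytes avoids it.

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch: copy the partContent stream to disk in 1 MB increments.
// Writing raw bytes via OutputStream.write(buf, 0, n) preserves the
// original size; char-oriented writes will not.
public static void streamToFile(InputStream partContent, String path) throws IOException {
    OutputStream out = new BufferedOutputStream(new FileOutputStream(path));
    try {
        byte[] buf = new byte[1024 * 1024]; // 1 MB
        int n;
        while ((n = partContent.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    } finally {
        out.close();
        partContent.close();
    }
}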

Sathish-
If you have the WmTNSamples package, there is another example in the wm.tn.samples.largedoc:NodeIterator service. I think this package is distributed with the TN installation or upgrades.

My next big question is about the XML version tag. This requirement is a huge obstacle in implementing large document handling: page 12 or 13 of the PDF states (and I’ve also verified in testing) that any doc specified as large must have the XML declaration (<?xml version="1.0"?>) as its first tag. We are part of an online business exchange (for which we aren’t the hub) and this could be a huge deal to implement at the hub, as not every partner uses wM and it’s difficult for this change to be implemented exchange-wide. Do you know if wM has since provided a way around this requirement?

Hi Manohar
Where is the convertToValues service? I looked for it in ISBuiltinServices.pdf but couldn’t find it.
Thanks
Sathish.

There are two convertToValues services, one each in the WmFlatFile and WmEDI packages. Depending on which packages your company has bought, you will have these services available. These are the two services that typically handle large flat files, parsing the incoming file into IData.

Manohar
Thanks for the response, but the WmFlatFile package is only available in webMethods 6.01.

Currently I have the following flow:

getContentPartData (which returns partContent)
stringToDocument (convert partContent to a Node object)
getNodeIterator (map the Node object to the input of getNodeIterator)

My understanding is: stringToDocument is not resource intensive, but documentToRecord is resource intensive because it involves XML parsing.

We are using webMethods IS 4.6. So do you think using stringToDocument to convert partContent (a stream object) to *node (a Node object) is a good approach?

Thanks
Sathish.

Sathish - Your approach is right. Ideally, we would like to use documentToRecord so that in one pass we read the entire document into an IData record. However, this is not practical for a large document. Hence we use a node object over the incoming stream and an appropriate iterator keyed on some repeating XML node.
The two services to read the stream using iterators are
getNodeIterator and getNextNode. (A rough loop is sketched below.)
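A rough Java sketch of that loop on IS 4.6, invoking the pub.web services via doInvoke. The parameter names (node, criteria, iterator, next) and the repeating element are assumptions from memory; check ISBuiltinServices.pdf for your release before relying on them.

import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.ServiceException;
import com.wm.data.*;
import com.wm.lang.ns.NSName;

// Sketch: iterate over a repeating element of a large XML node, pulling
// one small subtree at a time instead of parsing the whole document.
public static final void iterateNodes(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    Object node = IDataUtil.get(pc, "node"); // from stringToDocument
    pc.destroy();

    try {
        IData in = IDataFactory.create();
        IDataCursor ic = in.getCursor();
        IDataUtil.put(ic, "node", node);
        IDataUtil.put(ic, "criteria", new String[] { "LineItem" }); // hypothetical repeating element
        ic.destroy();

        IData out = Service.doInvoke(NSName.create("pub.web", "getNodeIterator"), in);
        IDataCursor oc = out.getCursor();
        Object iterator = IDataUtil.get(oc, "iterator");
        oc.destroy();

        while (true) {
            IData nin = IDataFactory.create();
            IDataCursor nc = nin.getCursor();
            IDataUtil.put(nc, "iterator", iterator);
            nc.destroy();

            IData nout = Service.doInvoke(NSName.create("pub.web", "getNextNode"), nin);
            IDataCursor noc = nout.getCursor();
            Object next = IDataUtil.get(noc, "next"); // null once the document is exhausted
            noc.destroy();
            if (next == null) break;

            // run documentToRecord on the small node in 'next' and process it here
        }
    } catch (Exception e) {
        throw new ServiceException(e.toString());
    }
}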