Handling large documents - how we did it and open questions

fcctr · October 20, 2008, 8:30pm

Hi to all!

In our company we run webMethods IS 7.1 with the following scenario: we handle large EDI messages with XML documents as payload, which can be as large as 50 MB.
After some trial and error, and with the help of the webMethods documentation and this forum, we were able to convince the IS to handle large files. So I want to give some of the knowledge gained back in condensed form (I still have some open questions, though). If someone finds that I missed something or that some steps are unnecessary, I would be very glad to be told so.

The steps we followed to enable the IS to handle large (EDI) documents were:

Changing settings on IS for handling large documents

On the web interface (Server Administrator) of the webMethods Integration Server, in Settings|Extended, set the value of watt.server.tspace.location to the absolute path of an existing directory, in which the user running the IS has rights to read/write/create files.
Also set watt.server.tspace.max to “the maximum number of bytes that can be stored at any one time in the hard disk drive space that you defined using the watt.server.tspace.location property”.
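
Before entering the path in Settings|Extended, it can help to verify the directory requirements from a shell of the OS user that runs the IS. The following is a minimal sketch in plain Python, not anything shipped with webMethods; the function name check_tspace_dir is hypothetical and the checks simply mirror the requirements listed above (absolute path, existing directory, rights to read/write/create files).

```python
import os
import tempfile


def check_tspace_dir(path):
    """Return a list of problems found with a candidate tspace directory."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isdir(path):
        problems.append("directory does not exist")
        return problems
    try:
        # Creating, writing, reading and deleting a scratch file exercises
        # the create/write/read rights of the current OS user.
        with tempfile.NamedTemporaryFile(dir=path) as probe:
            probe.write(b"probe")
            probe.flush()
            probe.seek(0)
            probe.read()
    except OSError as exc:
        problems.append(f"cannot create/write/read files: {exc}")
    return problems
```

An empty list means the directory passed all three checks for the user who ran the script, so it only says something about the IS if it is run as the IS user.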

Changing settings for EDI (if EDI is used)

In the Solutions menu of the navigation panel, click EDI to open the EDI Module. There, go to the configuration menu and then to Configure Properties. Add or modify the property “EDIBigDocThreshold”, setting it to the value (in bytes) that shall be the threshold for treating an EDI document as large: every EDI document larger than the threshold is considered large and is therefore stored in watt.server.tspace.location, from where processing then continues.
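
The effect of the threshold can be pictured with a small sketch in plain Python. It is only an illustration of the rule as described above (larger than the threshold goes to disk), not the EDI module's implementation; store_payload, threshold and spool_dir are hypothetical names, with spool_dir standing in for watt.server.tspace.location.

```python
import os
import tempfile


def store_payload(data, threshold, spool_dir):
    """Keep small payloads in memory; write larger ones to spool_dir.

    Returns ("memory", data) or ("disk", path_to_file).
    """
    # Assumption taken from the post: only documents strictly larger
    # than the threshold count as "large".
    if len(data) <= threshold:
        return ("memory", data)
    fd, path = tempfile.mkstemp(suffix=".dat", dir=spool_dir)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return ("disk", path)
```

A payload of exactly the threshold size stays in memory in this sketch; one byte more and it is written to the spool directory.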

The actual process for handling the large documents consists (in our case) of two services and looks as follows (I hope that this pseudo-notation is understandable):

>---------- Service 1
Input to the pipeline: bizdoc (BizDocEnvelope)
→ wm.tn.doc:getContentPartData
    Service In:
        bizdoc from the input parameter bizdoc of the service
        partName set to xmldata
        getAs set to stream
    Service Out:
        partContent
→ pub.xml:xmlStringToXMLNode
    Service In:
        $filestream mapped from partContent
        isXML set to true
    Service Out:
        node
→ pub.xml:getNodeIterator
    Service In:
        node mapped from node (Service Out of pub.xml:xmlStringToXMLNode)
        criteria set to the name of the XML element(s) that we want to process iteratively
        movingWindow set to false
    Service Out:
        iterator (the XML node iterator that is used in the following for iterating)
→ Call to the service (Service 2) that takes the just created NodeIterator as input parameter

→ Closing the Inputstream (see question 1 below)
----------<

>---------- Service 2
…
REPEAT (Properties: Count == -1; Repeat on == SUCCESS)
    → pub.xml:getNextXMLNode
        Service In:
            iterator (from the calling service, i.e. Service 1)
        Service Out:
            next (“wrapper” for the next XML node)
    BRANCH on ‘/next/name’ (Properties: Switch == /next/name; Evaluate Label == False)
        $null: SEQUENCE (Properties: Label == $null; Exit on == FAILURE)
            EXIT ‘$loop’ (Properties: Exit from == $loop; Signal == SUCCESS)
    → pub.xml:xmlNodeToDocument
        Service In:
            node from the next/node returned by getNextXMLNode
            makeArrays set to false
            documentTypeName set to the type of document that is retrieved by the iterator
        Service Out:
            document. Is explicitly mapped to a variable representing the document (has to be created as new variable) that is further processed afterwards
    (… process the retrieved document)
    …
----------<
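
For readers who do not know the flow notation, the two services together amount to a streaming loop: open the payload as a stream, pull out one matching element at a time, convert it to a document, process it, and close the stream at the end. The sketch below shows the same idea in plain Python with the standard library. It is an analogy, not IS code: xml.etree.ElementTree.iterparse stands in for the node iterator, the dict conversion stands in for xmlNodeToDocument, and the names iter_elements, process_all and handle are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(stream, tag):
    """Yield each <tag> element as a flat dict, one at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # free the children of the element just processed


def process_all(stream, tag, handle):
    """Call handle(document) for every matching element, then close the stream."""
    count = 0
    try:
        for document in iter_elements(stream, tag):
            handle(document)
            count += 1
    finally:
        # Close even if handle() raised, so the payload file is released.
        stream.close()
    return count


if __name__ == "__main__":
    sample = io.BytesIO(b"<orders><order><id>1</id></order></orders>")
    process_all(sample, "order", print)
```

The try/finally plays the role of the "Closing the Inputstream" step at the end of Service 1: the stream is released whether or not the per-document processing succeeds.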

OK. That’s more or less how the services are built. They work quite well (they have already processed some 50 MB documents). However, I still have some questions that I was not able to answer by searching the wmUsers forum or Advantage.

1) Releasing open files
If I check with the command “lsof -p PID” on our server (PID is the process id of the webMethods IS) whether the IS still has files open in the watt.server.tspace.location directory, I see that the payload file is still held open by the IS, although the services have terminated. How do I release such open files? Does it suffice to simply drop some variables at the end of the service (which ones? /body/stream or $filestream)? Or do I have to use pub.io:close? If I use pub.io:close, what should I map to the Service In parameter “inputStream” of that service: /body/stream or $filestream?
I tried pub.io:close with the pipeline variable /body/stream (produced by getFile) mapped to the Service In variable inputStream. That seems to work (i.e. the processed (large) file is no longer open after the termination of Service 1).
Is this correct?

2) ReservationInitException
Has anyone else already experienced the following error? Imho it seems to be related to the large document handling (from the Logs|Error section of the Server Administrator):

wm.EDIINT.util:streamToBytesOrReservation com.wm.app.b2b.server.ServiceException: com.wm.util.tspace.ReservationInitException: File already exists Stack trace data …

The error occurred after restarting the server. To me this looks like the server has problems creating new (large document) files in watt.server.tspace.location. There I have files like 1DocRes.dat, 2DocRes.dat, etc. As far as I understand, the server starts with the counter set to 1 after a restart and is thus not able to create the new file 1DocRes.dat, since it already exists. Now, it would be possible to delete all files from watt.server.tspace.location at server restart. Still, I find it strange that the server raises an error saying it cannot write the file, instead of trying an alternative filename so that it can proceed anyway.


Thanks for any help!
Bye,
Christoph

gupta_r.17495 · October 20, 2008, 10:06pm

I would say use the pub.io:close (map the $filestream) and see how it goes.

Did you talk to tech support regarding the exception? They moved the large-file handling configuration from the previous TN properties file to the IS settings level starting from IS711 onwards…

HTH,
RMg

fcctr · October 21, 2008, 1:13pm

Hi RMg!
I tried with both, i.e. with $filestream (input of xmlStringToXMLNode) and with /body/stream (output of getFile) mapped to the input variable inputStream of pub.io:close. It seems to work with both, and basically both should be “the same”, since /body/stream is mapped to $filestream.
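
Why both variables behave the same, and why merely dropping one of them would not be enough, can be illustrated with a tiny sketch in plain Python. It assumes, as the observation above suggests, that mapping one pipeline variable to another copies a reference to the same stream object rather than the data; the dict named pipeline and its keys are hypothetical stand-ins, not the IS pipeline API.

```python
import io

# Hypothetical stand-in for the IS pipeline: a plain dict of named variables.
pipeline = {"body_stream": io.BytesIO(b"<payload/>")}

# "Mapping" one variable to another copies the reference, not the data.
pipeline["filestream"] = pipeline["body_stream"]
same_object = pipeline["filestream"] is pipeline["body_stream"]

# Dropping one name does not close the stream; the other name still holds it.
original = pipeline.pop("body_stream")
still_open_after_drop = not pipeline["filestream"].closed

# Closing through either name closes the one underlying stream.
pipeline["filestream"].close()
closed_via_other_name = original.closed
```

Under that assumption, an explicit close on either variable releases the file, whereas dropping a variable only removes one name for it.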

Regarding the exception: I will ask the tech support and keep the forum informed about any reactions.

Have a nice day!

Christoph

gupta_r.17495 · October 21, 2008, 6:45pm

Thanks for the update Chris…do keep this forum posted on the response from tech support…

fcctr · November 10, 2008, 5:56pm

Hi!

Finally, I got a response from tech support. Their simple proposal was to install the latest fix for the EDI module (in our case this was WmEDI_6-5-2_Fix12). I did that, and it solved the problem. (Although I also searched Advantage, I did not find anything regarding this problem, not even in fix descriptions.)
However, I find the solution implemented in the fix not as clean as it could be: Still, the error

wm.EDIINT.util:streamToBytesOrReservation com.wm.app.b2b.server.ServiceException: com.wm.util.tspace.ReservationInitException: File already exists Stack trace data …

appears. And it appears as an error. Imho a warning would be more than enough, because in the case that a file already exists, the server increments the counter until no file with the filename that results from the concatenation of the counter and “DocRes.dat” exists.
E.g.:
If 1DocRes.dat and 2DocRes.dat exist in the watt.server.tspace.location directory but 3DocRes.dat does not, and the server is restarted (=> counter reset to 1), then two errors appear: one because 1DocRes.dat already exists and a second because 2DocRes.dat already exists. As no 3DocRes.dat exists, everything is fine when the counter reaches 3, and the new large document is written and handled as 3DocRes.dat.
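
The observed behaviour can be sketched as a "try the next counter until creation succeeds" loop. This is a guess at the logic from the outside, written in plain Python, and not the code of the fix; reserve_next_file is a hypothetical name. It collects the names it had to step over instead of raising, which is roughly what reporting them as warnings rather than errors would look like.

```python
import os
import tempfile


def reserve_next_file(directory, start=1):
    """Create the first unused '<n>DocRes.dat' file and return (path, skipped).

    skipped lists the names that already existed and were stepped over.
    """
    skipped = []
    counter = start
    while True:
        name = f"{counter}DocRes.dat"
        path = os.path.join(directory, name)
        try:
            # Mode "x" creates the file only if it does not exist yet,
            # so the check and the creation happen in one step.
            with open(path, "x"):
                pass
            return path, skipped
        except FileExistsError:
            skipped.append(name)
            counter += 1
```

With 1DocRes.dat and 2DocRes.dat already present, a call with the counter reset to 1 steps over both names and creates 3DocRes.dat, matching the example above.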

Cheers to everyone,
Christoph
