Handling large zipped XML

One of our clients will be sending large XML documents as zipped objects over MQ, and I need to retrieve certain information from the XML. I was able to read the zipped XML using GZIPInputStream. However, instead of reading the XML and constructing a string, I want to pass the GZIPInputStream to xmlStringToXMLNode. Once I have the node, I will be able to use the NodeIterator services for large XML handling.

In the code snapshot below, the byte-by-byte read is commented out; that read works absolutely fine. However, if I pass gZipInputStream to xmlStringToXMLNode, I don’t get any error, but it looks like it returns an empty node.

// Service imports needed: java.io.ByteArrayInputStream, java.util.zip.GZIPInputStream
IDataCursor pipelineCursor = pipeline.getCursor();
Object obj = IDataUtil.get( pipelineCursor, "in" );
pipelineCursor.destroy();

Object node = null;
StringBuffer out = new StringBuffer();
String value = null;
try {
    byte[] byteArray = (byte[]) obj;
    GZIPInputStream gZipInputStream = new GZIPInputStream(new ByteArrayInputStream(byteArray));
/*
    // Byte-by-byte read into a string (this works)
    byte[] strBufInBytes = new byte[1];
    int bytesRead = gZipInputStream.read(strBufInBytes);
    while (bytesRead != -1) {
        value = new String(strBufInBytes);
        out.append(value);
        bytesRead = gZipInputStream.read(strBufInBytes);
    }
*/
    // input: pass the stream itself instead of a string
    IData input = IDataFactory.create();
    IDataCursor inputCursor = input.getCursor();
    IDataUtil.put( inputCursor, "$filestream", gZipInputStream );
    inputCursor.destroy();

    // output
    IData output = Service.doInvoke( "pub.xml", "xmlStringToXMLNode", input );
    IDataCursor outputCursor = output.getCursor();
    node = IDataUtil.get( outputCursor, "node" );
    outputCursor.destroy();

    gZipInputStream.close();
} catch (Exception e) {
    // Wrap rather than cast: a plain cast fails for any non-ServiceException
    throw new ServiceException(e);
}

IDataCursor pipelineCursor_1 = pipeline.getCursor();
IDataUtil.put( pipelineCursor_1, "out", node );
pipelineCursor_1.destroy();
 

Any pointers/thoughts on what might be wrong?

Your gZipInputStream is still in zipped form; you need to unzip it before parsing it as XML. Try using Inflater first.

We have some concerns about the object size. If I inflate it, I have to store the result in a byte array. I am not fully familiar with how JVMs handle byte arrays. If a byte array needs contiguous memory allocation, we will run into OutOfMemory errors.

xmlStringToXMLNode is supposed to accept an InputStream. Can’t it decompress while reading from the GZIPInputStream? (My understanding was that you can ‘pipe’ input streams any way you want.)
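
For what it's worth, here is a minimal sketch of the "piping" idea using only standard JDK classes, not the webMethods services. It assumes a StAX parser stands in for whatever xmlStringToXMLNode does internally (which this thread does not establish), and the class and method names are hypothetical. It only shows that a GZIPInputStream hands decompressed bytes to a parser on demand, without inflating the whole document into a byte array first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of any webMethods API.
class GzipStreamParseSketch {

    // Helper for the demo: gzip a string so there is compressed input to parse.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Counts elements with the given local name. The parser pulls from the
    // GZIPInputStream, which inflates the data as it is read.
    static int countElements(byte[] gzipped, String localName)
            throws IOException, XMLStreamException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
            reader.close();
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] gzipped = gzip("<orders><order id=\"1\"/><order id=\"2\"/></orders>");
        System.out.println(countElements(gzipped, "order"));
    }
}
```

The only full-size buffer here is the compressed input, which the original service already holds as a byte array.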

Any idea whether a byte array needs contiguous memory allocation in IS?

I agree that passing the GZIPInputStream to xmlStringToXMLNode should work. What you might try is calling your own FLOW service from the Java service to add debugging steps that help determine what’s going on. You might also check with wM support to find out how xmlStringToXMLNode reads from the stream. Perhaps it is doing something that GZIPInputStream does not support (e.g. mark, reset).
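
The mark/reset point can be checked with the JDK alone. The sketch below (hypothetical class and method names) only reports what the stream classes themselves say; whether xmlStringToXMLNode actually calls mark or reset is not something this thread establishes. A GZIPInputStream reports no mark support, while wrapping it in a BufferedInputStream adds it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of any webMethods API.
class MarkSupportSketch {

    // Helper for the demo: produce a small, valid gzip payload.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        }
        return bytes.toByteArray();
    }

    // A bare GZIPInputStream inherits markSupported() from InflaterInputStream.
    static boolean rawSupportsMark(byte[] gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return in.markSupported();
        }
    }

    // Wrapping in BufferedInputStream layers mark/reset on top of the gzip stream.
    static boolean bufferedSupportsMark(byte[] gzipped) throws IOException {
        try (InputStream in = new BufferedInputStream(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)))) {
            return in.markSupported();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gzipped = gzip("<a/>".getBytes("UTF-8"));
        System.out.println("GZIPInputStream markSupported: " + rawSupportsMark(gzipped));
        System.out.println("Buffered wrapper markSupported: " + bufferedSupportsMark(gzipped));
    }
}
```

If mark/reset did turn out to be the issue, putting the buffered wrapper into the pipeline instead of the bare stream would be a cheap experiment.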

Hi Reamon,

Thanks for your valuable suggestion. If I invoke the xmlStringToXMLNode service from Java code using Service.doInvoke, it doesn’t give the desired result. However, xmlStringToXMLNode called from a FLOW service can read the GZIPInputStream.

I don’t know how Service.doInvoke behaves. Maybe it creates a different context or a separate thread to invoke the service and, as a result, loses track of the stream.

Another change to consider is to have the Java service accept the byte array input (as you have it now) and simply return the GZIPInputStream object. A “getInputStream” service, if you will. Then in your top-level FLOW service, call that, then xmlStringToXMLNode, then do your iterating. This avoids using Service.doInvoke altogether and keeps the scope of the Java service small.
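
A rough sketch of the core of such a “getInputStream” service, in plain Java so it compiles without the IS libraries. The class name, `toGzipStream`, and the helpers are hypothetical. In a real Java service the byte array would come from the pipeline (`IDataUtil.get(pipelineCursor, "in")`) and the returned stream would be put back with `IDataUtil.put(...)`, as in the code in the original post; that wiring is left out here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of any webMethods API.
class GetInputStreamSketch {

    // Core of the service: wrap the compressed bytes and return the stream
    // unread and unclosed, so a later step can consume it.
    // The GZIPInputStream constructor reads the gzip header right away, so
    // input that is not gzip data fails here with an IOException.
    static InputStream toGzipStream(byte[] gzipped) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(gzipped));
    }

    // Demo helper: gzip a string to simulate the message payload.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Demo helper standing in for the downstream consumer: drains the stream
    // in fixed-size chunks and closes it when done.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try (InputStream stream = in) {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream stream = toGzipStream(gzip("<doc>hello</doc>"));
        System.out.println(readAll(stream));
    }
}
```

Note that the service itself does not close the stream; whoever consumes it is responsible for that.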

For the doInvoke behavior, it does not create a new thread. doThreadInvoke is used for that.

I have done exactly the same. Thanks for the direction!