Chunking of document list

Hi everyone,

We have a requirement to chunk a document list containing about 10000 documents into smaller document lists of about 150 each, so that we can batch insert 150 documents at a time. Can anyone suggest an approach for this? A Java service would help avoid performance issues.



Hi Nithin,
What is stopping you from doing a batch insert for all 10000 records?
What will you achieve by splitting it into multiple lists and then batch inserting?


Hi Nithin,
you can use either a flow service (with loops) or a Java service, but I prefer a Java service. Java service execution will always be faster than the equivalent flow service.

As might be expected by long-time forum participants, I would suggest not worrying about doing this in Java just yet. :slight_smile:

How is this 10000-element document being submitted to IS? It sounds like you’ve already converted it to an IS document containing a document list of 10000 entries. That may be an issue right from the start, though without details about the size of the overall document and each entry it’s hard to say.

Here are a few thoughts that may be helpful:

  • Don’t load the bytes/string of the overall document into memory all at once. This means don’t use bytesToString, xmlStringToNode (where you pass a complete byte array or string), or xmlNodeToDocument. If the bytes are not already written to disk, you might consider doing so and then processing the file using an InputStream.

  • Consider using the node iterator facilities. This will keep the memory footprint of the source document small. For each entry, you’ll get a node representing that document which you can convert to an IS document, map and save/send/store/whatever. Just be sure not to create a huge target document with all 10000 entries.

  • With the node iterator you can control how many you build up in a batch–create 150 target documents then save/send/write to DB/whatever.

These items can be done without resorting to Java (though with the first one you might need a small bit of Java code to write the file, depending on your environment).
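For the batching part specifically, the splitting logic itself is simple in plain Java. Below is a generic sketch (not IS-specific code; in a real Java service you would operate on an IData document list rather than a `List<T>`):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a source list into consecutive batches of at most batchSize elements.
    // Each batch is copied into its own list so it can be processed independently.
    public static <T> List<List<T>> partition(List<T> source, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < source.size(); i += batchSize) {
            int end = Math.min(i + batchSize, source.size());
            batches.add(new ArrayList<>(source.subList(i, end)));
        }
        return batches;
    }
}
```

For 10000 records and a batch size of 150 this yields 67 batches: 66 full batches of 150 and a final batch of 100. Note, though, that if the whole 10000-entry list is already in memory, this splitting saves little; the node-iterator approach above avoids materializing the full list in the first place.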


Hi Reamon,

I have a similar requirement where I get a Broker document which contains a document list.
Now I need to create XML strings with 50 records each, if the Broker document contains 100 records or more. How should I approach this?

Could you please tell the steps in detail. Thanks for your help.


Hi David,
I understand you have a document with an array of records. For your scenario:

  1. Get the size of the list with “pub.list:sizeOfList”
  2. If size < 100, there is no need to split
  3. Else if size >= 100,
    –then loop over the document list;
    ----append each record to a separate (new) list
    ----when the iteration count is a multiple of 50, convert the new list into XML
    ----append the XML to a final string list
    –End Loop

Personally, I prefer a piece of Java code to split the document list into small chunks of the required size and then convert each chunk into XML.
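The chunk-then-convert idea can be sketched in plain Java as follows. This is generic illustration code, not IS API code: records are modeled as simple key/value maps instead of IData documents, and the XML rendering is deliberately naive (real code should escape values and would normally use pub.xml:documentToXMLString or an XML library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class XmlChunker {
    // Split records into chunks of chunkSize and render each chunk as one XML string.
    public static List<String> toXmlChunks(List<Map<String, String>> records, int chunkSize) {
        List<String> xmlChunks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            StringBuilder sb = new StringBuilder("<records>");
            int end = Math.min(i + chunkSize, records.size());
            for (Map<String, String> rec : records.subList(i, end)) {
                sb.append("<record>");
                for (Map.Entry<String, String> e : rec.entrySet()) {
                    // Naive rendering: assumes keys are valid element names
                    // and values need no XML escaping.
                    sb.append('<').append(e.getKey()).append('>')
                      .append(e.getValue())
                      .append("</").append(e.getKey()).append('>');
                }
                sb.append("</record>");
            }
            sb.append("</records>");
            xmlChunks.add(sb.toString());
        }
        return xmlChunks;
    }
}
```

With 120 records and a chunk size of 50 this produces three XML strings (50 + 50 + 20 records).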

Hope this helps!