JSON Parsing

I have huge (>30 MB), hierarchical JSON files that I need to grab and parse into IData. I fetch the JSON files via RESTful calls from Amazon S3.

wM 9.5 has some JSON parsing services, but not the large-document handling we get with XML node iterators. Can anyone share insights if anybody has done this before? Thanks.

Maybe you can convert it to XML and go from there (using the org.json library):

import org.json.JSONObject;
import org.json.XML;

JSONObject json = new JSONObject(str);
String xml = XML.toString(json);

I would have to load the entire JSON file into memory to convert it to XML. That would blow up my IData size and memory usage.

Maybe in your case Terracotta BigMemory is the best bet for you, since it can handle large chunks of data in memory?

Hi Srinivas,

How did you handle the large JSON file?

Converting to XML is not the way to go at all. It may seem like an easy one-for-one translation, but there are several edge cases that will mess things up.

You’ll likely want to leverage an existing library of some sort, like GSON or Jackson, and plug that into the IS environment, creating your own JSON handling package, so that the processing of the data is similar to what is done with XML node iteration.
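For example, here is a minimal sketch using Jackson's streaming API (JsonParser), assuming the top level of the file is a JSON array of records. The file name and the process() step are placeholders; on IS you would map each JsonNode to IData instead of printing it:

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

public class JsonStreamReader {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // createParser streams the file; the whole document is never held in memory.
        try (JsonParser parser = mapper.getFactory().createParser(new File("big.json"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("expected a top-level JSON array");
            }
            // Advance one record at a time; only the current record is materialized.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode record = mapper.readTree(parser); // reads one complete record
                process(record);                           // e.g. map it to IData here
            }
        }
    }

    private static void process(JsonNode record) {
        System.out.println(record);
    }
}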

I did something similar for delimited files: a file was opened and read in chunks, building a list of IS documents/records up to a maximum set by the caller’s inputs, e.g. get 10 records at a time. For delimited files it was pretty straightforward to hook into the built-in flat file services. Doing the same with JSON will be a bit more involved, as you’ll likely need to deal with nested objects – so determining that you have a “complete record” can be challenging.

But the open-source libraries may be able to help with that.
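As a rough sketch of the “N records at a time” idea applied to JSON, again assuming a top-level array and Jackson’s streaming parser (the class and method names here are made up for illustration):

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JsonBatchReader {
    private final ObjectMapper mapper = new ObjectMapper();

    // Returns up to maxRecords from a parser already positioned inside a top-level array.
    // An empty list signals that the array has been fully consumed.
    public List<JsonNode> nextBatch(JsonParser parser, int maxRecords) throws IOException {
        List<JsonNode> batch = new ArrayList<>();
        while (batch.size() < maxRecords && parser.nextToken() == JsonToken.START_OBJECT) {
            JsonNode record = mapper.readTree(parser); // consumes exactly one record
            batch.add(record);
        }
        return batch;
    }
}

The caller would keep the parser open between invocations (e.g. in a stateful service), which mirrors how the delimited-file approach holds on to its open file handle between chunks.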

Of course, the downside of this is that incorporating such libraries often has a “balloon” effect, where you end up with more stuff than is needed or desired.
