Handling JSON strings and large files in Integration Server

Description

Previous versions of Integration Server supported handling JSON content through built-in services that convert JSON strings and files to IData format and back. Integration Server version 10.7 extends this capability with new built-in services for iterating over large JSON files and extracting data or data fragments from them.

In this tutorial, we will look at how to:

  • Create a simple flow service that converts JSON content to an IData document, transforms the IData document, and finally converts the transformed data back to JSON format.

  • Create a flow service that parses a large JSON file containing multiple arrays, retrieves specific array elements, and finally converts them to an IData document.

Note: For information about the built-in JSON services, see the “JSON Folder” chapter in webMethods Integration Server Built-In Services Reference.

Downloadable Resources

Custom_Package.zip (13.5 KB)

The package developed in this tutorial is attached for reference. To explore the services and connections between them further, download the zip, install the included Custom_Package in your Integration Server instance, and open the package in Designer. For information on installing a package published by another server, see the Managing Packages chapter in webMethods Integration Server Administrator’s Guide.

image

Transforming JSON content using built-in services

First, let’s consider a simple use case where some JSON content sent to Integration Server needs to be transformed and returned as a JSON string. Let’s assume the JSON content is an array of employee records (first name, last name, and email address) that needs to be transformed so that each employee’s first and last names are combined into a single name.

For this use case, we will create a flow service that:

  • Converts the incoming JSON content to an IData document using the pub.json:jsonStringToDocument service.
  • Transforms the IData document to concatenate the first name and last name using a MAP flow step and the pub.string:concat service.
  • Converts the transformed IData document to a JSON string using the pub.json:documentToJSONString service.

Note: The flow service that you create in this procedure is available as the Transform_JSON_Content service in the downloadable Custom_Package.
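To see the data flow end to end before building it in Designer, the three services above behave roughly like the following Python sketch. Python is used only as an illustration here; the actual work is done by flow steps and built-in services, and none of these names are webMethods APIs.

```python
import json

# Sample input in the shape used by this tutorial (two records shown).
incoming = """{"employees": [
  {"fname": "Warren", "lname": "Moody", "email": "Warren.Moody@abc.com"},
  {"fname": "Sam", "lname": "Brown", "email": "Sam.Brown@abc.com"}
]}"""

# pub.json:jsonStringToDocument analogue: JSON string -> document (dict).
doc = json.loads(incoming)

# MAP + pub.string:concat analogue: merge fname and lname into a single name.
out = {"employees": [
    {"name": e["fname"] + " " + e["lname"], "email": e["email"]}
    for e in doc["employees"]
]}

# pub.json:documentToJSONString analogue with prettyPrint set to true.
print(json.dumps(out, indent=2))
```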

  1. Create an empty flow service. For example, Transform_JSON_Content.
    image

  2. Add the input and output fields that correspond to the incoming JSON string and the returned JSON string respectively.
    image

  3. Add the pub.json:jsonStringToDocument service and perform the following steps:

    1. In the Pipeline view of the service, add the following sample incoming JSON content to the jsonString field.

      {
        "employees": [
          { "fname": "Warren", "lname": "Moody", "email": "Warren.Moody@abc.com" },
          { "fname": "Sam", "lname": "Brown", "email": "Sam.Brown@abc.com" },
          { "fname": "Tina", "lname": "Francis", "email": "Tina.Francis@abc.com" },
          { "fname": "Daisy", "lname": "Hall", "email": "Daisy.Hall@abc.com" },
          { "fname": "Erik", "lname": "Simon", "email": "Erik.Simon@abc.com" }
        ]
      }

      image

      This service gets the JSON content and converts it to a document object (IData format).

    2. Connect the output of this service to the inEmps document and drop the other parameters in Pipeline Out.

    Note: Drop unnecessary output parameters throughout the procedure to keep the pipelines clean.

  4. Add a MAP flow step, select the source and target array variables, and create a ForEach mapping between the two variables.
    image

    Note: For information about using MAPs in flow services, see the “Mapping Data in Flow Services” chapter in webMethods Service Development Help.

  5. In the ForEach mapping, add the pub.string:concat service. The ForEach loop allows you to consider one array element at a time for transformation.
    image

  6. In the Pipeline view of the MAP, perform the following steps:

    1. Connect the fname input parameter to a string field in the pub.string:concat service and add a space in the value of the other string field. This step adds a space after the first name of the employee.
    2. Connect the output of the pub.string:concat service to the fname output parameter in Pipeline Out.

      image

  7. Add a second MAP and perform the following steps in the Pipeline view:

    1. Connect the fname and lname fields to the string parameters in the pub.string:concat service and map the output of the service to the name parameter. This step concatenates the first name and the last name of the employee into a single name.

      image

    2. Connect the email parameters in the MAP.

      image

  8. Add the pub.json:documentToJSONString service to convert the transformed content in IData format back to JSON format.

    1. Connect outEmps to document.
    2. Set the prettyPrint value to true to format the JSON content for human readability.

      image

  9. Save and run the Transform_JSON_Content flow service. The transformed JSON content in this scenario will look like this:

    image

Notes:

  • Similarly, you can use the pub.json:jsonStreamToDocument service to convert small JSON files into IData format. The only difference is that the pub.json:jsonStreamToDocument service accepts JSON input as stream data.
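The string and stream variants differ only in the type of input they accept, as this small Python analogy illustrates (the function names here are Python standard library calls, not the webMethods API):

```python
import io
import json

payload = '{"employees": [{"fname": "Tina", "lname": "Francis"}]}'

# pub.json:jsonStringToDocument analogue: the input is a string.
from_string = json.loads(payload)

# pub.json:jsonStreamToDocument analogue: the input is a stream (file-like object).
from_stream = json.load(io.StringIO(payload))

# Both produce the same document.
assert from_string == from_stream
```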

Parsing large JSON files

The pub.json:jsonStringToDocument and pub.json:jsonStreamToDocument services that we used in the above use case are not suitable for parsing or converting large JSON files as they load the entire JSON content into memory at once, which can cause Integration Server to run out of memory.

Important: A JSON file is considered large or small based on the memory available to the JSON services. The memory available to JSON services depends on the memory allocated to an Integration Server instance and the memory consumed by other services running on that instance. For example, if the maximum memory allocated to Integration Server is 512 MB, even a 100 MB JSON file might cause Integration Server to run out of memory if multiple other services are contending for memory at the same time.

Integration Server enables you to overcome this bottleneck by using the pub.json:getArrayIterator and pub.json:getNextBatch services.

For this use case, we will create a flow service that:

  • Streams a large JSON file using the pub.file:getFile service.
  • Creates an iterator object using the pub.json:getArrayIterator service, which enables the service to retrieve the required data in batches.
  • Loops and retrieves the required elements in batches by invoking the pub.json:getNextBatch service within a REPEAT flow step.
  • Closes the iteration using the pub.json:closeArrayIterator service after all the required array elements are retrieved.

Note: A small JSON file is used in this use case only for demonstration purposes. However, the same procedure is applicable to larger JSON files.

Procedure

  1. Create an empty flow service. For example, Large_JSON_iterator.

    image

  2. Create a sample JSON file, sample_input1.json with the following content:

     {
       "degree": [
         {
           "branch": ["science", "arts", "commerce"]
         },
         {
           "science": [
             { "combination": ["physics", "chemistry", "biology"] },
             { "combination": ["physics", "chemistry", "botany"] },
             { "combination": ["physics", "chemistry", "zoology"] }
           ]
         }
       ]
     }

  3. Add the pub.file:getFile service and perform the following steps:

    1. Enter the path of the JSON file in the filename input parameter.
    2. Set the value of the loadAs input parameter to stream.

      image

      This service gets the JSON file and passes it as an input stream to the pub.json:getArrayIterator service.

  4. Add the pub.json:getArrayIterator service. This service creates an iterator object required to parse the arrays in the JSON input stream.

    1. Link the stream output parameter of the pub.file:getFile service to the jsonStream input parameter of the pub.json:getArrayIterator service.

      image

    2. Configure the arrayPaths input parameter based on the array elements that you want to get from the JSON file. For example, to get the branch array and the first combination array element from the science array, add two arrayPaths as shown in the following image.

      image

      Note: The array paths must follow JSON Pointer notation (RFC 6901).
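As a quick illustration of the notation, this minimal Python resolver (an illustration only; it skips the RFC 6901 ~0/~1 escape rules and the empty pointer) shows how a path such as /degree/0/branch walks the document:

```python
import json

def resolve_pointer(doc, pointer):
    # Walk the document one reference token at a time; numeric tokens
    # index into arrays, other tokens look up object members.
    node = doc
    for token in pointer.lstrip("/").split("/"):
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

doc = json.loads('{"degree": [{"branch": ["science", "arts", "commerce"]}]}')
print(resolve_pointer(doc, "/degree/0/branch"))
# ['science', 'arts', 'commerce']
```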

  5. Add the pub.json:getNextBatch service.

    Note: The iterator output parameter of the getArrayIterator service is the natural input to the getNextBatch service, and Designer automatically connects it to the iterator input parameter of the getNextBatch service.

    1. Set a value for batchSize. In this example, the batchSize is 2.

      image

    Because the arrays at the specified paths may contain numerous elements, run this service repeatedly until there are no more elements to retrieve.

    Note: For information about using flow steps such as REPEAT, BRANCH, LOOP, and EXIT while building flow services, see the Building Flow Services chapter in webMethods Service Development Help.

  6. Add a REPEAT step and set the Repeat on property to SUCCESS. Indent the pub.json:getNextBatch service under REPEAT.
    image

  7. Add a BRANCH step and set the Switch property to iterationStatus/hasNext and the Evaluate labels property to false. This step ensures that the pub.json:getNextBatch service iterates over the array paths until all the array elements are retrieved.
    image

  8. Add an EXIT step and indent it under the BRANCH step. Set the Label property to false and the Exit from property to $loop. This step terminates the pub.json:getNextBatch service when iterationStatus/hasNext is false.
    image

  9. Add the pub.json:closeArrayIterator service. This service closes the iterator used by the pub.json:getNextBatch service.
    image

  10. Save and run the flow service.

    The Debugging feature allows you to see the stepwise results of a flow service. For more information, see the Debugging Flow Services chapter in webMethods Service Development Help.
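The REPEAT, BRANCH, and EXIT steps above amount to a loop that keeps calling pub.json:getNextBatch until hasNext is false. The following Python sketch simulates that control flow over the two arrays targeted in this example. Note that get_next_batch here is a toy stand-in over pre-resolved arrays, not the webMethods service, which parses the stream incrementally; as in the real service, a batch never spans two arrays.

```python
def get_next_batch(state, batch_size):
    # Toy stand-in for pub.json:getNextBatch over pre-resolved arrays.
    # Returns (batch, has_next); a batch never spans two arrays.
    arrays, i, pos = state["arrays"], state["i"], state["pos"]
    batch = arrays[i][pos:pos + batch_size]
    state["pos"] = pos + len(batch)
    if state["pos"] >= len(arrays[i]) and i + 1 < len(arrays):
        state["i"], state["pos"] = i + 1, 0      # move on to the next array path
    has_next = (state["i"] + 1 < len(arrays)
                or state["pos"] < len(arrays[state["i"]]))
    return batch, has_next

# The two arrays selected by the arrayPaths in this tutorial.
state = {"arrays": [["science", "arts", "commerce"],        # /degree/0/branch
                    ["physics", "chemistry", "biology"]],   # /degree/1/science/0/combination
         "i": 0, "pos": 0}

results = []
has_next = True
while has_next:                                  # REPEAT (on SUCCESS)
    batch, has_next = get_next_batch(state, batch_size=2)
    results.append(batch)                        # BRANCH on hasNext + EXIT $loop
                                                 # end the loop when has_next is False
print(results)
# [['science', 'arts'], ['commerce'], ['physics', 'chemistry'], ['biology']]
```

The four collected batches correspond to the four iterations described in the results below.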

Flow service results

Iteration 1: The pub.json:getNextBatch service parses the /degree/0/branch array and returns the first 2 elements, as the batch size is 2. Since the last element remains, the hasNext property is true.
image

Iteration 2: Returns the last element in the branch array. The hasNext property remains true because the service has yet to parse /degree/1/science/0/combination.
image

Iteration 3: Returns the first 2 elements in /degree/1/science/0/combination.
image

Iteration 4: Returns the last element in /degree/1/science/0/combination, after which hasNext becomes false and service execution stops.
image

Additional information

In addition to the services covered in this tutorial, the JSON folder also provides the pub.json.schema:validate service that can be used to validate JSON content against a JSON document type.

References

webMethods Integration Server Built-In Services Reference, version 10.7
webMethods Service Development, version 10.7


Great tutorial but it’s missing a step or two at the end where you’d show how to collate all the batches together and send them to a different flow to produce an output.

This doesn’t include the input and output fields of Large_JSON_iterator.

Flow service names should not start with a capital letter or contain underscores, per the clean code practices we were advised of.

Transform_JSON_Content > transformJsonContent
Large_JSON_iterator > largeJsonIterator

To answer your question:
“Great tutorial but it’s missing a step or two at the end where you’d show how to collate all the batches together and send them to a different flow to produce an output.”

If we collect all the batches, it defeats the purpose of the JSON iterator, which should be used when the data is too large to be parsed by the existing services. If all the batches are collected at the end, the server will run out of memory. If the data is small enough to be parsed without an iterator, using the iterator is not recommended.


“Clean code” is ambiguous, a YMMV item, and in the eye of the beholder. 🙂 The specifics of naming conventions are not all that important; the main factor is having a convention. As long as the naming doesn’t violate any restrictions of the tool, it doesn’t much matter what it is.

Nice Tutorial on the array iterator!

Clearly explained thank you.

Just in case: when the JSON is not an array and I use pub.json:documentToJSONString (to convert a huge IData object, i.e., a nested document structure, into a JSON string), I have observed that only partial content is returned (ending with dots “……”).
Output:

How to handle this scenario?

Are there similar functions to pub.json:getArrayIterator or pub.json:getNextBatch in the cloud webmethods.io solution?

In case someone else is looking for this solution, resolved in

Hello Nagendra Prasad,

Yes it is resolved.

Thank you,
Charan