How to handle a binary file in a webmethods REST resource?

I’m using webMethods integration server 10.11.

I want to expose a URL where an external application can send me a zip file, which I then save somewhere.

I created a REST resource accessible via a POST URL.

When I call this URL from Postman with a binary file attached, I don’t understand how I can retrieve the content of the body as a stream or as a byte array.

I also have some query parameters in my request and I receive them in variables, but nothing is visible for the body when I save the pipeline, for example.

Is there some webMethods “magic”, similar to using the query param “jsonFormat=stream” to get a jsonStream input, but for binary content?

Thanks for your help

I would strongly suggest NOT creating a so-called REST resource. Don’t create a RAD component.

Instead, just use the same ol’ HTTP capabilities that have been available long before REST became an ill-defined and overly abused term. :slight_smile:

You can use the “legacy” REST support to expose a path that is relatively straightforward to implement. Review the docs on that.

Or you can define a service in any package/folder you want and have that process the HTTP POST. Review the docs for calling a service with the invoke directive in the URL.
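
For example, a hypothetical service my.pkg.files:handleUpload (folder, service, and parameter names are made up for illustration) could be reached with a URL along these lines:

http://<is-host>:5555/invoke/my.pkg.files/handleUpload?targetDir=inbound

The query parameters end up in the pipeline as String inputs, provided you declare them on the service.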

In either case, a key part is the Content-Type that the client uses to post the file. Different Content-Type values are handled differently by the content handlers in Integration Server. The name of the variable in the pipeline will differ depending on the Content-Type. There are docs about this too.

I would strongly suggest that you do not configure this to expect/process a byte array. That will try to load the entire file into memory, which, if someone posts a “big” file, can crash your JVM. Accept/process a stream. How to set this up depends on the Content-Type.

Lastly, this may fit multipart form data – you indicated URL parameters along with a file. Forms support doing this, and you could use the MIME services to handle it.

Hope these options help!

In addition to Rob’s post, this looks like a case for an FTP/MFT server. Uploading a file directly is usually not a good idea.

Do not do this for a zip or any other binary file. Or any file that is “too big” – a file that can exhaust the JVM heap.

Whether or not contentStream is a variable in the pipeline depends upon the Content-Type in the HTTP header. It can vary.

You don’t need to call the getTransportInfo service to get the URL query parameters. They will be in the pipeline when IS invokes the service. You just need to declare them as inputs so they show up at design time.

As covered in the other threads, use variable substitution conservatively. In most cases, it is not necessary to use it.


This is an important thing: be pessimistic about what comes your code’s way.

To give people an idea: In a “robust” implementation the actual business logic usually makes up about 20-30% of the code. The rest is checking preconditions, handling errors/exceptions, and logging.

If you have those docs, please share them.

Refer to the IS Administrator’s Guide. Look for “content handlers”.

If loading huge files is a must (the actual requirement, not the solution attempt), we should also provide the solution for it, I guess. The proper solution for processing huge files is to use off-heap memory, using Terracotta and MFT together. It is easier if the big file is XML. If you are not sure how or why to do it this way, try to find another solution. You don’t have to use MFT; you can use any FTP server and utilize it the same way. MFT is webMethods’ FTP solution (roughly). It’s better to use it if you have it.

If you want to proceed without getTransportInfo, then you need to enable a few extended settings like “watt.server.http.forwardHeaders”.

The OP never mentioned HTTP headers, just query parameters. You don’t need to get those via getTransportInfo.

That extended setting does not apply in this case. There is nothing to forward.

I am not sure what is restricting my server. I am not able to capture transport headers or query parameters using a REST v2 resource or invoke/servicename without using getTransportInfo.

HTTP headers are available only via getTransportInfo. But as noted, the OP did not mention a need to access those, so I’m not sure why those came up.

URL query parameters will be in the pipeline, though RAD v2 components may restrict or do something a bit different. If that’s an issue, perhaps you can open a separate topic in the forums to explore this.

To achieve file upload in streaming mode (without having to load the file content into memory), you need the following (a rough Java-service equivalent is sketched after the list):

  • an array of bytes as input to the flow service
  • then pub.io:bytesToStream to get the stream you need from the byte array
  • and finally pub.file:streamToFile to write your file locally (fileAccessControl.cnf also needs to be updated to allow the file to be written in your desired location)
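
A rough Java-service sketch of those three steps might look like this. The pipeline inputs “body” and “fileName” are assumptions, and the service and parameter names are from memory, so verify them against the Built-In Services Reference for your IS version:

// Sketch only: wrap a byte[] input in a stream and write it to disk
// by calling pub.io:bytesToStream and pub.file:streamToFile.
import java.io.InputStream;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataFactory;
import com.wm.data.IDataUtil;
import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.ServiceException;
import com.wm.lang.ns.NSName;

public static final void saveUploadedBytes(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    byte[] body = (byte[]) IDataUtil.get(pc, "body");       // assumed input name
    String fileName = IDataUtil.getString(pc, "fileName");  // assumed input name
    pc.destroy();
    try {
        // pub.io:bytesToStream -- wrap the byte array in an InputStream
        IData btsIn = IDataFactory.create();
        IDataCursor c1 = btsIn.getCursor();
        IDataUtil.put(c1, "bytes", body);
        c1.destroy();
        IData btsOut = Service.doInvoke(NSName.create("pub.io", "bytesToStream"), btsIn);
        IDataCursor c2 = btsOut.getCursor();
        InputStream stream = (InputStream) IDataUtil.get(c2, "stream");
        c2.destroy();

        // pub.file:streamToFile -- write the stream to disk
        // (the target location must be allowed in fileAccessControl.cnf)
        IData stfIn = IDataFactory.create();
        IDataCursor c3 = stfIn.getCursor();
        IDataUtil.put(c3, "stream", stream);
        IDataUtil.put(c3, "fileName", fileName);
        c3.destroy();
        Service.doInvoke(NSName.create("pub.file", "streamToFile"), stfIn);
    } catch (Exception e) {
        throw new ServiceException(e);
    }
}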

If you want to use a RAD, here’s an example OpenAPI v3 specification that you can use to generate your implementation skeleton. You need to use “application/octet-stream” as the content type to get the expected byte array as input to the flow service.

openapi: 3.0.0
info:
  title: File Management API
  version: 1.0.0
servers:
- url: http://localhost:5555/FileManagementAPI
paths:
  /upload:
    post:
      summary: Upload File in Binary Format
      operationId: uploadFileBinary
      parameters:
      - name: X-Filename
        in: header
        description: The name of the file being uploaded.
        required: true
        style: simple
        explode: false
        schema:
          type: string
      requestBody:
        content:
          application/octet-stream:
            schema:
              type: string
              format: binary
        required: true
      responses:
        "200":
          description: File Uploaded Successfully
        "400":
          description: Invalid Input
        "500":
          description: Internal Server Error

This is actually the approach I would always recommend, if the file size is not limited.

Using off-heap memory will of course allow you to handle much bigger files than not doing so. But what if 100 files with Base64 encoding come your way in parallel?

If fully loaded into memory, a file will typically need 4-5 times as much space as on disk. With a 2 GB file that would roughly mean:

100 files * 2 GB/file * 4 = 800 GB

And that doesn’t even take into account duplication during processing of the file.

If I may circle back to my previous comment about being pessimistic: Unless you can technically limit the maximum file size and the number of parallel sessions, do assume that your memory will not be sufficient. And don’t rely on people telling you that their system will never send files so big.

The easiest way to achieve this kind of control, as @engin_arlak mentioned, is to let an (S)FTP server or MFT solution handle the file transfer. It also has the added benefit of isolating systems, thereby making operations easier during IS downtime or package updates.

If someone is interested in a good read on such topics, I can highly recommend “Release It!” by Michael Nygard.

Yes, if the big file needs to be processed, it should never come through HTTP calls. If it is not going to be processed, I would also stick with Stephane’s flow.

This is counter to the approach of “don’t load the entire file into memory.”

Placing an input stream in front of a byte array is too late – the entire file has already been loaded.

The input var to the service should be defined as a stream object (just for declaration). The key is using the right var name that matches what the content handlers do for each possible Content-Type that is to be supported.

The stream objects give access to the data stream managed by the HTTP/content handler components. You’ll be reading the bytes “from the wire.” Use the usual loop to read a buffer size (1K or 8K or whatever makes sense) from the network and write it to a file. In one of our cases (a basic HTTP proxy which accepts calls and forwards them on to another app) we simply map the input stream from the caller to the data/stream. Works great.
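
A minimal sketch of that copy loop, written as an IS Java service, might look like this. The variable name “contentStream” depends on the Content-Type as noted above, and “targetPath” is a made-up input:

// Sketch only: copy the incoming stream to disk in 8K chunks so the
// full payload is never held in memory.
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;
import com.wm.app.b2b.server.ServiceException;

public static final void streamToDisk(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    InputStream in = (InputStream) IDataUtil.get(pc, "contentStream"); // name varies by Content-Type
    String targetPath = IDataUtil.getString(pc, "targetPath");         // assumed input
    pc.destroy();

    byte[] buffer = new byte[8192];                  // 8K buffer; tune as needed
    try (OutputStream out = new FileOutputStream(targetPath)) {
        int read;
        while ((read = in.read(buffer)) != -1) {     // read "from the wire"
            out.write(buffer, 0, read);              // write the chunk to disk
        }
    } catch (Exception e) {
        throw new ServiceException(e);
    }
}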

Be aware that using “X-nnn” headers is something that should generally be avoided. By definition, it is non-standard. And the general consensus is that the X- headers should be avoided.

For file uploads, the usual convention is to use multi-part form.

Why not? We do it all the time. MFT systems offer browser-based UIs and accept arbitrary file sizes via HTTP without issue. Using streaming, and never byte arrays (except as intermediate buffers), is the key. Not only for IS but for any runtime environment. HTTP is not really the problem. It is the tendency of IS to lead developers to load everything completely into memory all the time. Techniques exist to avoid that (though not always).

Don’t cut the “if” part. If you need to process it, there needs to be a buffer to keep these files; otherwise it can even fill the off-heap memory. You can get the file from the HTTP call and save it before loading it into memory, then process it. Then it won’t be coming from HTTP calls, will it? It will come from an FTP server or from the disk. If big files need to be loaded into memory, the process should be asynchronous.

My mistake. But my question/point still stands.

Whether IS is going to “process” the data or not, there is no reason per se to avoid supporting large data via HTTP. Depending upon the data type and what processing is needed, “big” data sets can also be processed. Node iteration, as one example. Another example is the flat file services – they can read a record at a time from an HTTP stream, not just from a file. Read a record (or 10), write them to the target (via API, DB, file, whatever), and loop until the end of the stream.
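
As a generic illustration of that record-at-a-time pattern (plain Java, not the pub.flatFile services themselves), the idea is simply to read and handle one record per iteration so the full payload never sits in memory:

// Sketch only: process a character-based stream one record (line) at a time.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class RecordAtATime {

    public static void process(InputStream contentStream) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(contentStream, StandardCharsets.UTF_8))) {
            String record;
            while ((record = reader.readLine()) != null) {
                handle(record);   // write to the target: API, DB, file, whatever
            }
        }
    }

    private static void handle(String record) {
        // placeholder for the per-record business logic
    }
}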

Even if one writes it to disk first, it should still not be completely loaded into memory after that. You should still use streaming to process it – there is no difference between reading the stream from HTTP and reading a stream from a file.

There is nothing that requires that the entire data set (file or otherwise) be loaded into memory or off-heap, or fully read from HTTP and written to disk. Alas, the docs, services, examples, etc. for creating IS services tend to lead developers to load everything into memory. Using streaming techniques and node iteration takes more research/effort, but it can be/has been done.

Using a buffer for processing large files is my recommendation, that’s it. It is not because of product capabilities; rather it is because of unpredictable runtime conditions. It is possible to receive a lot of these requests at the same time and blow something up on the server. The example you provide is an exception: we don’t need to load all of the file into memory, we just partially scan it. Most large processes require you to load the data into memory, hence it’s better to use a buffer somewhere and queue requests if necessary.

Hello and thanks for all your answers.

I finally managed to do what I wanted, thanks to:

Refer to the IS Administrator’s Guide. Look for “content handlers”.

According to this, I had to use Content-Type: multipart/form-data to get a contentStream.

Then in order to get the file content I use:

  • pub.mime:createMimeData using contentStream as input variable to get mimeData
  • pub.mime:getBodyPartContent using mimeData as input variable and index = 0 (I only received one file) to get the file content stream

Then I can process the file stream to do my business logic.
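
For anyone curious, a rough Java-service equivalent of those two MIME steps could look like the sketch below; the parameter names (“input”, “index”, “content”) are from memory, so check them against the Built-In Services Reference for 10.11:

// Sketch only: parse the multipart body and pull out the first part as a stream
// by calling pub.mime:createMimeData and pub.mime:getBodyPartContent.
import java.io.InputStream;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataFactory;
import com.wm.data.IDataUtil;
import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.ServiceException;
import com.wm.lang.ns.NSName;

public static final void extractUploadedFile(IData pipeline) throws ServiceException {
    IDataCursor pc = pipeline.getCursor();
    InputStream contentStream = (InputStream) IDataUtil.get(pc, "contentStream");
    pc.destroy();
    try {
        // pub.mime:createMimeData -- parse the multipart/form-data body
        IData cmdIn = IDataFactory.create();
        IDataCursor c1 = cmdIn.getCursor();
        IDataUtil.put(c1, "input", contentStream);
        c1.destroy();
        IData cmdOut = Service.doInvoke(NSName.create("pub.mime", "createMimeData"), cmdIn);
        IDataCursor c2 = cmdOut.getCursor();
        Object mimeData = IDataUtil.get(c2, "mimeData");
        c2.destroy();

        // pub.mime:getBodyPartContent -- get the first (and only) part as a stream
        IData gbpIn = IDataFactory.create();
        IDataCursor c3 = gbpIn.getCursor();
        IDataUtil.put(c3, "mimeData", mimeData);
        IDataUtil.put(c3, "index", "0");
        c3.destroy();
        IData gbpOut = Service.doInvoke(NSName.create("pub.mime", "getBodyPartContent"), gbpIn);
        IDataCursor c4 = gbpOut.getCursor();
        InputStream fileStream = (InputStream) IDataUtil.get(c4, "content");
        c4.destroy();

        // hand the extracted stream back to the pipeline for the business logic
        IDataCursor pc2 = pipeline.getCursor();
        IDataUtil.put(pc2, "fileStream", fileStream);
        pc2.destroy();
    } catch (Exception e) {
        throw new ServiceException(e);
    }
}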

I also use a REST V2 Resource component to simply map a specific URL to the processing flow service and group this call with some other, more regular JSON REST calls.

Best regards,
Vincent Migot