Integrate OCR solutions with webMethods.io Integration and automate text extraction from images

Overview

This Monday OpenAI announced that ChatGPT Plus and Enterprise subscribers will soon be able to upload an image that ChatGPT can identify and “read”. I was excited, but soon realized that the feature is not available through OpenAI’s API, so we cannot integrate that new capability with our applications. I believe this feature is very powerful, because people often share a screenshot when they have an issue, or show an image instead of explaining it in words.

So I decided to create a webMethods.io Integration workflow that uses a third-party Optical Character Recognition (OCR) tool or API to recognize text within a digital image. It turned out to be really easy, and here’s a simple tutorial.

You will need:

  • a webMethods.io Integration tenant - sign up for a free forever trial if you’re new
  • an OCR tool account that has an API that you can access

Set up an OCR tool account

There are multiple OCR applications available that can help you analyze images. Choose the one that best fits your requirements, create an account, and generate an API key. In this article, we will use the OCRSpace tool. You can register with just your email address, and shortly afterwards you will receive an email with your API key.

Integrate with webMethods.io Integration

The example below shows how you can set up a webMethods.io Integration workflow that automates the process of extracting text from images and storing the text in a file.

Let’s say you receive an array with one or more image URLs as a response from a trigger, an action, or an application connector in your webMethods.io Integration workflow, and you want to convert the images into text. In this example, the URLs are received as the output of an HTTP call action:

  1. Start by dragging the Write a file action onto the canvas; this automatically connects it to the previous step. Hover over the action and click the gear icon, provide a file name, then click Next, Test (this creates the file), and Done.

  2. To convert each of the images into text, we will use the Loop action. Search for it and drag it onto the canvas.

Hover over it and click the gear icon. From the Incoming data panel, find the array with all the URLs and map it to the Source Array/Object field, select Each Item as the Select Loop Type, then click Next, Test, and Done.

  3. Now double-click the Loop action; this opens the Loop canvas. Add the HTTP request action, double-click it, and set up the API request to the OCR tool. For this example, the setup looks like this:

HTTP method: GET
URL: https://api.ocr.space/parse/imageurl (the OCR tool API endpoint)
URL Params:
Key: apikey
Value: {key} (insert your API key here)

Key: url
Value: https:{{$a8.currentValue[1]}} (from the Incoming data panel choose the CurrentValue result from the Loop action).

Click Next, Test, and Done.
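For readers who want to see what this HTTP request action sends, here is a minimal Python sketch of the same GET request URL. It is only an illustration outside webMethods.io Integration: the function name `build_ocr_request_url`, the placeholder key, and the example image URL are assumptions, while the endpoint and the two parameter names (`apikey`, `url`) come from the setup above.

```python
from urllib.parse import urlencode

# Endpoint taken from the HTTP request setup above.
OCR_ENDPOINT = "https://api.ocr.space/parse/imageurl"


def build_ocr_request_url(api_key: str, image_url: str) -> str:
    """Return the GET URL with `apikey` and `url` as query parameters.

    urlencode percent-encodes the image URL so that its own ':' and '/'
    characters do not interfere with the outer query string.
    """
    query = urlencode({"apikey": api_key, "url": image_url})
    return f"{OCR_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Placeholder key and image URL, for illustration only.
    print(build_ocr_request_url("YOUR_API_KEY", "https://example.com/receipt.png"))
```

Opening the printed URL in a browser or an HTTP client, with a real key, is a quick way to check the key before wiring it into the workflow.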

  4. Add the File Append action to the Loop canvas; this automatically connects it to the previous HTTP request action. Double-click it and, from the Incoming data panel, add the file path to the File Path field and the ParsedText value from the previous step to the Data field.

Click Next, Test, and Done. Connect the File Append action to the Stop step and close the Loop canvas.
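To make the ParsedText mapping more concrete, the sketch below pulls that value out of a response body in plain Python. The response shape is an assumption: it supposes the OCR tool returns JSON with a `ParsedResults` list whose items each carry a `ParsedText` string, and the sample body is invented for illustration. Check the actual output of your HTTP request step in the Incoming data panel.

```python
import json

# Invented sample body; the real response contains more fields.
SAMPLE_RESPONSE = '{"ParsedResults": [{"ParsedText": "Hello"}, {"ParsedText": "World"}]}'


def extract_parsed_text(payload: dict) -> str:
    """Join the ParsedText of every entry in ParsedResults, one per line.

    A missing or empty ParsedResults list yields an empty string.
    """
    results = payload.get("ParsedResults") or []
    return "\n".join(item.get("ParsedText", "") for item in results)


if __name__ == "__main__":
    print(extract_parsed_text(json.loads(SAMPLE_RESPONSE)))
```

The returned string corresponds to the value mapped into the Data field of the File Append action.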

  5. The last step is to read the text from the file. Add the Read file action to the canvas and pass the file path in the File Path field.

Click Next, Test, and Done. Connect the last step to the Stop step and save the workflow.
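The five steps above can be summarized as a short Python sketch: create the file, loop over the image URLs, append the recognized text for each one, and read the file back. This is not how webMethods.io Integration runs the workflow internally; `run_workflow` and the `ocr` callable are hypothetical names, and the OCR call is injected so the sketch runs without network access or an API key.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_workflow(image_urls: Iterable[str], file_path: str,
                 ocr: Callable[[str], str]) -> str:
    """Mirror the workflow: Write a file, Loop, File Append, Read file."""
    target = Path(file_path)
    target.write_text("", encoding="utf-8")        # step 1: Write a file
    for image_url in image_urls:                   # step 2: Loop, Each Item
        text = ocr(image_url)                      # step 3: HTTP request to the OCR tool
        with target.open("a", encoding="utf-8") as handle:
            handle.write(text + "\n")              # step 4: File Append
    return target.read_text(encoding="utf-8")      # step 5: Read file


if __name__ == "__main__":
    def fake_ocr(image_url: str) -> str:
        # Stand-in for the real OCR request, for illustration only.
        return f"text from {image_url}"

    with tempfile.TemporaryDirectory() as folder:
        output_file = str(Path(folder) / "ocr-output.txt")
        print(run_workflow(["a.png", "b.png"], output_file, fake_ocr))
```

Passing the OCR call in as a parameter keeps the file handling separate from the HTTP details, much as the Loop canvas keeps the HTTP request and File Append actions as separate steps.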

You can now run the workflow using the play button in the upper-right corner and review the result. In the bottom-left corner of the screen you can find the Execution history of the workflow.

From here you can monitor the execution status, check the result, and view error details if needed.
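An HTTP call can succeed while the OCR itself fails, so when testing the OCR request outside the workflow it helps to inspect the response body as well. The sketch below assumes the response carries an `IsErroredOnProcessing` flag and an `ErrorMessage` field (a string or a list of strings); verify both names against the real output of your OCR tool. `OcrError` and `ensure_ocr_success` are hypothetical names.

```python
class OcrError(RuntimeError):
    """Raised when the OCR response reports a processing error."""


def ensure_ocr_success(payload: dict) -> dict:
    """Return the payload unchanged, or raise OcrError if it is flagged as failed."""
    if payload.get("IsErroredOnProcessing"):
        message = payload.get("ErrorMessage") or "unknown OCR error"
        if isinstance(message, list):
            # Assumed: the message may arrive as a list of strings.
            message = "; ".join(str(part) for part in message)
        raise OcrError(str(message))
    return payload


if __name__ == "__main__":
    try:
        ensure_ocr_success({"IsErroredOnProcessing": True,
                            "ErrorMessage": ["Unable to download the image"]})
    except OcrError as error:
        print(f"OCR failed: {error}")
```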

Now you can continue to work on your use case and use the text result in the next step of your workflow, for example by including it in a prompt in the ChatGPT connector.
