Need to read the data in PDF

We are receiving a PDF document in webMethods, generated by a document generation system called Scriptura. We have a requirement to read the data from the PDF document and process it. I know that we can send the PDF as an attachment, but my main concern is to read it through webMethods and process the data in it. Urgent help is required on this.

For this urgent need, can you indicate what research you’ve done so far so we don’t cover material you already know? Can you be a bit more specific on the need to read and process the PDF? What will be the nature of the data in the PDF? What exactly do you need to do?

Extracting data from an unstructured document can be very error-prone. Imagine processing a Word document, extracting order details from it–one little change can throw things off completely.
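To illustrate how fragile this can be, here is a minimal sketch that pulls "Label: value" pairs out of text that has already been extracted from a document. The class name, the field labels, and the sample text are all hypothetical and not taken from any real Scriptura output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelledFieldParser {
    // Matches lines of the form "Label: value". The layout is an assumption.
    private static final Pattern FIELD =
            Pattern.compile("^\\s*([A-Za-z ]+?)\\s*:\\s*(.+?)\\s*$");

    // Returns the fields found, in document order; lines that do not match are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            Matcher m = FIELD.matcher(line);
            if (m.matches()) {
                fields.put(m.group(1), m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String original = "Order Number: 1001\nAmount: 250.00\n";
        // Same data, but the template author reworded one label.
        String changed = "Order No. 1001\nAmount: 250.00\n";

        System.out.println(parse(original).get("Order Number")); // 1001
        System.out.println(parse(changed).get("Order Number"));  // null
    }
}
```

A single reworded label in the template is enough to make the order number silently disappear, which is why any parsing of this kind needs validation on the required fields.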

Hi reamon,
Basically, we have a document generation system named Scriptura, which takes XML as input and generates PDF documents using built-in templates. We get the location of the PDF, and webMethods needs to pick it up and send it via SMTP. What we are wondering is whether there is a way to extract the data from the document and send that data in the body of the email rather than as an attachment. The PDF contains data about payments, brochures, etc.

I imagine that the PDF content could be anything, including images and such. I’d suggest just sending the PDF untouched as an attachment.

But if you want to get the text out of the PDF, there are a couple of PDF libraries that you could leverage. Do a search of the web. You’ll need to hook them into IS by writing a Java service or two (or ten) to get the text data extracted. You might check Advantage to see if they already have a sample that can be a starting point.
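As a rough starting point, the core of such a Java service might look like the sketch below. It assumes Apache PDFBox 2.x (one of the libraries named later in this thread) is on the Integration Server classpath; the class and method names are made up for illustration, and the IData pipeline wiring that a real IS Java service needs is left out.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

class PdfTextReader {
    // Loads the PDF and returns its text content as one string.
    // Assumes PDFBox 2.x; in PDFBox 3.x the document is loaded with Loader.loadPDF(file) instead.
    static String extractText(File pdfFile) throws IOException {
        try (PDDocument document = PDDocument.load(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Order text by its position on the page rather than by content-stream order.
            stripper.setSortByPosition(true);
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfTextReader /path/to/file.pdf
        System.out.println(extractText(new File(args[0])));
    }
}
```

The returned string could then be mapped into the body of the outgoing email. Images and scanned pages yield no text this way, so sending the untouched PDF as an attachment remains the safer default.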

Thank you for the help.

Hi all,

Has anyone successfully implemented the above requirement?
I also need to read the data from a PDF.
Is the iText API working as expected?
Do you have any better suggestions?

I have used the iText jars as well as the Apache PDFBox jars to read/write PDF docs in webMethods Java services. I personally prefer PDFBox because of the extensive documentation and Cookbook available in the public domain for developers. Using this software would mean that you have to generate the PDFs by coding (drawing) the coordinates onto the canvas manually. There will be little scope for code reuse unless you have standardized templates for each PDF doc type. There is no licensing fee involved here, but you need developers with strong skills in Java and wM to maintain this code.
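To give an idea of what coding the coordinates by hand looks like, here is a minimal sketch that writes one line of text at a fixed position on a new page. It assumes PDFBox 2.x (the font constant differs in 3.x); the class name, method name, and coordinates are only for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

class SimplePdfWriter {
    // Creates a one-page PDF with a single line of text at (x, y).
    // Coordinates are in points, measured from the bottom-left corner of the page.
    static void writeLine(File target, String text, float x, float y) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // default page size is US Letter
            document.addPage(page);
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(x, y);
                content.showText(text);
                content.endText();
            }
            document.save(target);
        }
    }

    public static void main(String[] args) throws IOException {
        writeLine(new File("payment.pdf"), "Payment received", 50f, 700f);
    }
}
```

Every label, value, and line on the page needs a call like this with its own position, which is where the maintenance effort comes from.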

If you are dealing with large PDF documents or documents with images (check images, fillable forms, signatures, etc.), then I would recommend getting the “Adobe® PDF Library software development kit” or Adobe LiveCycle server. These are licensed products from Adobe and will meet any kind of enterprise-level requirement for dealing with PDFs.