PDF support

Hi, I noticed that PDF is not listed as a supported MIME type. However, I have also noticed that the nonXML indexer works with some small PDF documents. Does the nonXML indexer in fact work for PDF? Are there plans for better support in the future?

Thanks.

Hi,

As mentioned in the readme, PDF is supported.

-----------------
Extracting content and metadata from PDF files (MIME type: application/pdf) is supported. However, in some circumstances, for example if the PDF file contains LZW compressed objects, no content is extracted.
-----------------
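To get a rough idea whether a failing file falls under the LZW case, you could scan its raw bytes for the standard PDF filter name /LZWDecode. The following is only a sketch: the helper names are made up for illustration and are not part of the indexer or the PDF plugin, and the check can miss filters declared inside compressed object streams, so a negative result proves nothing.

```python
import re

# Hypothetical helper, not part of the indexer or the PDF plugin.
# /LZWDecode is the filter name the PDF format uses for LZW-compressed streams.
LZW_FILTER = re.compile(rb"/LZWDecode\b")


def mentions_lzw(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes name the LZWDecode filter."""
    return LZW_FILTER.search(pdf_bytes) is not None


def file_mentions_lzw(path: str) -> bool:
    """Read a file from disk and apply the same heuristic."""
    with open(path, "rb") as fh:
        return mentions_lzw(fh.read())
```

Calling file_mentions_lzw on a failing PDF and getting True would suggest the LZW limitation quoted above; False leaves the cause open.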

Best regards,
Martin

Martin,

Thanks for the info. I have been unable to get any PDF over 300K to be indexed properly. Is there a list of other circumstances (besides LZW compressed objects) that would prevent content from being extracted properly?

Thanks,
Tom

Hi Tom,

Could you send some of your failing PDFs?

Regards,
Martin

Hi,

We too have problems with a certain PDF file: the file is processed and some metadata is extracted, but no content. (Element was empty.)

The file is about 2.5 MB, but we successfully processed a file twice as large, so it can't be a size problem.

I cannot post this file (it is project-confidential), but I will send it to the moderator upon request.

Hello Andreas,

Please send the file to martin.wallmer@softwareag.com. I'll check whether it works with the latest version of the PDF plugin.
According to the supplier of the PDF plugin, some problems in this area have been fixed.

Regards,
Martin

Martin,

When will the latest version of the PDF plugin be available in the Community?

Thanks,
Tom

I also need to know if we will have better support for PDF. My customers are very interested in this kind of functionality.

Regards,

Gabriel

Hi all,

There seem to be thousands of PDF formats out in the world.
Please send failing PDF files to martin.wallmer@softwareag.com.
I'll make a test suite out of them and send it to the developer of the PDF plugin.

Regards,
Martin