Hi, I noticed that PDF is not listed as a supported MIME type. However, I have also noticed that the nonXML indexer works with some small PDF documents. Does the nonXML indexer in fact work for PDF? Are there plans for better support in the future?
Thanks.
Hi,
As mentioned in the readme, PDF is supported:
-----------------
Extracting content and metadata from PDF files (MIME type: application/pdf) is supported. However, in some circumstances, for example if the PDF file contains LZW compressed objects, no content is extracted.
-----------------
Best regards,
Martin
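The readme excerpt above names LZW compressed objects as one case where no content is extracted. As a rough way to check whether a given file might fall into that case, the following is a minimal sketch, not part of the indexer or the PDF plugin. It assumes that scanning the raw bytes for the standard PDF filter names `/LZWDecode` (or the abbreviated `/LZW` allowed in inline images) is a good enough hint. The function names `may_contain_lzw` and `check_file` are hypothetical. The check is only a heuristic: it can miss filters declared inside compressed object streams, and it can report a false positive if those bytes happen to occur inside binary stream data.

```python
import re

# Filter names defined by the PDF specification. /LZWDecode is the full name;
# /LZW is the abbreviated form permitted inside inline images.
# The lookahead rejects longer names that merely start with these letters.
LZW_PATTERN = re.compile(rb"/LZW(?:Decode)?(?![A-Za-z0-9])")


def may_contain_lzw(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes mention an LZW filter name (heuristic)."""
    return LZW_PATTERN.search(pdf_bytes) is not None


def check_file(path: str) -> bool:
    """Read a PDF from disk and apply the same heuristic (hypothetical helper)."""
    with open(path, "rb") as f:
        return may_contain_lzw(f.read())
```

A file for which `check_file` returns True is a candidate for the LZW limitation; a False result does not rule out other causes of empty content.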
Martin,
Thanks for the info. I have been unable to get any PDF over 300K to be indexed properly. Is there a list of other circumstances (besides LZW compressed objects) that would prevent content from being extracted properly?
Thanks,
Tom
Hi Tom,
Could you send some of your failing PDFs?
Regards,
Martin
Hi,
We too have problems with a certain PDF file: the file is processed and some metadata is extracted, but no content. (Element was empty.)
The file is about 2.5 MB, but we successfully processed a file twice as large, so it can’t be a size problem.
I cannot post this file (it’s project-confidential), but I will send it to the moderator upon request.
Hello Andreas,
please send the file to martin.wallmer@softwareag.com. I’ll check whether it works with the latest version of the PDF plugin.
According to the supplier of the PDF plugin, some problems in this area have been fixed.
regards,
Martin
Martin,
When will the latest version of the PDF plugin be available in the Community?
Thanks,
Tom
I also need to know if we will have better support for PDF. My customers are very interested in this kind of functionality.
Regards,
Gabriel
Hi all,
There seem to be thousands of PDF formats out in the world.
Please send failing PDF files to
martin.wallmer@softwareag.com
I’ll make a test suite out of them and send it to the developer of the PDF plugin.
Regards,
Martin