pdf support

Tom_Michaud · July 11, 2003, 2:53am

Hi, I noticed that PDF is not listed as a supported MIME type. However, I have also noticed that the nonXML indexer works with some small PDF documents. Does the nonXML indexer in fact work for PDF? Are there plans for better support in the future?

Thanks.

Guest · July 11, 2003, 12:28pm

Hi,

As mentioned in the readme, PDF is supported.

-----------------
Extracting content and metadata from PDF files (MIME type: application/pdf) is supported. However, in some circumstances, for example if the PDF file contains LZW compressed objects, no content is extracted.
-----------------
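A rough way to check whether a given file falls into the LZW case named in the readme is to scan its raw bytes for the `/LZWDecode` filter name, which is how the PDF specification labels LZW-compressed streams. The sketch below is only an illustration, not how the indexer or the PDF plugin works: the function names `count_lzw_filters` and `uses_lzw` are made up for this example, and a plain byte scan will miss filter entries hidden inside compressed object streams or written with escaped name characters.

```python
import re
import sys

# /LZWDecode is the filter name the PDF specification uses for LZW streams.
# It normally appears in a stream dictionary, e.g. "/Filter /LZWDecode".
LZW_FILTER = re.compile(rb"/LZWDecode\b")


def count_lzw_filters(pdf_bytes: bytes) -> int:
    """Count occurrences of the /LZWDecode name in raw PDF bytes.

    Assumption: the name is stored unescaped and outside any compressed
    object stream, so a plain byte search can see it.
    """
    return len(LZW_FILTER.findall(pdf_bytes))


def uses_lzw(path: str) -> bool:
    """Return True if the file at `path` appears to contain LZW streams."""
    with open(path, "rb") as f:
        return count_lzw_filters(f.read()) > 0


if __name__ == "__main__":
    # Usage: python check_lzw.py file1.pdf file2.pdf ...
    for pdf_path in sys.argv[1:]:
        verdict = "LZW filter found" if uses_lzw(pdf_path) else "no LZW filter found"
        print(f"{pdf_path}: {verdict}")
```

A file reported as "no LZW filter found" can still fail for other reasons, so this only helps to rule the one documented circumstance in or out.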

Best regards,
Martin

Tom_Michaud · July 14, 2003, 7:13pm

Martin,

Thanks for the info. I have been unable to get any PDF over 300K to be indexed properly. Is there a list of other circumstances (besides LZW compressed objects) that would prevent content from being extracted properly?

Thanks,
Tom

Guest · July 14, 2003, 8:08pm

Hi Tom,

could you send some of your failing pdfs?

Regards,
Martin

jb1 · August 4, 2003, 5:02am

Hi,

i

system · August 21, 2003, 2:53pm

Hi,

We too have problems with a certain PDF file: the file is processed and some metadata is extracted, but no content. (Element was empty.)

The file is about 2.5 MB, but we successfully processed a file twice as large, so it can’t be a size problem.

I cannot post this file (it’s project-confidential), but I will send it to the moderator upon request.

Guest · August 21, 2003, 3:50pm

Hello Andreas,

Please send the file to martin.wallmer@softwareag.com. I’ll check whether it works with the latest version of the PDF plugin.
According to the supplier of the PDF plugin, some problems in this area have been fixed.

regards,
Martin

Tom_Michaud · August 27, 2003, 7:16pm

Martin,

When will the latest version of the PDF plugin be available in the Community?

Thanks,
Tom

Gabriel_Vazquez · September 2, 2003, 4:06am

I also need to know if we will have better support for PDF. My customers are very interested in this kind of functionality.

Saludos

Gabriel

Guest · September 2, 2003, 12:46pm

Hi all,

There seem to be thousands of PDF formats out in the world.
Please send failing PDF files to
martin.wallmer@softwareag.com.
I’ll make a test suite out of them and send it to the developer of the PDF plugin.

Regards,
Martin
