Using newer POI versions with Indexer

system · June 15, 2004, 7:29pm

Hi,

As the documentation says, not all content is always extracted from PDF files, and so not all of it is indexed. Can this be improved by using newer versions of POI?

To put it another way: do newer versions of POI extract more from PDF, Word and other formats, and can they be used with the indexer, given that the interface the indexer relies on has not changed?

Best regards, Andreas

M_Gesmann · July 6, 2004, 2:37pm

Hi Andreas,

the nonXML indexer as released with Tamino V4.2 uses the latest officially released version of POI (v2.5).
The POI project provides pure-Java APIs for manipulating various file formats based on Microsoft's OLE 2 Compound Document format. We use it for Word and Excel files. It does not support PDF files.

Best regards, Michael
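To illustrate the distinction Michael draws, here is a minimal sketch of how one might tell the two kinds of files apart. Word and Excel files of that era are OLE 2 Compound Documents, which begin with the fixed 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`; PDF files begin with the ASCII text `%PDF-`. The class `FormatSniffer` and its methods are hypothetical and are not part of Tamino or the nonXML indexer; the sketch uses only the Java standard library (Java 9 or later for `readNBytes`), not POI itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of the Tamino indexer): classifies a file by its
 * leading bytes as an OLE 2 Compound Document (the container POI reads for Word
 * and Excel files) or as a PDF (which POI does not handle).
 */
public class FormatSniffer {

    enum Kind { OLE2, PDF, OTHER }

    // OLE 2 Compound Document header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // PDF files start with the ASCII text "%PDF-".
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};

    /** Classifies a file from its first bytes. */
    static Kind classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, PDF_MAGIC)) return Kind.PDF;
        return Kind.OTHER;
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Reads up to the first 8 bytes of a file and classifies them. */
    static Kind classify(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = in.readNBytes(buf, 0, buf.length);
            return classify(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Kind kind = classify(Paths.get(arg));
            String note = (kind == Kind.OLE2) ? "OLE 2 container, readable by POI"
                        : (kind == Kind.PDF) ? "PDF, not handled by POI"
                        : "unknown format";
            System.out.println(arg + ": " + kind + " (" + note + ")");
        }
    }
}
```

Running `java FormatSniffer report.doc manual.pdf` prints one line per file. The check only looks at the container signature: an OLE2 result means POI can open the file, not that POI will extract all of its text, and a PDF result means a different extraction library would be needed, so upgrading POI alone would not change PDF indexing.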
