Hi,
As the documentation says, not all content is always extracted from PDF files, and so not all of it is indexed. Can this be improved by using a newer version of POI?
To put it another way: do newer versions of POI extract more content from PDF, Word, and similar files, and can they be used with the indexer, provided the interface it relies on has not changed?
Best regards, Andreas
Hi Andreas,
The nonXML indexer as released with Tamino V4.2 already uses the latest officially released version of POI (v2.5).
The POI project consists of APIs for manipulating various file formats based upon Microsoft’s OLE 2 Compound Document format using pure Java. We use it for Word and Excel files. It does not support PDF files.
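To illustrate why PDF falls outside POI's scope, here is a minimal sketch that tells the two container formats apart by their leading bytes. OLE 2 Compound Document files (classic .doc and .xls) start with a fixed 8-byte signature, while PDF files start with the ASCII text "%PDF-". The class name FormatSniffer and the method detect are made up for this example; they are not part of POI or of the Tamino indexer, and this is not how the indexer itself decides which extractor to use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a POI or Tamino class.
final class FormatSniffer {

    // Fixed signature that opens every OLE 2 Compound Document file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // PDF files begin with the ASCII header "%PDF-" followed by a version.
    private static final byte[] PDF_MAGIC =
        "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Classifies a file by the first bytes of its content.
    static String detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(head, PDF_MAGIC)) {
            return "PDF";
        }
        return "UNKNOWN";
    }

    // True if data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Reading the first eight bytes of a file and passing them to FormatSniffer.detect is enough to see that a PDF is not an OLE 2 container, so a library built around that container format has nothing to parse in it.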
Best regards, Michael