Jakarta-poi

Hello guys,
this is my question:
can I extract a document's content directly using the Jakarta POI API? Has anyone done it?

Alfonso.

Hi,

The non-XML indexer internally uses Jakarta POI to extract metadata (and not content) from MS documents. The metadata is stored as an XML document with the same internal id (ino:id) as the non-XML document.

Hope this helps.

Stuart Fyffe-Collins
Software AG (UK) Ltd.

Hi,
I'm trying the Tamino non-XML indexer, but I'm really only interested in what is “generated” by the indexer.

Alfonso.

Hi Alfonso,

The nonXMLIndexer generates . For Excel and MS Word content, POI is used internally. Of course all formatting disappears, but you can run text queries, for example "Find all Word documents containing ‘Tamino’".

Regards,
Martin

Hi Martin,
I only want the content, if possible, without using the indexer.

IndexedDocument.getContent();

Regards,
Alfonso

Hi Alfonso,

Then you have to write your own indexer (or content extractor). What do you want to do with that content?

regards,
Martin
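
For readers wondering what "your own content extractor" could look like, here is a minimal sketch in Java. It is not how the nonXMLIndexer works internally. The names ContentExtractor and ContentExtractors are hypothetical and made up for this illustration. The block itself only uses the JDK so it compiles on its own; the POI calls are shown as comments because they need the POI jars on the classpath, and they assume a POI version that ships the WordExtractor and ExcelExtractor classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical extension point: turns a document stream into plain text. */
interface ContentExtractor {
    String extract(InputStream in) throws IOException;
}

/** Hypothetical registry that picks an extractor by file extension. */
final class ContentExtractors {
    private final Map<String, ContentExtractor> byExtension = new HashMap<>();

    /** Associates a file extension (case-insensitive, without the dot) with an extractor. */
    void register(String extension, ContentExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the plain text of the document, or throws if the type is unknown. */
    String extract(String fileName, InputStream in) throws IOException {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        ContentExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.extract(in);
    }

    /** Registry with a plain-text extractor only; POI-backed ones are added by the caller. */
    static ContentExtractors withPlainText() {
        ContentExtractors registry = new ContentExtractors();
        registry.register("txt", in -> new String(in.readAllBytes(), StandardCharsets.UTF_8));
        // With the POI jars available, Word and Excel could be plugged in like this
        // (assumption: these extractor classes exist in the POI version you use):
        //   registry.register("doc",
        //       in -> new org.apache.poi.hwpf.extractor.WordExtractor(in).getText());
        //   registry.register("xls",
        //       in -> new org.apache.poi.hssf.extractor.ExcelExtractor(
        //               new org.apache.poi.hssf.usermodel.HSSFWorkbook(in)).getText());
        return registry;
    }
}
```

Calling ContentExtractors.withPlainText().extract("notes.txt", stream) returns the text of the stream. What you then do with the text (for example, storing it for full-text search) is up to your application.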

Full-text search.

That's what the nonXMLIndexer is designed for.

Regards,
Martin

My problem is that metadata (XML) is added for the content; I only want to use one collection.

The nonXML indexer writes the metadata (XML for properties AND content) to the same collection (even the same schema) as the document itself.

Regards,
Martin