Retrieving UTF-8 documents

I am having a problem retrieving a UTF-8 encoded document using the Java API. When I get the document using Tamino X-Plorer, it comes back fine. With the API, however, it is not encoded properly: the XML declaration says UTF-8, but the characters are wrong. In particular, a copyright symbol comes back as a single byte instead of the two-byte sequence it should be in UTF-8.
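The symptom becomes clearer when you look at the raw bytes. U+00A9 (the copyright sign) is the two bytes C2 A9 in UTF-8 but the single byte A9 in ISO-8859-1 (Latin-1), so a one-byte copyright sign suggests that some step encoded the text with a Latin-1-style charset rather than UTF-8. The sketch below uses only the standard library (Java 7+ for `StandardCharsets`); the class name `CopyrightBytes` is purely illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper: prints how one character is encoded under different charsets.
class CopyrightBytes {

    /** Encodes the text with the given charset and returns the bytes as space-separated hex. */
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // U+00A9 COPYRIGHT SIGN
        System.out.println("UTF-8:      " + hex(copyright, StandardCharsets.UTF_8));      // C2 A9
        System.out.println("ISO-8859-1: " + hex(copyright, StandardCharsets.ISO_8859_1)); // A9
    }
}
```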

The code I am using to get the document out is as follows:


/**
 * @author Created by Omnicore CodeGuide
 */

package edu.harvard.hul.ois.oasis;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import com.softwareag.tamino.db.api.connection.*;
import org.jdom.*;
import edu.harvard.hul.ois.xml.*;
import org.xml.sax.*;
import edu.harvard.hul.ois.ted.*;
import org.jdom.output.*;
import edu.harvard.hul.ois.xdom.*;
import com.softwareag.tamino.db.api.accessor.*;
import com.softwareag.tamino.db.api.objectModel.jdom.TJDOMObjectModel;
import com.softwareag.tamino.db.api.response.*;
import com.softwareag.tamino.db.api.objectModel.*;
import org.jdom.xpath.XPath;

public class OasisUnloader {

private static String databaseURI ="";
private static String collection ="";
private static String schemaLocation ="";
private static FileWriter logFile;

/**
 * @param args the command line arguments
 */
public static void main(String[] args) throws NumberFormatException,
	TServerNotAvailableException, JDOMException, IOException, TConnectionCloseException, TQueryException, TNoSuchXMLObjectException, TIteratorException, TTransactionModeChangeException {
	if (args.length != 1) {
		System.out.println("usage: java OasisUnloader oasisLoader.xml");
		return;
	}
	String oasisLoaderConfigFile = args[0];
	XPath uniqueIdXpath = null;
	if ((new File(oasisLoaderConfigFile)).exists()) {
		// If there is an exception before the log file is created, just dump
		// it to standard error.
		try {
			XmlConfig config = new XmlConfig(oasisLoaderConfigFile);
			databaseURI = config.getString("databaseURI");
			collection = config.getString("taminoCollection");
			schemaLocation = config.getString("schema");
			String logFileName = config.getString("logFile");
			logFile = new FileWriter(logFileName);
			String uniqueIdXpathString = config.getString("uniqueIDXpath");
			//uniqueIdXpath = XPath.newInstance(uniqueIdXpathString);
			uniqueIdXpath = XPath.newInstance("eadheader/eadid");
		} catch (IOException e) {
			e.printStackTrace(); // goes to standard error
			return;
		} catch (SAXException e) {
			e.printStackTrace(); // goes to standard error
			return;
		}
	} else {
		System.err.println("Config file not found: " + oasisLoaderConfigFile);
		return;
	}
	TConnectionFactory connectionFactory = TConnectionFactory.getInstance();
	//Obtain the connection and accessor for querying
	TConnection _connection = connectionFactory.newConnection(databaseURI);
	TXMLObjectAccessor _accessor = _connection.newXMLObjectAccessor(TAccessLocation.newInstance(collection),
		TJDOMObjectModel.getInstance());
	_connection.setIsolationLevel(TIsolationLevel.UNPROTECTED) ;
	TLocalTransaction _transactionID = _connection.useLocalTransactionMode();
	TQuery tQuery = TQuery.newInstance("/ead[eadheader/eadid='ajp00003']");
	TResponse _response = _accessor.query(tQuery,5);
	TXMLObjectIterator objectIterator = _response.getXMLObjectIterator();
	TXMLObject tXmlObject = TXMLObject.newInstance(TJDOMObjectModel.getInstance());
	int itemsProcessed = 0;
	Element oasisRecord;
	XMLOutputter xmlWriter = new XMLOutputter();
	FileWriter fw;
	Element eadidTextNode = null;
	String eadid;
	String encoding;
	while(objectIterator.hasNext()) {
	//while(itemsProcessed < 2) {
		System.out.println("Items Processed: " + itemsProcessed);
		tXmlObject = objectIterator.next();
		encoding = tXmlObject.getEncoding();
		oasisRecord = ((Element)tXmlObject.getElement()).detach();
		eadidTextNode = (Element)uniqueIdXpath.selectSingleNode(oasisRecord);
		eadid = eadidTextNode.getText();
		fw = new FileWriter("/home/oasis/xmlOutput/" + eadid + ".xml");
		encoding = fw.getEncoding();
		Document doc = new Document(oasisRecord);
		xmlWriter.output(doc, fw);
		fw.close();
		itemsProcessed++;
	}
	_connection.close();
	logFile.close();
}
}


Any insight would be great.
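One thing worth checking in the code above, independent of Tamino: `java.io.FileWriter` always encodes with the platform default charset (it only gained constructors that take a `Charset` in Java 11, and the default only became UTF-8 in Java 18). On a machine whose default is ISO-8859-1 or Cp1252, a correct in-memory copyright sign would be written as one byte even though the XML declaration says UTF-8; the `fw.getEncoding()` call in the loop would report which charset is in use. The sketch below, using only the standard library and a hypothetical class name, shows that an `OutputStreamWriter` with an explicit charset controls the bytes that are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes text through a Writer whose charset the caller chooses.
class ExplicitEncodingWriter {

    /** Writes the text through an OutputStreamWriter with the given charset; returns the bytes. */
    static byte[] encodeVia(String text, Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Unlike FileWriter (pre-Java 11), OutputStreamWriter lets the caller pick the charset.
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } // closing the writer flushes its buffered bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String copyright = "\u00A9";
        System.out.println("platform default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 bytes:   " + encodeVia(copyright, StandardCharsets.UTF_8).length);
        System.out.println("Latin-1 bytes: " + encodeVia(copyright, StandardCharsets.ISO_8859_1).length);
    }
}
```

Applied to the loop in the question, this would mean creating `fw` as `new OutputStreamWriter(new FileOutputStream("/home/oasis/xmlOutput/" + eadid + ".xml"), "UTF-8")` instead of a `FileWriter`. Alternatively, JDOM's `XMLOutputter` has an `output(Document, OutputStream)` overload that, as far as I know, encodes using the outputter's configured encoding. Treat both as suggestions to test, not as a fix confirmed in this thread.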

Hi Mandell,

I found the following passage in the release notes; it explains the encoding problem you have encountered.

Release Notes
XML Documents: Encoding Information
The encoding information stored in the database currently gets lost. For TXMLObjectAccessor, this is because the underlying SAX parsers ignore the encoding information. For the TStreamAccessor implementation, this happens because no encoding information is recognized.
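When the encoding declaration is lost between the database and the client, the bytes themselves are still UTF-8; what matters is which charset the reading side uses to turn them into characters. The stdlib-only sketch below (Java 7+, hypothetical class name) shows the two outcomes for the bytes C2 A9. It illustrates the general failure mode rather than anything in the Tamino API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes raw bytes with an explicitly chosen charset
// instead of relying on encoding metadata that may have been lost.
class DecodeWithKnownCharset {

    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+00A9 COPYRIGHT SIGN.
        byte[] raw = {(byte) 0xC2, (byte) 0xA9};
        String right = decode(raw, StandardCharsets.UTF_8);      // one char: U+00A9
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // two chars: U+00C2 U+00A9
        System.out.println(right.length() + " " + wrong.length()); // prints "1 2"
    }
}
```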

As a possible workaround you could try the following property:

Add the above to your java startup command, i.e., java OasisUnloader

Hope this helps,