Data-dumping/loading, Tamino API and bad performance

Hi all!

I have serious performance problems with the Tamino Java API. I have tried to use it earlier, and hoped the performance troubles were history by now, but it seems I still have problems.

Here is what I want to do:

  1. Dump images from a Tamino collection with doctype “from”.
  2. Scale the images.
  3. Load the images into a Tamino collection with doctype “to”.

Quite simple, I’d say. But this process takes approx. 2800 ms per picture (size ~100 KB). With 50,000 pictures, that predicts ~39 hours of processing time. This is far too much. The odds of a crash during a 39-hour run are too high to risk it. I have tested this on the following hardware:
IBM AIX 5L 5.1, 2x PowerPC 375 MHz / 2.50 GB RAM
Fedora Core 3, 2x Intel 1266 MHz / 2.25 GB RAM
Dell laptop running Win XP, 1.8 GHz / 1 GB RAM

My analysis so far shows these results on all platforms:

iterator.next(): time1 = 150ms
TNonXMLObject.getInputStream(): time2 = 1500ms
Scaling: time3 = 3.5ms
converting image to stream: time4 = 500ms
saving back image: time5 = 206ms

These timing points (time1-5) are marked in the code below.

Does anybody have any ideas for speeding up my program?

Here is my program:


StringReader sr = new StringReader("");
TNonXMLObject nonxmlObject = TNonXMLObject.newInstance(sr, collection,
    doctypeFrom, "doc", mimetype);

TConnection inputConnection = 
  TConnectionFactory.getInstance().newConnection(dburl);
TNonXMLObjectAccessor inputNonXmlObjectAccessor = 
  inputConnection.newNonXMLObjectAccessor(
      TAccessLocation.newInstance(collection));

TConnection outputConnection = 
  TConnectionFactory.getInstance().newConnection(dburl);
TNonXMLObjectAccessor outputNonXMLObjectAccessor = 
  outputConnection.newNonXMLObjectAccessor(
      TAccessLocation.newInstance(collection) );

String qs = args[1]; //qs=<doctypefrom>
TQuery query = TQuery.newInstance(qs);
try {
  // Invoke the query operation.
  TResponse response = inputNonXmlObjectAccessor.query(query);
  if (response.hasFirstNonXMLObject()) {
    TNonXMLObjectIterator toi = response.getNonXMLObjectIterator();
    TNonXMLObject image;
    while (toi.hasNext() && cont ) {
      image = toi.next();
      //TIME1
      try {
        BufferedImage bi = ImageIO.read(
            image.getInputStream()
          );
        //TIME2
        BufferedImage bi2 = scale(bi, NEWWIDTH);
        //TIME3
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(bi2, "jpg", baos);
        ByteArrayInputStream bais = new ByteArrayInputStream(
            baos.toByteArray()
        );
        //TIME4
        TNonXMLObject bilde = TNonXMLObject.newInstance( 
            bais,
            collection,
            doctypeTo,
            image.getDocname(),
            "image/jpeg" );
        try {
          inputNonXmlObjectAccessor.insert(bilde);
        } catch (TInsertException insertException) {
          System.out.println( "Error saving image" );
        } 
        //TIME5
      } catch (IOException e) {
        e.printStackTrace(System.out);
      }
    }//while
  } else {
    System.out.println("No instance found: " + qs);
  }
} catch (TQueryException queryException) {
  System.out.println("Query failed!");
}
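
Note that TIME2 in the loop above covers both fetching the stream and decoding it with ImageIO.read, so the 1500 ms cannot be attributed to either one yet. The following is a minimal sketch of how the stages could be timed separately. It deliberately uses no Tamino calls: timeStages accepts any InputStream, so the stream returned by image.getInputStream() could be passed in. The class name StageTimer and its methods are hypothetical names made up for this illustration. The call to ImageIO.setUseCache(false) turns off ImageIO's optional disk cache for image streams; whether that makes a difference for the numbers reported here is an assumption to be tested, not a known result.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of the Tamino API.
final class StageTimer {

    /** Drains a stream into memory so transfer time is measured apart from decoding. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns {transferMillis, decodeMillis, encodeMillis} for one JPEG stream. */
    static double[] timeStages(InputStream source) throws IOException {
        long t0 = System.nanoTime();
        byte[] raw = readFully(source);                       // pure I/O
        long t1 = System.nanoTime();
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(raw)); // pure decode
        long t2 = System.nanoTime();
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        ImageIO.write(img, "jpg", encoded);                   // pure encode
        long t3 = System.nanoTime();
        return new double[] {(t1 - t0) / 1e6, (t2 - t1) / 1e6, (t3 - t2) / 1e6};
    }

    public static void main(String[] args) throws IOException {
        // Keep ImageIO's intermediate data in memory instead of a temp file.
        ImageIO.setUseCache(false);

        // Synthetic stand-in for a stored image, so the sketch runs without a database.
        BufferedImage sample = new BufferedImage(640, 480, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        ImageIO.write(sample, "jpg", jpeg);

        double[] t = timeStages(new ByteArrayInputStream(jpeg.toByteArray()));
        System.out.printf("transfer=%.1f ms decode=%.1f ms encode=%.1f ms%n", t[0], t[1], t[2]);
    }
}
```

If the transfer figure dominates when fed a real stream, the time is going into the database round trip; if decode or encode dominates, the image handling itself is the place to look.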

Kind regards
Sverre Magnus Elvenes Joki

I think I’d investigate not using the Java API at all: use the Tamino data loader to dump your images, process them offline, and then use the data loader to load them back in.
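
The offline step in the middle could look roughly like the sketch below. It assumes the dumped images end up as individual .jpg files in one directory; how the data loader actually lays out its output is not covered here, so treat that as an assumption. The class OfflineScaler, its methods, and the command-line arguments are hypothetical names for this example, and scale is only a guess at what the scale(bi, NEWWIDTH) helper in the original program does (resize to a target width, keeping the aspect ratio).

```java
import javax.imageio.ImageIO;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

// Hypothetical offline batch scaler; uses only the standard Java library.
final class OfflineScaler {

    /** Resizes to newWidth pixels wide, keeping the aspect ratio. */
    static BufferedImage scale(BufferedImage src, int newWidth) {
        int newHeight = Math.max(1,
                (int) Math.round(src.getHeight() * (double) newWidth / src.getWidth()));
        BufferedImage dst = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newWidth, newHeight, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /** Scales every .jpg in inDir into outDir; returns the number of files written. */
    static int scaleDirectory(File inDir, File outDir, int newWidth) throws IOException {
        File[] files = inDir.listFiles((dir, name) -> name.toLowerCase().endsWith(".jpg"));
        if (files == null) {
            throw new IOException("Not a readable directory: " + inDir);
        }
        if (!outDir.isDirectory() && !outDir.mkdirs()) {
            throw new IOException("Cannot create output directory: " + outDir);
        }
        int written = 0;
        for (File f : files) {
            BufferedImage img = ImageIO.read(f);
            if (img == null) {            // ImageIO found no decoder for this file
                System.err.println("Skipping unreadable file: " + f);
                continue;
            }
            ImageIO.write(scale(img, newWidth), "jpg", new File(outDir, f.getName()));
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: OfflineScaler <inDir> <outDir> <newWidth>");
            return;
        }
        int n = scaleDirectory(new File(args[0]), new File(args[1]), Integer.parseInt(args[2]));
        System.out.println("Scaled " + n + " images");
    }
}
```

Because each file is handled independently, an interrupted run can be restarted on the remaining files instead of starting the whole job over.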