inoxmld issue

Two issues:

1) inoxmld appears to perform a backup before loading data, and there is no obvious way to turn this off. This is a problem for a client who wants to invoke inoxmld multiple times before performing a backup.

There should be an option to turn the backup off.

2) We found that if we invoked inoxmld with the concurrentWrite option, backups were disabled. However, when run this way, inoxmld appears to leave connections open. Thus, after a period of time, Tamino appears to lock up because of all the open connections.

I have a working example of this.

Additional information:
inosrv is the daemon that always leaves the socket port (9902, I believe) open. Each time I run inoxmld, another socket is left open.

This can add up fast.

hello joel,

the fact that inoxmld makes a backup during its “regular” operation is a design feature. it is part of the price you pay for the load itself being faster than “manual” loading.
thus, the classic use case for inoxmld is loading large amounts of data into a collection that contains little data.

oh, and, yes, switching on concurrentWrite does switch off the backup, but at the price of the actual loading process not being as fast.


hope this helps.

regards,
andreas f.

Joel,

let me try to summarize the differences between the two ways of loading:

inoxmld
The inoxmld (or Tamino Mass loader) builds the index externally for all data to be loaded. That is the reason for the speed. This has several effects:
- it needs privileged access to the database. The database is locked for other users
- it needs enough space on the disk for data and index
- if data already exists in the doctype, the index of this data is unloaded, merged with the index of the new data, and then the result is loaded
- inoxmld in this configuration uses only one CPU
- it always does a backup before loading

The concurrentWrite option works differently:
- it does not build the index externally
- it uses the “_process” command of Tamino
- it can use multiple CPUs

So this would be the recommendation:
Use inoxmld for initial loads into an empty or not very large database. You will gain significantly higher speed than with other methods.
Use the concurrentWrite option if the database already contains a significant amount of data, e.g. if you load additional data every day into an existing large database.

(In the next version of Tamino, due next year, you will be able to switch off the backup prior to the mass load.)

Additional hint:
Sending a “_process=” command to Tamino has a certain “overhead time”, that is, a minimum time it takes to be processed. This can add up if you always process a single document at a time. But Tamino allows you to send a set of documents in one _process command, so you can get much better performance by sending several documents in one _process statement (“_process=
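
To make the effect of the per-request overhead concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simple model in which every _process request costs a fixed overhead plus a per-document cost. The function names and the millisecond figures are made up for illustration and are not Tamino measurements.

```python
import math


def estimated_load_ms(num_docs, docs_per_request, overhead_ms, per_doc_ms):
    """Estimate total load time under a fixed-overhead-per-request model.

    All timing values are hypothetical inputs, not measured Tamino numbers.
    """
    if docs_per_request < 1:
        raise ValueError("docs_per_request must be at least 1")
    # Each request pays the overhead once, however many documents it carries.
    num_requests = math.ceil(num_docs / docs_per_request)
    return num_requests * overhead_ms + num_docs * per_doc_ms


def batches(docs, size):
    """Split a list of documents into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [docs[i:i + size] for i in range(0, len(docs), size)]


if __name__ == "__main__":
    # Assumed figures: 50 ms overhead per request, 10 ms per document.
    for per_request in (1, 10, 100):
        total = estimated_load_ms(1000, per_request, 50, 10)
        print(per_request, "docs per request:", total, "ms")
```

Under these assumed figures the overhead dominates when every document is sent on its own and becomes small once a request carries many documents. The real gain depends on the actual overhead of your installation.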

regards,

Timm

hi again,

i have tried to reproduce the open socket issue, but could not see an increasing number of open sockets.
can you send more precise information on
* what you did
* how you monitored the sockets

are you aware of the TcpTimeWaitDelay parameter?
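
One possible way to gather this kind of information, sketched in Python: save the output of `netstat -an` before and after a few inoxmld runs and count the connections on the port by TCP state. This separates connections that are still ESTABLISHED from ones in TIME_WAIT, which the application has already closed and which expire on their own (on Windows, after the interval set by TcpTimeWaitDelay). The sketch assumes the state is the last column and that addresses are written as host:port. The function name and the sample lines are hypothetical, and 9902 is simply the port mentioned earlier in the thread.

```python
from collections import Counter


def count_states(netstat_output, port):
    """Count TCP states for lines where either endpoint uses `port`.

    Assumes `netstat -an` style text: addresses written as host:port
    and the connection state in the last column.
    """
    suffix = ":" + str(port)
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP entries.
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # Check the address columns; the last column is the state.
        if any(field.endswith(suffix) for field in fields[1:-1]):
            states[fields[-1]] += 1
    return states


# Made-up sample in the style of `netstat -an` output.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:9902         0.0.0.0:0              LISTENING
  TCP    127.0.0.1:9902         127.0.0.1:1034         ESTABLISHED
  TCP    127.0.0.1:9902         127.0.0.1:1035         TIME_WAIT
  TCP    127.0.0.1:9902         127.0.0.1:1036         TIME_WAIT
  TCP    127.0.0.1:80           127.0.0.1:1040         ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in sorted(count_states(SAMPLE, 9902).items()):
        print(state, count)
```

If the ESTABLISHED count grows with every run, connections really are being left open. If only TIME_WAIT entries accumulate, the sockets are closed and merely waiting out the timeout.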

regards,
andreas f.

Hi Joel,

there have been some questions concerning the open socket issue. Do you have any additional information on this (see also the previous mails by Timm and Andreas)? Could you please tell us which communication method (TCP/IP or XTS) was used (did you specify hostname and port, or just the database name)?

Regards

Harald