We are planning an application using Tamino where loads of non-XML documents have to be loaded. Tests showed exploding log spaces. Is there a way to reduce the logging or to switch it off? No recovery from logging information would be needed. Recovery from backup would be enough.
Greetings
Sascha Hosse
Hi,
no, there is no documented method to do so. But note that when the number of backup generations is set to a low value, log spaces will be removed after a new backup is done.
Regards
Harald Sch
Hi,
thanks for your reply. Unfortunately the amount of logging data that is written for one upload job is far too big. So reducing the number of backup generations wouldn’t help.
By the way: wouldn’t it be an interesting feature to be able to switch off logging? This could also be temporary.
Regards,
Sascha Hosse
Just to give an example:
Uploading of 42.8 MB (41 files) results in 56.64 MB of log space. In the end there will be more than 100000 files.
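To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that the log-to-data ratio of this small sample carries over to a larger load, which may not hold in practice. The function names and the total size used in the call are placeholders for illustration only.

```python
def log_ratio(data_mb: float, log_mb: float) -> float:
    """Return how many MB of log space were written per MB of uploaded data."""
    return log_mb / data_mb


def estimate_log_mb(total_data_mb: float,
                    sample_data_mb: float = 42.8,
                    sample_log_mb: float = 56.64) -> float:
    """Extrapolate log space linearly from the measured sample.

    Assumption: log volume scales linearly with data volume.
    """
    return total_data_mb * log_ratio(sample_data_mb, sample_log_mb)


if __name__ == "__main__":
    # Ratio observed in the test upload (41 files).
    print(f"log/data ratio: {log_ratio(42.8, 56.64):.2f}")
    # Hypothetical total data volume in MB; replace with the real estimate.
    print(f"estimated log space: {estimate_log_mb(300_000):.0f} MB")
```

In this sample the log is roughly a third larger than the data itself, so the log space has to be planned in addition to the space for the documents.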
Regards,
Sascha Hosse
Hi Sascha,
100,000 non-XML docs per day???
Could you expand a bit on the scenario, as well as on what the actual “problem” with logging is: disk space, write I/O performance …?
regards
Finn
PS: Are there any XML docs that are part of the transaction (metadata describing the non-XML files, etc.)?
Hi,
it’s 100000 files for the initial load. And the problem is disk space. The customer estimated a few hundred GB for the files. Having even more log space makes it impossible to handle. Nothing known about daily changes, but 1000 files are possible.
The customer wants to perform full-text search on the files, so the Tamino Non-XML Indexer is used. He wants to add some more metadata, but this can be done afterwards. That’s not the problem. The files are mostly MS Office documents or plain text files.
Regards,
Sascha Hosse
Hi again Sascha,
As Harald mentioned, the initial load shouldn’t be a problem if you set the “backup generation” parameter to 1 and then do a backup approximately every 10,000 files. In this case all log files will be deleted when the backup has successfully finished.
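The load-then-backup cycle could be scripted roughly as follows. This is only a sketch: `upload` and `backup` are hypothetical placeholders that you would have to implement yourself, since nothing in this thread says how the upload or the Tamino backup is triggered programmatically.

```python
from typing import Callable, Iterable, Tuple


def load_in_batches(files: Iterable[str],
                    upload: Callable[[str], None],
                    backup: Callable[[], None],
                    batch_size: int = 10_000) -> Tuple[int, int]:
    """Upload files one by one and trigger a backup after every full batch.

    `upload` and `backup` are caller-supplied placeholders (assumptions).
    Returns (number of files uploaded, number of backups triggered).
    """
    uploaded = 0
    backups = 0
    for path in files:
        upload(path)
        uploaded += 1
        if uploaded % batch_size == 0:
            # With backup generations set to 1, older log files should be
            # removed once this backup has finished successfully.
            backup()
            backups += 1
    # Cover a trailing partial batch with one last backup.
    if uploaded % batch_size != 0:
        backup()
        backups += 1
    return uploaded, backups
```

With 100,000 files and the default batch size this would trigger ten backups, so the log space only ever has to hold the data written for one batch.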
I wouldn’t consider 1000 docs daily to be any problem (unless of course 1000 users try to store within the same split second).
BTW, have you considered using the Office 2003 feature “Save as XML” and then storing the MS Word docs etc. natively in Tamino?
Finn
Hi Finn,
thanks for your advice. I think we will have to do backups during the initial load and see how this affects the performance.
As for the ‘Save as XML’ feature: the documents are created by users and we don’t have any influence on them.
Regards,
Sascha