Greetings,
I am running Tamino 4.1.4.5 on W2K, dual PIII 1.4GHz, 1GB RAM.
I need to do a mass delete on a large set of data in Tamino.
My document type contains 140,000 records (~160 MB), and I need to delete 110,000 of them.
Using X-Query syntax:
http://…?_delete=document[ID="xxx"] (ID has a mixed index.)
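Concretely, the request I issue looks roughly like the sketch below (the server, database, collection and doctype names are placeholders for my real setup; the query string is URL-encoded):

```python
# Minimal sketch of the delete request I send over Tamino's HTTP interface.
# The base URL and the doctype name are placeholders, not my real values.
import urllib.parse
import urllib.request

base = "http://myserver/tamino/mydb/mycollection"     # placeholder database/collection URI
query = 'document[ID="xxx"]'                          # selects the ~110,000 documents to delete
url = base + "?_delete=" + urllib.parse.quote(query)  # _delete runs as one big transaction

with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))                # ino:response XML (or a timeout / journal error)
```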
The request either times out or exhausts the journal space.
I have bumped up the "maximum transaction duration" setting and the journal space, but neither is enough.
Even when I tried breaking the transaction down into small subsets, Tamino took about 20 minutes to delete 200 records. That is far too slow!
At this rate, even if the mass delete transaction can run to completion, it’ll take forever.
Do you have any suggestions on how to do a mass deletion (short of undefining and redefining the doctype)?
Thanks.
Hi,
There can be a number of reasons why the delete is so slow. First of all, I would run just the query that selects the documents to be deleted. If the performance of that query is poor, you should check whether all required indexes are defined properly.
Best regards,
Thorsten
Thanks for responding.
I don't think my issue is index-related, however, because:
1) I have a mixed index defined on the search element.
2) A query that retrieves the record count using the same search criteria comes back OK.
I didn't pull down the complete document listing, because the result would be huge and we would run into network bandwidth issues, which wouldn't be a true test.
My understanding of the problem is that Tamino logs every single one of the deletes, and for a large data volume I can see why it takes so long.
Other DBMSs (Sybase, MSSQL) provide a bulk processing mode where you can turn off logging before running large-volume operations such as truncating or loading. This helps tremendously with batch and maintenance processing, where you know you don't need the data integrity protection of OLTP mode.
I wonder why SAG doesn't offer something similar, and whether anyone knows of a workaround for handling bulk processing?
Thanks.
If you can't speed up the query by defining more appropriate indexes, you can:
1) Split the delete into many small transactions. It isn't one atomic transaction any more, but it needs far fewer resources on your machine. A good chunk size is several hundred documents; try what works best for you (see the sketch after this list).
2) A really ugly workaround is:
a) Export the documents that should remain in the database (using the Tamino Data Loader).
b) Undefine the doctype. This is really fast!
c) Define the doctype again.
d) Reimport your documents.
I would not recommend doing this, but it may be the fastest way.
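For option 1, something along the following lines should work. Note that the database URL, doctype name and ID ranges are only placeholders, and I am assuming the IDs you want to delete can be split into ranges:

```python
# Rough sketch of option 1: many small delete transactions instead of one big one.
# Assumptions (not from the original post): the base URL, doctype name and ID ranges
# below are placeholders, and the ID values to be deleted can be split into ranges.
# Each HTTP request is its own transaction, so the journal only has to hold a few
# hundred deletes at a time.
import urllib.parse
import urllib.request

BASE = "http://myserver/tamino/mydb/mycollection"   # placeholder database/collection URI
DOCTYPE = "document"                                # placeholder doctype name

def delete_range(lo, hi):
    """Delete all documents whose ID lies in [lo, hi) in one small transaction."""
    query = '%s[ID>="%s" and ID<"%s"]' % (DOCTYPE, lo, hi)
    url = BASE + "?_delete=" + urllib.parse.quote(query)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")          # ino:response XML from Tamino

# Placeholder driver: walk over the ID ranges that should be removed,
# a few hundred documents at a time.
for lo, hi in [("A000", "A200"), ("A200", "A400")]:
    print(delete_range(lo, hi))
```

If splitting by ID ranges is not possible, the same loop can iterate over an explicit list of ID values, a few hundred per request; in either case check each ino:response for errors before moving on to the next chunk.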