Hi.
I have a query where I - among other things - try to detect duplicates.
The dublicates have identical “AnmeldelseIdentifikator” but different tracsaction timestamps in “TransaktionsTid”.
In each group of documents with same “AnmeldelseIdentifikator” i want to process the first and move other duplicates to another collection.
The query works with a few documents, but with 1000 documents in the input collection fails because it takes too long to complete.
It uses a function “tafe:getAnmeldelseIdentifikator()” for which I have created a function-index.
It doesn’t seem to use that index, though.
The “Kategoriser” query is somewhat complex, so here’s a small part of it, which performs just as bad:
declare namespace wsa = "http://www.w3.org/2005/08/addressing";
declare namespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
declare namespace cvr = "http://rep.oio.dk/cvr.dk/xml/schemas/2005/03/22/";
declare namespace kms = "http://rep.oio.dk/kms.dk/xml/schemas/2005/03/11/";
declare namespace cpr = "http://rep.oio.dk/cpr.dk/xml/schemas/core/2005/03/18/";
declare namespace xkom = "http://rep.oio.dk/xkom.dk/xml/schemas/2005/03/15/";
declare namespace etl = "http://rep.oio.dk/tinglysning.dk/svarservice/message/integration/1/";
declare namespace model = "http://rep.oio.dk/tinglysning.dk/schema/model/1/";
declare namespace snapshot = "http://rep.oio.dk/tinglysning.dk/schema/snapshot/1/";
declare namespace msgext = "http://rep.oio.dk/tinglysning.dk/svarservice/message/anmeldelse/1/";
declare namespace akt = "http://rep.oio.dk/tinglysning.dk/schema/elektroniskakt/1/";
declare namespace msg = "http://rep.oio.dk/tinglysning.dk/service/message/tingbog/1/";
declare namespace msgadm = "http://rep.oio.dk/tinglysning.dk/service/message/administration/1/";
declare namespace anm = "http://rep.oio.dk/tinglysning.dk/schema/anmeldelse/1/";
declare namespace fase2_1 = "http://skat.dk/begrebsmodel/2009/01/15/";
declare namespace ho = "http://skat.dk/begrebsmodel/xml/schemas/kontekst/2007/05/31/";
declare namespace aid="http://skat.dk/tafe/anmeldelseid/xml/schemas/2009/06/01/"
declare namespace cfg = "http://skat.dk/tafe/config/xml/schemas/2009/10/13/";
declare namespace log = "http://skat.dk/tafe/log/xml/schemas/2009/06/01/";
declare namespace tf = "http://namespaces.softwareag.com/tamino/TaminoFunction";
import module namespace tafe = "http://skat.dk/tafe";
declare function local:isDublet($doc as node()) as xs:boolean {
exists(collection('Staging2')/fase2_1:TinglysningAnmeldelseKopiSvarModtag_I
[tafe:getAnmeldelseIdentifikator(.) = tafe:getAnmeldelseIdentifikator($doc)
and .//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid
< $doc//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid]
)
}
for $doc in input()/*
where local:isDublet($doc)
return $doc
I’d very much appreciate ideas on how too get this to use the index or otherwise improve performance.
Best regards
Claus Nielsen
[/code]
queries.zip (6.7 KB)
xsd.zip (19.9 KB)