Badly performing query


I have a query where I - among other things - try to detect duplicates.

The dublicates have identical “AnmeldelseIdentifikator” but different tracsaction timestamps in “TransaktionsTid”.

In each group of documents with same “AnmeldelseIdentifikator” i want to process the first and move other duplicates to another collection.

The query works with a few documents, but with 1000 documents in the input collection fails because it takes too long to complete.

It uses a function “tafe:getAnmeldelseIdentifikator()” for which I have created a function-index.
It doesn’t seem to use that index, though.

The “Kategoriser” query is somewhat complex, so here’s a small part of it, which performs just as bad:

declare namespace wsa      = "";
declare namespace xsi      = "";

declare namespace cvr      = "";
declare namespace kms       = "";
declare namespace cpr      = "";
declare namespace xkom     = "";

declare namespace etl      = "";
declare namespace model    = "";
declare namespace snapshot = "";
declare namespace msgext   = "";
declare namespace akt      = "";
declare namespace msg      = "";
declare namespace msgadm   = "";
declare namespace anm      = "";

declare namespace fase2_1  = "";
declare namespace ho       = "";

declare namespace aid=""
declare namespace cfg = "";
declare namespace log = "";
declare namespace tf = "";

import module namespace tafe = "";

declare function local:isDublet($doc as node()) as xs:boolean {
    [tafe:getAnmeldelseIdentifikator(.) = tafe:getAnmeldelseIdentifikator($doc)
    and .//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid
    < $doc//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid]

for $doc in input()/*
	where local:isDublet($doc)
	return $doc

I’d very much appreciate ideas on how too get this to use the index or otherwise improve performance.

Best regards
Claus Nielsen
[/code] (6.7 KB) (19.9 KB)

Hi Claus,
It looks as if something in your XQUERY triggers not only one but TWO full “tablescans”.

  • and I guess only one is expected :wink:

I tried to move the “isDublet” functionality into the main loop and it appears that the “exists()” function is the culprit!

So perhaps somekind of “if” statement could be used instead ?

  • it is pretty hard to test without example-data …

Hi Claus,
as Finn already said it is difficult to assess without test data. It might be the "tafe:getAnmeldelseIdentifikator(.) = tafe:getAnmeldelseIdentifikator($doc) " predicate cannot be evaluated index-based and this leads to many more doctype scans at runtime.

May be you can try the following in the getDublet() function:

let $x := tafe:getAnmeldelseIdentifikator($doc)
let $y := $doc//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid
return (
[tafe:getAnmeldelseIdentifikator(.) = $x
and .//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid < $y]
) )

and avoid as many // operators as possible.

Best Regards,