Q: Xquery how to improve performance?

Hi once more,

We have written the following XQuery

declare namespace xs = “XML Schema
for $mainflt in input()/Fltshare,
$shareflt in $mainflt/Shared/Flight
where ( ($mainflt/@mainflt = “B” and $shareflt/@sharefltnr >= “K”) or $mainflt/@mainflt > “B”)
and $mainflt/@startdate >= xs:date(“2003-01-01”)
and $mainflt/@enddate <= xs:date(“2004-01-01”)
and $mainflt/@arrdep = “A”)

return
{ $mainflt/@mainflt }
{ $mainflt/@arrdep }
{ $mainflt/@startdate }
{ $mainflt/@enddate }
{ $mainflt/Opdays }
{ $shareflt }

sort by ($mainflt/@mainflt,$shareflt/@sharefltnr,$mainflt/@startdate,$mainflt/@enddate)

The following document is an example of a Shared document.



mo








This query works, but the performance is terrible. It takes about two minute to output the documents. There are approx. 5000 instances. All fields that are used within the WHERE clause are indexed.

Could someone tell me how I can improve the performance of this query an order of magnitude?

Regards,
Rudolf de Grijs

Hi Rudolf,
Quote from Walter Waterfelt: “Due to the unfortunate current XQuery semantic that newly constructed elements loose all type information…”

In other words; index-info can only be uses for sorting, if the sort statement is placed prior to the return, because the newly constructed “document” in the return statement has no schema and thus no index-info.

Finn

Rudolf, your query looks reasonable to me. Maybe Tamino is doing the self join before evaluating the filter conditions? If you remove the sort clause, how long does it take? Why don’t you try nested for loops like this:

declare namespace xs = “XML Schema
for $mainflt in input()/Fltshare
where $mainflt/@startdate >= xs:date(“2003-01-01”)
and $mainflt/@enddate <= xs:date(“2004-01-01”)
and $mainflt/@arrdep = “A”

return
for $shareflt in $mainflt/Shared/Flight
where ( ($mainflt/@mainflt = “B” and $shareflt/@sharefltnr >= “K”) or $mainflt/@mainflt > “B”)
return

{ $mainflt/@mainflt }
{ $mainflt/@arrdep }
{ $mainflt/@startdate }
{ $mainflt/@enddate }
{ $mainflt/Opdays }
{ $shareflt }

sort by ($mainflt/@mainflt,$shareflt/@sharefltnr,$mainflt/@startdate,$mainflt/@enddate)

Actually, the sort by clause seems wrong to me. I thought that the sort expression is relative to the output data from the FLWR statement, in which case the variables $mainflt and $shareflt have no meaning. Maybe the sort expression should be
sort by (@mainflt,Shared/Flight/@sharefltnr,@startdate,@enddate)

Alternately, you could loop on the shareflight first, like below, but it seems like it would be slower to me. Tell us your results though.

declare namespace xs = “XML Schema
for $shareflt in input()/FltShare/Shared/Flight
let $mainflt := $shareflt/…/…/…/FltShare
where ( ($mainflt/@mainflt = “B” and $shareflt/@sharefltnr >= “K”) or $mainflt/@mainflt > “B”)
and $mainflt/@startdate >= xs:date(“2003-01-01”)
and $mainflt/@enddate <= xs:date(“2004-01-01”)
and $mainflt/@arrdep = “A”)

return
{ $mainflt/@mainflt }
{ $mainflt/@arrdep }
{ $mainflt/@startdate }
{ $mainflt/@enddate }
{ $mainflt/Opdays }
{ $shareflt }

sort by ($mainflt/@mainflt,$shareflt/@sharefltnr,$mainflt/@startdate,$mainflt/@enddate)

Bill

Hello,
as Finn mentioned and cited me correctly you should do the sorting (and filtering) before constructing new elements,
e.g.

declare namespace xs = “XML Schema
for $mainflt in (input()/Fltshare sort by (@mainflt, @startdate, @enddate) )
where $mainflt/@startdate >= xs:date(“2003-01-01”)
and $mainflt/@enddate <= xs:date(“2004-01-01”)
and $mainflt/@arrdep = “A”

return

{ $mainflt/@mainflt }
{ $mainflt/@arrdep }
{ $mainflt/@startdate }
{ $mainflt/@enddate }
{ $mainflt/Opdays }
{
for $shareflt in ($mainflt/Shared/Flight sort by (@sharefltnr))
where ( ($mainflt/@mainflt = “B” and $shareflt/@sharefltnr >= “K”) or $mainflt/@mainflt > “B”)
return $shareflt

}


It does not completely reflect your original sort by criteria order, which is not possible, assuming Shared containing
multiple elements.

Best regards
Walter