W3C XQUERY Finding Duplicates in doctype with criteria ?

Hi folks :D,

not posted for a while but I have an interesting problem to solve.

Lets say I have a schema in my tamino database called ‘customer’ and a customer document looks like this.


<customer>
   <id>12345</id>
   <firstname>Ian</firstname>
   <surname>Gratton</surname>
   <house-no>1</house-no>
   <street>XML Avenue</street>
   <post-code>DE21 2XX</post-code>
</customer>

I need to be able to list all customer documents that have

duplicated surnames AND duplicated house-no AND duplicated post-codes.

The xquery distinct-values() function is great for removing duplicates but in this case I want only the duplicates.

Any ideas :?:

Thanks in advance.

quick update.

This is the equivalent SQL statement to do the same


select
  a.id,
  a.surname,
  a.house_no,
  a.post_code
from
  customer a,
  customer b
where
  a.surname = b.surname
  and a.house_no = b.house_no
  and a.post_code = b.post_code
  and a.customer_id <> b.customer_id
order by
  a.surname,
  a.paon,
  a.post_code

Hi,

As in SQL you can perform a self-join on the “customer” doctype. So, you can translate your SQL query into a similar XQuery expression that looks like this:

for $a in input()/customer
for $b in input()/customer
where
$a/surname = $b/surname
and
$a/hose-no = $b/house-no
and
$a/post-code = $b/post-code
and
$a/id != $b/id
return
$a

But probably the following grouping query is more appropriate:

for $a in distinct-values(input()/customer/surname)
for $b in distinct-values(input()/customer/house-no)
for $c in distinct-values(input()/customer/post-code)
let $g := input()/customer[surname = $a and house-no = $b and post-code = $c]
where count($g) > 1
return
<res @surname = “{$a}“ @house-no = “{$b}” @post-code=”{$c}”>
{$g/id}

The query lists you all combinations of surname, house-no and post-code with more than one “customer” element in the doctype.

Best regards,

Thorsten Fiebig