Query Languages for Semistructured Data

Query Languages for Semistructured Data

CSE 636 Data Integration XML Query Languages XQuery XQuery http://www.w3.org/TR/xquery/ (11/05) Functional Programming Language Operates on XML Sources Returns XML 2 XQuery Components XQuery is composed of Path expressions Element constructors FLWOR expressions and more 3 Path Expressions doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER Evaluate expression by collecting all elements which satisfy the path CUSTOMER_ORDERS CUSTOMER EMAIL NAME Sue [email protected]

CUSTOMER ORDER NAME EMAIL Tom [email protected] NO CARRIER ITEM 1897 UPS CUSTOMER ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NAME EMAIL Ann [email protected] SKU QTY C5 1 SKU B7 SKU P5 QTY 2 QTY 1 ITEM

SKU QTY C5 2 4 Element Construction { doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER } A complete, executable query returning the ORDERS tree 1. Evaluate expression inside { ... } ORDERS 2. Connect into tree ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B7 SKU P5 SKU QTY C5 1

ORDER QTY 2 QTY 1 ITEM SKU QTY C5 2 5 Introduction to for Expression Our path query { doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER } can be rewritten using a for expression: { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER return $order } ORDERS ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B7

SKU P5 SKU QTY C5 1 ORDER QTY 2 QTY 1 ITEM SKU QTY C5 2 6 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example

Sequences Query Prolog 7 Example with where We take our previous query and add a where clause: { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } The output is the same as in the previous example, except non-UPS carriers are removed. ORDERS ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B7 SKU P5 SKU QTY C5 1 ORDER QTY 2

QTY 1 ITEM SKU QTY C5 2 8 FLWOR Expressions: The for Clause { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } CUSTOMER_ORDERS ORDER NAME EMAIL Tom [email protected] NO CARRIER ITEM 1897 UPS CUSTOMER ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NAME EMAIL Ann [email protected] EMAIL NAME

Sue [email protected] CUSTOMER The for variable ranges over result of in expression SKU QTY C5 1 SKU B7 SKU P5 QTY 2 CUSTOMER QTY 1 ITEM SKU QTY C5 2 9 FLWOR Expressions: The where Clause { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } CUSTOMER_ORDERS ORDER NAME

EMAIL Tom [email protected] NO CARRIER ITEM 1897 UPS CUSTOMER ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NAME EMAIL Ann [email protected] EMAIL NAME Sue [email protected] CUSTOMER Selects only orders with UPS as the carrier SKU QTY C5 1 SKU B7 SKU P5 QTY 2

CUSTOMER QTY 1 ITEM SKU QTY C5 2 10 FLWOR Expressions: The return Clause { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } Every $order that qualified is added to the return list: ORDER ORDER NO CARRIER ITEM 1878 UPS SKU B7 QTY 2 NO CARRIER ITEM 1897 UPS SKU P5 QTY 1

ITEM SKU QTY C5 2 11 FLWOR Expressions: Final Result { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return $order } ORDERS is constructed into the ORDERS element to complete the example. The list coming from the FLWOR expression ORDER ORDER NO CARRIER ITEM 1878 UPS SKU B7 QTY 2 NO CARRIER ITEM 1897 UPS SKU P5 QTY

1 ITEM SKU QTY C5 2 12 Example with Element Construction { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return { data($order/NO) } } Here, the return statement constructs elements from values The data function returns the value of an element The return statement also contains tags The next slide illustrates how the following result is created: ORDERS ID 1878 ID 1897 13 Return Element Construction { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return { data($order/NO) } } ORDERS 4. Connect into tree ID

1878 3. New element construction 2. Path selection 1. Bring in selected items as before ORDER ID 1897 ORDER NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B7 SKU P5 QTY 2 QTY 1 ITEM SKU QTY C5 2 14 FLWOR Expressions: The let Clause

{ for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return { data($order/NO) } } Our previous example can be rewritten using extra variable bindings to improve clarity: { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $carrier := $order/CARRIER let $id := data($order/NO) where $carrier = "UPS" return { $id } ORDERS } ID 1878 ID 1897 15 FLWOR Expressions: The order by Clause For this example, we prepare a list of customers sorted by customer name { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name := $customer/NAME order by $customer/NAME ascending return {$customer/NAME} } CUSTOMERS CUSTOMER CUSTOMER CUSTOMER NAME

Ann NAME Sue NAME Tom 16 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 17 Type Conversions In the context of functions and operators, values are automatically extracted from elements: { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER

where $order/CARRIER = "UPS" return { concat("ORDER-", $order/NO) } } 18 Type Conversions $order/NO binds to an element concat() requires a string Value of the element is automatically extracted Same happens to lists containing a single element or value 19 Type Conversions All other cases result in errors { { concat("ORDER-", doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/NO) } } Path expression above binds to lists Cannot extract a value from a list of many items! 20 Type Conversions The data() function can be used to explicitly extract the value: { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return { concat("ORDER-", data($order/NO)) } } 21 Type Conversions

Automatic extraction of values does not occur in element construction In that case, the data() function is required: { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER where $order/CARRIER = "UPS" return { data($order/NO) } } 22 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 23 For-Let-Where-Order By-Return (FLWOR) Lets take a more in-depth look at the variable bindings in the query developed previously { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $carrier := $order/CARRIER

let $id := data($order/NO) where $carrier = "UPS" return { $id } } for $var1 in expr let $var2 := expr for and let clauses generate a list of tuples of variable bindings, preserving input order return expr where expr where clause applies a predicate, eliminating some of the tuples order by expr order by clause imposes an order on the remaining tuples return clause is executed for each remaining tuple, generating a list of trees 24 FLWOR Variable Bindings { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $carrier := $order/CARRIER, $id := data($order/NO)

where $carrier = "UPS" return { $id } } for/let $order where return result $carrier $id ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 CARRIER FEDEX 1861 NO ORDERS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B7 ORDER CARRIER UPS 1878 YES ID

1878 NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 CARRIER UPS 1897 YES ID 1897 ID 1878 ID 1897 25 for vs. let for Binds node variables iteration for $x in expr binds $x to each element in the list expr let Binds collection variables one value let $x := expr binds $x to the entire list expr Useful for common subexpressions and for aggregations 26

for vs. let for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER return { $order } Returns: let $order := doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER return { $order } Returns: 27 for vs. let { for $sku in distinct-values(doc(co")//ITEM/SKU) let $items := doc(co")//ORDER/ITEM[SKU = $sku] let $qtyTotal := sum($items/QTY) where $qtyTotal > 1 return { $sku } } distinct-values a function that eliminates duplicate values can be applied to simple elements and atomic values sum a (aggregate) function that returns the sum of integers 28 for vs. let { for $sku in distinct-values(doc(co")//ITEM/SKU) let $items := doc(co")//ORDER/ITEM[SKU = $sku] let $qtyTotal := sum($items/QTY) where $qtyTotal > 1

return { $sku } } for/let $sku $items ITEM C5 SKU QTY 1 C5 ITEM $qtyTotal where return 3 YES ITEM C5 YES ITEM B7 NO SKU QTY 2 C5 ITEM B7 SKU QTY 2 B7 ITEM 2

P5 SKU QTY P5 1 1 result POPULAR_ITEMS ITEM C5 ITEM B7 29 for vs. let Find items whose quantity is larger than average: let $avgQty := avg(doc(co)//ITEM/QTY) for $item in doc(co)//ITEM where $item/QTY > $avgQty return $item for/let $avgQty $items ITEM 1.5 NO SKU QTY 1 C5 ITEM 1 1.5

SKU QTY 2 C5 ITEM 2 YES 1.5 SKU QTY 2 B7 ITEM 2 YES 1.5 SKU QTY P5 1 1 NO let $avgQty $qtyTotal where return 1.5 ITEM SKU QTY 2 C5 ITEM

SKU QTY 2 B7 30 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 31 Joins Joins are expressed using a FLWOR with two loop variables two for clauses A where condition specifies how the loop variables relate 32

Join Example Combine orders ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS SKU B7 SKU P5 SKU QTY C5 1 ORDER QTY 2 SHIPPER with shipper info QTY 1 ITEM SKU QTY

C5 2 SHIPPER PICKUP NAME 2PM FEDEX NAME UPS PICKUP 5PM ORDERS to produce order deadlines ORDER ID DEADLINE 1861 2PM ORDER ID DEADLINE 1878 5PM ORDER ID DEADLINE 1897 5PM 33 Join Example Query { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER for $shipper in doc("s")/SHIPPERS/SHIPPER let $id := data($order/NO) let $time := data($shipper/PICKUP)

where $order/CARRIER = $shipper/NAME return {$id} {$time} } Uses multiple for statements to generate Cartesian product of tuples Uses where statement to filter Cartesian product 34 Join Conditions for/let $order $shipper ORDER SHIPPER NO CARRIER ITEM 1861 FEDEX ORDER NAME PICKUP FEDEX 2PM NO CARRIER ITEM 1878 UPS ORDER NAME PICKUP FEDEX 2PM $id $time where return 2PM

YES 1878 2PM NO NO CARRIER ITEM ITEM 1897 UPS ORDER NAME PICKUP 1897 FEDEX 2PM 2PM NO NO CARRIER ITEM 1861 FEDEX ORDER NAME PICKUP UPS 5PM 5PM NO NO CARRIER ITEM 1878 UPS ORDER NAME PICKUP UPS 5PM NO CARRIER ITEM ITEM 1897 UPS NAME PICKUP UPS

5PM 1861 SHIPPER SHIPPER SHIPPER 1861 SHIPPER 1878 5PM SHIPPER 1897 5PM YES YES ORDER ID 1861 DEADLINE

2PM ORDER ID 1878 DEADLINE 5PM ORDER ID 1897 DEADLINE 5PM 35 Condensed Join Table { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER, $shipper in doc("s")/SHIPPERS/SHIPPER let $id := data($order/NO), $time := data($shipper/PICKUP) where $order/CARRIER = $shipper/NAME return {$id} In future examples, {$time} non-joined rows are removed, as are join where conditions: } for/let $order $shipper $id $time SHIPPER NO CARRIER ITEM 1861 FEDEX

ORDER NAME PICKUP FEDEX 2PM SHIPPER 1861 NO CARRIER ITEM 1878 UPS ORDER NAME PICKUP UPS 5PM 1878 NO CARRIER ITEM ITEM 1897 UPS NAME PICKUP UPS 5PM ORDER return ORDER 2PM ID 1861 DEADLINE 2PM ORDER 5PM

SHIPPER ID 1878 DEADLINE 5PM ORDER 1897 5PM ID 1897 DEADLINE 5PM 36 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins

Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 37 Nested Queries Nested queries produce hierarchical results An outer FLWOR loop contains an inner FLWOR loop Typically, a where condition in the inner FLWOR specifies how the loops relate 38 Nested Query Example SHIPPER Combine shippers with orders SHIPPER NAME PICKUP FEDEX 2PM NAME UPS ORDER ORDER NO CARRIER ITEM 1861 FEDEX

NO CARRIER ITEM 1878 UPS SKU QTY C5 1 PICKUP 5PM ORDER SKU B7 NO CARRIER ITEM 1897 UPS QTY 2 SKU P5 QTY 1 ITEM SKU QTY C5 2 SHIPPER_ORDERS to produce orders for each shipper SHIPPER NAME FEDEX ORDER

1861 SHIPPER NAME ORDER ORDER UPS 1878 1897 39 Nested Query { for $shipper in doc("s")/SHIPPERS/SHIPPER let $name := $shipper/NAME return { $name } { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) where $name = $order/CARRIER return { $id } } } Outer loop binds $shipper and $name variables For each $shipper, $name pair, inner loop binds $order and $id variables Inner where clause removes $order, $id pairs that dont match outer element Inner loop constructs elements from inner variables Outer loop constructs elements from outer variables and from elements constructed in inner loop 40 Join Conditions OUTER LOOP $shipper INNER LOOP OUTER LOOP $id where return return $name $order ORDER NO CARRIER ITEM 1861 FEDEX ORDER

NO CARRIER ITEM 1878 UPS ORDER 1878 NO NO CARRIER ITEM ITEM 1897 1897 UPS ORDER NO SHIPPER NAME PICKUP FEDEX 2PM NAME FEDEX SHIPPER NAME FEDEX ORDER 1861 NO NO CARRIER ITEM 1861 FEDEX ORDER 1861 NO CARRIER ITEM 1878 UPS ORDER

1878 YES ORDER 1878 NO CARRIER ITEM ITEM 1897 1897 UPS YES ORDER 1897 NAME PICKUP UPS 5PM NAME UPS ORDER 1861 SHIPPER YES 1861 SHIPPER NAME ORDER ORDER UPS 1878 1897

41 Condensed Nested Query Table { for $shipper in doc("s")/SHIPPERS/SHIPPER let $name := $shipper/NAME return { $name } { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) where $name = $order/CARRIER In future examples, return { $id } } non-matched inner rows are removed, as are } where conditions: OUTER LOOP $shipper SHIPPER NAME PICKUP UPS 5PM $name $order ORDER NAME FEDEX NAME UPS NO CARRIER ITEM 1861 FEDEX ORDER 1861 YES ORDER 1861

NO CARRIER ITEM 1878 UPS ORDER 1878 YES ORDER 1878 YES ORDER 1897 SHIPPER OUTER LOOP $id where return return NAME PICKUP FEDEX 2PM INNER LOOP NO CARRIER ITEM ITEM 1897 1897 UPS SHIPPER NAME FEDEX ORDER 1861 SHIPPER

NAME ORDER ORDER UPS 1878 1897 42 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 43 Boolean Expressions In this section we examine various types of Boolean expressions that may appear in WHERE clauses 44 Functions in Boolean Expressions { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO), $lc := count($order/ITEM) where $lc > 1

return { $id } } for/let $order where return result $id $lc ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 1861 1 NO ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B7 ORDER NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 1878

1 NO ID 1897 1897 2 YES ID 1897 45 Disjunctions { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO), $lc := count($order/ITEM ) where $lc > 1 or $order/CARRIER = "FEDEX" return { $id } } for/let $order where return result $id $lc ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 1861 1

YES ID 1861 ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B7 ORDER NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 1878 1897 1 2 NO YES ID 1861 ID 1897 ID 1897

46 Existential Quantification { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) where some $sku in $order/ITEM/SKU satisfies $sku = "C5" return { $id } } for/let $order where return result $id $sku ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 1861 SKU C5 YES ID 1861 ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B7

ORDER NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 1878 1897 SKU B7 SKU P5 SKU C5 NO YES ID 1861 ID 1897 ID 1897 47 Universal Quantification { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) where every $sku in $order/ITEM/SKU satisfies $sku = "C5" return { $id } }

for/let $order where return result $id $sku ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 1861 SKU C5 YES ORDERS_IDS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B7 ORDER NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 ID 1861

1878 1897 SKU B7 SKU P5 SKU C5 NO ID 1861 NO 48 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example

Sequences Query Prolog 49 Conditionals Example Tree Combine customers CUSTOMER CUSTOMER CUSTOMER NAME Ann NAME Tom NAME Sue MEMBER with member info NAME Tom MEMBER STATUS GOLD NAME Bob STATUS SILVER MEMBER NAME Sue

STATUS GOLD CUSTOMERS to add MEMBER tag to customer data CUSTOMER NAME Ann MEMBER NO CUSTOMER NAME Tom MEMBER YES CUSTOMER NAME Sue MEMBER YES 50 Conditionals Example Query { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name := $customer/NAME return {$name} { if (some $member in doc("m")/MEMBERS/MEMBER satisfies $member/NAME = $name) then

YES else NO } } For each customer, the existential quantification statement checks for the existence of a matching member If a matching member is found, the MEMBER YES tags are output; otherwise, the MEMBER NO tags are output 51 Conditionals Table { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name := $customer/NAME return {$name} { if (some $member in doc("m")/MEMBERS/MEMBER satisfies $member/NAME = $name) then YES else NO } } $customer $name CUSTOMER NAME Ann CUSTOMER NAME Tom CUSTOMER NAME Sue if/then/else return some $member result NAME Ann NAME Tom

NAME Sue MEMBER NO MEMBER NAME STATUS Tom GOLD MEMBER NAME STATUS Sue SILVER MEMBER YES MEMBER YES CUSTOMER NAME MEMBER Ann NO CUSTOMER NAME MEMBER Tom YES CUSTOMER NAME Sue MEMBER YES 52 Topics

For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 53 Simple Aggregation { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) let $ic := count($order/ITEM) return {$id} {$ic} } for/let $order return $id $ic ORDER NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 ORDER 1861 1

NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 ID 1861 IC 1 ORDERS ORDER NO CARRIER ITEM 1878 UPS SKU QTY 2 B7 ORDER result ORDER 1878 1 ID 1878 IC 1 ORDER ORDER

ORDER ID IC ID IC ID IC 1861 1 1878 1 1897 2 ORDER 1897 2 ID 1897 IC 2 54 Conditional Aggregation { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) let $items := for $i in $order/ITEM where $i/SKU = "C5" return $i let $ic := count($items) return {$id} {$ic} } $order ORDER $id $items $ic return $i where return ITEM NO CARRIER ITEM 1861 FEDEX SKU QTY 1 C5 ORDER 1861

NO CARRIER ITEM 1878 UPS SKU QTY 2 B7 ORDER 1878 NO CARRIER ITEM ITEM 1897 UPS SKU QTY SKU QTY 1 1 P5 C5 SKU QTY C5 1 YES ORDER ITEM SKU QTY C5 1 1 ITEM 1897 SKU QTY P5 1 ITEM SKU QTY C5 1 IC 1

ORDER ITEM SKU QTY B7 2 ID 1861 NO 0 NO YES ID 1878 IC 0 ORDER ITEM SKU QTY C5 1 1 ID 1897 IC 1 55 Topics

For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 56 Missing Data Join Example We will link CUSTOMER_ORDERS with MEMBERS There are customers that are not members 57 Missing Data Join Trees Combine customers CUSTOMER CUSTOMER CUSTOMER NAME Ann NAME Tom

NAME Sue MEMBER with member info NAME Tom MEMBER STATUS GOLD NAME Bob STATUS SILVER MEMBER NAME Sue STATUS GOLD CUSTOMERS to produce Prioritized customers CUSTOMER NAME Ann CUSTOMER NAME Tom PRIORITY

GOLD CUSTOMER NAME Sue PRIORITY SILVER 58 Missing Data Join Query { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER for $member in doc("m")/MEMBERS/MEMBER let $name := $customer/NAME let $status := data($member/STATUS) where $name = $member/NAME return {$name} {$status} } 59 Missing Data Join Table { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER for $member in doc("m")/MEMBERS/MEMBER let $name := $customer/NAME let $status := data($member/STATUS) where $name = $member/NAME return {$name} {$status} } for/let/join $customer $member CUSTOMER NAME Ann CUSTOMER

return $name $status Result for Ann is missing! NAME Ann MEMBER NAME Tom CUSTOMER NAME STATUS Tom GOLD MEMBER NAME Sue NAME STATUS Sue SILVER CUSTOMER NAME Tom GOLD NAME Sue SILVER NAME PRIORITY Tom GOLD CUSTOMER NAME Sue

PRIORITY SILVER 60 Missing Data Join Problem CUSTOMERS Wanted: CUSTOMER NAME Ann CUSTOMER NAME Tom PRIORITY GOLD CUSTOMER NAME Sue PRIORITY SILVER CUSTOMERS Got: CUSTOMER NAME Tom PRIORITY GOLD CUSTOMER

NAME Sue PRIORITY SILVER The result we want is analogous to an SQL outer join 61 Missing Data Join Solution Query { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER for $member in doc("m")/MEMBERS/MEMBER let $name := $customer/NAME let $status := data($member/STATUS) Our join query where $name = $member/NAME return {$name} {$status} } can be restructured into a nested query: { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name := $customer/NAME return {$name} { for $member in doc("m")/MEMBERS/MEMBER let $status := data($member/STATUS) where $name = $member/NAME return {$status} } } 62 Missing Data Join Solution Table { for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER let $name := $customer/NAME return {$name} { for $member in doc("m")/MEMBERS/MEMBER let $status := data($member/STATUS)

where $name = $member/NAME return {$status} } } OUTER LOOP INNER LOOP $customer $name $member CUSTOMER NAME Ann CUSTOMER NAME Tom CUSTOMER NAME Sue OUTER LOOP $status return return CUSTOMER NAME Ann MEMBER NAME Tom NAME Sue NAME STATUS Tom GOLD MEMBER GOLD PRIORITY GOLD NAME STATUS Sue SILVER

SILVER PRIORITY SILVER NAME Ann CUSTOMER NAME PRIORITY Tom GOLD CUSTOMER NAME Sue PRIORITY SILVER 63 Missing Data Joins vs. Nested Queries In joins, tuples with any missing data are eliminated equivalent to an SQL natural or inner join In nested queries, tuples are output in spite of missing data equivalent to an SQL outer join 64 Nested Query Problem How to remove tuples that have some missing data How to force inner join functionality in a nested query 65 Missing Data Nested Query Example Suppose we want a list, by product, of all items on order

perhaps for pulling the items from stock For each product, we want bundles, separate quantities for each order We dont want to list products with no items on order 66 Missing Data Nested Query Trees PRODUCT Combine products PRODUCT NAME SKU B7 Battery SKU C5 NAME Cable ITEM with order items SKU C5 QTY 1 PRODUCT SKU C4 ITEM SKU

B7 QTY 2 SKU P5 NAME Case PRODUCT SKU P5 NAME Phone ITEM ITEM QTY 1 SKU QTY C5 2 ITEMS_ON_ORDER to items on order PRODUCT SKU B7 NAME BUNDLE Battery 2 PRODUCT SKU C5

NAME BUNDLE BUNDLE Cable 1 2 PRODUCT SKU P5 NAME BUNDLE Phone 1 67 Missing Data Nested Query { for $p in doc("p")/PRODUCTS/PRODUCT let $sku := $p/SKU let $name := $p/NAME return {$sku} {$name} { for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM let $qty := data($i/QTY) where $sku = $i/SKU return {$qty} } } 68 Missing Data Nested Query Table OUTER LOOP $p INNER LOOP OUTER LOOP $qty return return

$sku $name $i PRODUCT SKU NAME B7 Battery SKU B7 PRODUCT ITEM NAME SKU Battery B7 2 BUNDLE 2 SKU QTY C5 1 ITEM 1 BUNDLE 1 SKU C5 2 BUNDLE 2 QTY 2 ITEM

PRODUCT SKU C5 NAME Cable SKU C5 NAME Cable PRODUCT SKU NAME C4 Case PRODUCT SKU P5 NAME Phone SKU C4 SKU P5 QTY 2 QTY 1 SKU C5 NAME BUNDLE BUNDLE Cable 1 2 SKU NAME C4

Case PRODUCT ITEM SKU P5 PRODUCT PRODUCT NAME Case NAME Phone SKU NAME BUNDLE B7 Battery 2 1 BUNDLE 2 SKU P5 NAME BUNDLE Phone 2 69 Missing Data Nested Query Problem ITEMS_ON_ORDER Wanted: PRODUCT SKU B7 PRODUCT NAME BUNDLE

Battery 2 SKU C5 Got: NAME BUNDLE Battery 2 NAME BUNDLE BUNDLE Cable 1 2 SKU P5 NAME BUNDLE Phone 1 ITEMS_ON_ORDER PRODUCT SKU B7 PRODUCT PRODUCT SKU C5 NAME BUNDLE BUNDLE Cable 1 2 PRODUCT PRODUCT

SKU C4 NAME Case SKU P5 NAME BUNDLE Phone 1 The result we want is analogous to an SQL inner (natural) join 70 Missing Data Nested Query Solution { for $p in doc("p")/PRODUCTS/PRODUCT let $sku := $p/SKU let $name := $p/NAME return Our nested query {$sku} {$name} { for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM let $qty := data($i/QTY) where $sku = $i/SKU return {$qty} } can be restructured with the inner for loop } moved to a variable in the outer loop { for $p in doc("p")/PRODUCTS/PRODUCT let $sku := $p/SKU, $name := $p/NAME let $bundle := for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM let $qty := data($i/QTY) and a where clause where $sku = $i/SKU can be added to remove return {$qty}

outer elements with where not(empty($bundle)) no inner elements return {$sku} {$name} {$bundle} } 71 Missing Data Nested Query Solution Table $p where return $sku $name $bundle $qty return $i PRODUCT SKU NAME B7 Battery SKU B7 ITEM NAME SKU Battery B7 2 BUNDLE 2 SKU QTY C5 1 ITEM 1 BUNDLE

1 SKU C5 2 BUNDLE 2 QTY 2 ITEM PRODUCT SKU C5 NAME Cable SKU C5 NAME Cable PRODUCT SKU NAME C4 Case PRODUCT SKU P5 NAME Phone SKU C4 SKU P5 QTY 2

NAME Case NAME Phone PRODUCT YES SKU NAME BUNDLE B7 Battery 2 PRODUCT YES SKU C5 NAME BUNDLE BUNDLE Cable 1 2 NO PRODUCT ITEM SKU P5 QTY 1 1 BUNDLE 2 YES SKU P5 NAME BUNDLE

Phone 2 72 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 73 Advanced Example Trees PRODUCT Combine products with orders PRODUCT NAME SKU

Battery B7 SKU C5 PRODUCT NAME Cable SKU C4 PRODUCT NAME Case NAME Phone SKU P5 ORDER ORDER NO CARRIER ITEM 1861 FEDEX NO CARRIER ITEM 1878 UPS NO CARRIER ITEM 1897 UPS ITEM SKU B7

SKU P5 SKU QTY C5 2 SKU QTY C5 1 ORDER QTY 2 QTY 1 PRODUCT_ORDERS to produce orders for each product PRODUCT SKU B7 ORDER 1878 PRODUCT SKU C5 ORDER ORDER 1861 1897 PRODUCT SKU C4

PRODUCT SKU P5 ORDER 1897 74 Advanced Example Query (: By the way, this is a comment :) { for $product in doc("p")/PRODUCTS/PRODUCT return {$product/SKU} { for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER let $id := data($order/NO) where some $sku in $order/ITEM/SKU satisfies $sku = $product/SKU return { $id } } } For each product (outer for loop), loop through all orders (inner for loop) Where statement filters out orders which dont contain the product under consideration 75 Advanced Example Exercises Preparation of the query table (table of variable bindings) is left as an exercise How can the query be rewritten to eliminate products with no orders? add a tag to products with no orders? sort by SKU? add a total quantity ordered count under each product?

76 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 77 Sequences Ordered lists of nodes, either element, attribute or text nodes, or a combination thereof Can be constructed in for/let clauses for $product in doc("p")/PRODUCTS/PRODUCT Or manually in the return clause for $product in doc(p")/PRODUCTS/PRODUCT return ( {data($product/SKU)}, {data($product/NAME)} ) Not needed if a parent element constructor is present for $product in doc(p")/PRODUCTS/PRODUCT return {data($product/SKU)}

{data($product/NAME)} 78 Sequences Concatenation ($seq1, $seq2) Union $seq1 union $seq2 $seq1 | $seq2 Example: for $product in doc(p")/PRODUCTS/PRODUCT union doc(co")//ITEM return $product Intersection $seq1 intersect $seq2 Difference $seq1 except $seq2 Union, Intersection and Difference remove duplicates 79 Topics For-Let-Where-Order by-Return Expressions Type Conversions Variable Bindings Joins

Nested Queries Boolean Expressions Conditionals Aggregations Missing Data in Joins and Nested Queries Advanced Example Sequences Query Prolog 80 User-Defined Functions Useful for recursion declare function local:depth($e as element()) as xs:integer { if (empty($e/*)) then 1 else max( for $child in $e/* return local:depth($child) + 1 ) }; Query Prolog for $a in doc(co")/CUSTOMER_ORDERS return local:depth($a) local prefix is reserved for user-defined functions 81 Global Variables Also declared in the query prolog declare variable $threshold := 2; for $order in doc(co")//ORDER let $totalQty := sum($order//QTY) where $totalQty > $threshold return $order Can be used to parameterize your queries 82 XQuery and XML Schemas XML Schemas can be used within XQuery to

validate: Input documents Query Result import schema namespace in="http://www.cse.buffalo.edu/in" at in.xsd"; import schema namespace out="http://www.cse.buffalo.edu/out" at out.xsd"; validate{ { for $custs in doc(co)/in:CUSTOMER_ORDERS/* return $custs } } 83 References XQuery Tutorial Yannis Papakonstantinou http://www.db.ucsd.edu/people/yannis/XQueryTutorial.htm W3C's XQuery homepage http://www.w3.org/XML/Query/ XML School http://www.w3schools.com 84

Recently Viewed Presentations

  • CASAS Handbook for Adult ESL Teachers An Interactive

    CASAS Handbook for Adult ESL Teachers An Interactive

    Bonus: Ask teachers how long this "problem" should take to teach in their class (before you begin looking for the various competencies.) After you have completed this exercise, ask them again how long it would take to teach this "problem."...
  • CS423/523 - penguin.ewu.edu

    CS423/523 - penguin.ewu.edu

    First, it's parasitic, meaning that in addition to encrypting files, it actually infects them so that each time a user attempts to open one of them, the infection process starts back up all over again Second, uses "on-demand polymorphic algorithm"...
  • The Field Mouse - Biddick Academy

    The Field Mouse - Biddick Academy

    The poem describes hay making in Wales, one summer in the early 1990s. The poem is concerned with the Bosnian conflict which was going on at the time and how the weak are affected during wars. Why is the conflict...
  • Ddddddd - University of Colorado Boulder

    Ddddddd - University of Colorado Boulder

    ASTR/GEOL-2040: Search for life in the Universe, Lecture 1 Axel Brandenburg (Visiting Professor from Nordita, Stockholm) Touches up an important question Where do we come from How does life emerge Involves many disciplines Astronomy Biology Chemistry Geology Physics * What...
  • Virtual Camera Model - UCL

    Virtual Camera Model - UCL

    ©Anthony Steed 2001-2003 Overview Simple camera is limiting and it is necessary to model a camera that can be moved We will define parameters for a camera in terms of where it "is", the direction it points and the direction...
  • Marxist Socialism/ Communism

    Marxist Socialism/ Communism

    Historical Materialism. Workers are forced to do the impossible. The Marxist approach to the interpretation of history is known as historical materialism.. According to this theory, the history of society could be categorized by its economic characteristics.
  • DEFENCE AGAINST INFECTIOUS DISEASE - St Leonard's College

    DEFENCE AGAINST INFECTIOUS DISEASE - St Leonard's College

    Vaccines contain a form of the pathogen or toxin that has been . modified (attenuated) so that it is unable to harm the body. Exposure to an attenuated pathogen still allows the . production of memory . cells . against...
  • Glamping TIA REPORT MODULE 713 CLA FREI &

    Glamping TIA REPORT MODULE 713 CLA FREI &

    "Whitepodaims to be a model for sustainable tourism by using a number of measures to limit its impact on nature" (www.whitepod.com) Pricesstartsat 360 CHF per Pod/Night, regardless the number of people. Availabilitycalandar on the internet website.