XQuery Examples

From fmepedia


Table of contents

Introduction

With the growing number of XML based data sources and web services responding to the users with XML streams, many of us started to feel the need in a tool that would allow extracting data from XML in an easy and flexible way. As usually, our team responded very quickly to this demand, and this is why FME 2009 was empowered with XQuery (http://en.wikipedia.org/wiki/Xquery) - a language designed to query collections of XML data. XQuery for XML is similar to what SQL is for databases.

Here I am going to give a few examples how XQuery can be used and show its power and convenience. I am not an XQuery expert in any way, and my examples do not pretend to be a replacement of the language documentation or manuals, I just would like to share my positive impressions about this new powerful combination of FME and XQuery. I will give examples of element and attribute extraction, FLWOR (standing for FOR-LET-WHERE-ORDER BY-RETURN - analog to SELECT-FROM-WHERE in SQL) queries, simple conditions (IF-THEN-ELSE), and join expression.

Old way of querying XML

Before continuing with the queries, let's have a look at the old-fashioned way of extracting data from XML (here we talk about an arbitrary XML when we don't have the mapping of XML elements to FME features via xfMAP). In FME 2008 and before, any XML data arriving via web services (for example, as an HTTPFetcher response) had to be searched with StringSearcher, the desired element would get into a list attribute, which then had to be indexed and renamed:

For each element you would need at least three transformers. Regular expressions in StringSearcher could be quite complex:

<postal[[:graph:][:space:]]*>([A-Z]+[0-9]+[A-Z]+)</postal>

and still, not very reliable. For example, multiple occurrences of the same element would be quite hard to track (enhancement PR 16994 - "Add the ability to find all the occurrences of the search string"). StringSearcher does not distinguishes between different XML parts, so it is not a good tool at all for parsing such kind of data.

New way

FME 2009 got a rich set of transformers for manipulating XML data - XMLFeatureMapper, XSLTProcessor, XQueryExploder, XQueryExtractor, and XQueryUpdater.

Usually, I try to use some real world example to illustrate how new functionality is implemented. This time, I am using Google Geocoding API, and namely, Reverse Geocoding operation, when having a pair of coordinates (a point feature) we can get back the address closest to this point.

I placed a point on the Safe building in Google Earth, got its coordinates with CoordinateFetcher, made a URL and submitted it to Google Geocoding service with HTTPFetcher:

http://maps.google.com/maps/geo?ll=49.137,-122.857&output=xml&key=my_google_api_key (http://maps.google.com/maps/geo?ll=49.137,-122.857&output=xml&key=ABQIAAAAZC06zIY2wM-iEj4qaSbGZhS5hQ-pym-epbVAsVk6WNunG2TZ2BThxLWyy1c4WgjPACtvtC1kO-lShw)


You may want to get your own Google API Key here - http://code.google.com/apis/maps/signup.html.

I specified XML as an output format (the alternatives are CSV, JSON, and KML, which is, essentially, the same XML, but with the different MIME type). Here is the portion of the XML sent back:

<?xml version="1.0" encoding="UTF-8" ?> 
<kml xmlns="http://earth.google.com/kml/2.0">
  <Response>
    <name>49.138000,-122.857000</name> 
    <Status>
      <code>200</code> 
      <request>geocode</request> 
    </Status>
    <Placemark id="p1">
      <address>7455 132 St, Surrey, BC, Canada</address> 
      <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
        <Country>
          <CountryNameCode>CA</CountryNameCode> 
          <CountryName>Canada</CountryName> 
          <AdministrativeArea>
            <AdministrativeAreaName>BC</AdministrativeAreaName> 
            <Locality>
              <LocalityName>Surrey</LocalityName> 
              <Thoroughfare>
                <ThoroughfareName>7455 132 St</ThoroughfareName> 
              </Thoroughfare>
              <PostalCode>
                <PostalCodeNumber>V3W</PostalCodeNumber> 
              </PostalCode>
            </Locality>
          </AdministrativeArea>
        </Country>
      </AddressDetails>
      <Point>
        <coordinates>-122.8566950,49.1378180,0</coordinates> 
      </Point>
    </Placemark>
    <Placemark id="p2">
      ........... 

Click here (http://www.fmepedia.com/attachments//XQuery_Examples/address_full.xml) to see the full XML. As you may notice there is a quite complex structure within the <AddressDetails> tag. This portion of XML is actually an xAL (Extensible Address Language) (http://www.oasis-open.org/committees/ciq/ciq.html#6), a special language designed for addresses. It's a usualy story in the XML world - whenever you have to learn a new language, you also discover two or three others. During this exercise I learned about XQuery, which is based on XPath (http://en.wikipedia.org/wiki/Xpath) syntax; and along with the xAL, there is also an xNL (Extensible Name Language) (http://www.oasis-open.org/committees/ciq/ciq.html#5). That means four new languages (to me).

Transformer Interface

The dialog of the XQueryExtractor transformer (I mostly use this one) looks quite scary:

however, it is not so bad - seven parameters just say where to take the data and the query for it, and the rest of the parameters has some default values.

There is also an XQUERY Expression Window where we can type our queries:

Query results go to the attribute specified in "Result Attribute" parameter.

Simple XQueries

Simplest XQuery expressions are also valid XPath expressions. And XPath is a language for navigating through the XML document. Loosely speaking, a path to XML parts is similar to the paths we get used to in our operation systems. For example, the path to the <address> tag looks as follows:

/kml/Response/Placemark/address

In fact, we need one more thing before this query will work with our example. The top node <kml> contains an attribute xmlns="http://earth.google.com/kml/2.0". This attribute defines a namespace (http://en.wikipedia.org/wiki/Namespace_(computer_science)), which makes the elements and attributes unique. Since the namespace is present in the top element, we have to declare it and use it to specify all the elements (unless there is another namespace is introduced), so our query should look as following (see the attached simpleQueries.fmw):

(: Query #1 :)
declare namespace x="http://earth.google.com/kml/2.0";

/x:kml/x:Response/x:Placemark/x:address

Note that smiley faces (:  :) are used for commenting.

Double slash "//" tells that the query should look down all the tree, not necessarily at the specified level, so the query may look a bit simpler:

(: Query #2 :)
declare namespace x="http://earth.google.com/kml/2.0";

//x:address


Both queries above will give us the following result on a shorter version of the original XML:

<address xmlns="http://earth.google.com/kml/2.0">7455 132 St, Surrey, BC, Canada</address>

There is another namespace in this XML defined at the <AddressDetails> level. This namespace specifies the unique names of the xAL. If we make a query that goes into this part of the XML, we should declare two namespaces and use them both - each at the appropriate levels:

(: Query #3 :)
declare namespace x="http://earth.google.com/kml/2.0";
declare namespace y="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0";

/x:kml/x:Response/x:Placemark/y:AddressDetails/y:Country/y:CountryName

or, if we don't use elements above the second namespace:

(: Query #4 :)
declare namespace y="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0";

//y:CountryName

Both queries above will give us the following result:

<CountryName xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">Canada</CountryName>

Of course, XQuery (and XPath) are much more powerful than simple path expressions - there is a rich set of functions helping to extract and transform data.

For example, we often don't need the XML tags - we rather want to get the contents of the tags. The modified query #1 after extending it with string() function will look as follows:

(: Query #5 :)
declare namespace x="http://earth.google.com/kml/2.0";

string(/x:kml/x:Response/x:Placemark/x:address)

The result is:

7455 132 St, Surrey, BC, Canada

Here is a couple of other function examples showing how to extract coordinates. This query will take the substring before comma of the contents within the tags <coordinates>. This gives us X-coordinate of the geocoded point:

(: Query #6 :)
declare namespace x="http://earth.google.com/kml/2.0";

substring-before(string(//x:coordinates), ",")

To get the Y-coordinate, we have to convert the contents into a sequence and remove the last and the firsts of it:

(: Query #7 :)
declare namespace x="http://earth.google.com/kml/2.0";

remove(remove(tokenize(string(//x:coordinates), ","),3),1)

A great XQuery function reference can be found here (http://www.xqueryfunctions.com/xq/).

FLWOR Expressions

The workspace in the previous section used a short portion of XML than the original data (http://www.fmepedia.com/attachments//XQuery_Examples/address_full.xml) received from Google. What if we run one of the queries shown above on the whole XML (See FLWOR.fmw workspace)?

(: Query #8 :)
declare namespace x="http://earth.google.com/kml/2.0";

data(//x:address)

The transformer will return the following string (I set "Return Value" parameter to "Separated Values" and used "|" as a separator, line breaks were added manually for better visibility):

7455 132 St, Surrey, BC, Canada|British Columbia V3W 1J8, Canada|
Surrey, BC, Canada|
Surrey, BC, Canada|
Greater  Vancouver Regional District, British Columbia, Canada|
British Columbia, Canada|
Canada|
North America

This happens because Google API returns multiple points of several accuracy levels - from continent and country levels (lowest) to the address or premise levels (highest). Imagine, we need the highest accuracy possible for each point we geocode. Do do so, we have to scan the entire XML and find such <AddressDetails> element where the "Accuracy" attribute is bigger than in other <AddressDetails>. This is a situation where we have to use FLWOR (http://en.wikipedia.org/wiki/FLWOR) expressions. Again, FLWOR stands for five clauses - FOR, LET, WHERE, ORDER BY, and RETURN.

Here is an example:

(: Query #9 :)
declare namespace x="http://earth.google.com/kml/2.0";
declare namespace y="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0";

for $n in //x:Placemark                   
let $maxAcc:= max(//y:AddressDetails/number(@Accuracy))
  where $n//y:AddressDetails/number(@Accuracy) = $maxAcc
return data($n/x:address)

FOR scans through all <Placemark> nodes. LET assigns the highest accuracy to the $maxAcc variable. WHERE takes only the node where the accuracy of the current <Placemark> (in <AddressDetails> node) is equal to $maxAcc. RETURN sends back the result:

7455 132 St, Surrey, BC, Canada

If we need a certain level of accuracy, we can explicitly indicate it:

(: Query #10 :)
declare namespace x="http://earth.google.com/kml/2.0";
declare namespace y="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0";

for $n in //x:Placemark                   
  where $n//y:AddressDetails/number(@Accuracy) = 3
return data($n/x:address)

And the result is:

Greater Vancouver Regional District, British Columbia, Canada

Join Expressions

Join is one of the most powerful tools in SQL. XQuery also allows joining data. For example, we would like give user a better explanation what the numbers in Accuracy attribute mean. We need an XML that would contain both accuracy numbers and descriptions.

I added such a lookup table directly in the XQuery window, although I could use Concatenator to make an attribute containing two XML parts. Here is the query:

(: Query #11 :)
declare namespace x="http://earth.google.com/kml/2.0";

<dml>
{
//x:Placemark
}
 <AccuracyLookup>
  <AddressAccuracy Accuracy="0">Unknown location</AddressAccuracy> 
  <AddressAccuracy Accuracy="1">Country level</AddressAccuracy> 
  <AddressAccuracy Accuracy="2">Region (state, province, prefecture, etc.) level</AddressAccuracy> 
  <AddressAccuracy Accuracy="3">Sub-region (county, municipality, etc.) level</AddressAccuracy> 
  <AddressAccuracy Accuracy="4">Town (city, village) level</AddressAccuracy> 
  <AddressAccuracy Accuracy="5">Post code (zip code) level</AddressAccuracy> 
  <AddressAccuracy Accuracy="6">Street level</AddressAccuracy> 
  <AddressAccuracy Accuracy="7">Intersection level</AddressAccuracy> 
  <AddressAccuracy Accuracy="8">Address level</AddressAccuracy> 
  <AddressAccuracy Accuracy="9">Premise (building name, property name, shopping center, etc.) level</AddressAccuracy> 
 </AccuracyLookup> 
</dml>

Note that I added the outer <dml> tags, this way we get a single XML document. Here <dml> means Dmitri's Markup Language :-)

Now we can make a join of the two parts of XML - <Placemarks> and <AccuracyLookup>. Again, it should be a FLWOR expression. FOR clause says that we are scanning <AddressDetails> and AddressAccuracy elements, and in WHERE we are looking for the matching attribute Accuracy. As RESULT, we ask to give us a concatenation of the address and the accuracy description:

(: Query #12 :)
declare namespace x="http://earth.google.com/kml/2.0";
declare namespace y="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0";

for $addr in //x:Placemark,
    $accr in //AddressAccuracy
where $addr//@Accuracy = $accr/@Accuracy
return concat (data($addr/x:address), " (Accuracy: ", data($accr), ")")

Here is the result (line breaks were added manually, asterisk was used as a separator):

7455 132 St, Surrey, BC, Canada (Accuracy: Address level)*
British Columbia V3W 1J8, Canada (Accuracy: Post code (zip code) level)*
British Columbia V3W 4M7, Canada (Accuracy: Post code (zip code) level)*
Surrey, BC, Canada (Accuracy: Post code (zip code) level)*
Surrey, BC, Canada (Accuracy: Town (city, village) level)*
Greater Vancouver Regional District, British Columbia, Canada (Accuracy: Sub-region (county, municipality, etc.) level)*
British Columbia, Canada (Accuracy: Region (state, province, prefecture etc) level)*
Canada (Accuracy: Country level)*
North America (Accuracy: Unknown location)

Conditions

XQuery also supports traditional IF-THEN-ELSE expressions. Let's have another look at the source XML. Some Placemark nodes with lower accuracies don't have PostalCode elements:

...
<Placemark id="p7">
 <address>British Columbia, Canada</address> 
 <AddressDetails Accuracy="2" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
  <Country>
   <CountryNameCode>CA</CountryNameCode> 
   <CountryName>Canada</CountryName> 
   <AdministrativeArea>
    <AdministrativeAreaName>BC</AdministrativeAreaName> 
   </AdministrativeArea>
  </Country>
 </AddressDetails>
...

We may want to find such placemarks and give a notification to our user (see If.fmw workspace. I am going to be simple and use "n/a" for those nodes that don't have postal codes.

(: Query #13 :)
declare namespace y="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0";

for $pc in //y:AddressDetails
  return
    if (not($pc//y:PostalCodeNumber))
    then ("n/a")
    else (data($pc//y:PostalCodeNumber))

Again, here we have a FLWOR expression, where the RETURN clause contains a condition telling that if there is no PostalCodeNumber element, return "n/a", otherwise use the value of this element. This gives us the following (asterisk is used as a separator character):

V3W*V3W 1J8*V3W 4M7*V3W*n/a*n/a*n/a*n/a*n/a

Demo workspaces

ReverseGeocoding

Most of the expressions shown above are used in the Reverse Geocoding workspace. The workspace takes three points stored in FFS, constructs a URL, gets an XML from Google, extracts several attributes including address, postal code and accuracy, and places geocoded points.

There is one step not explained yet. I used XQueryExploder to get separate features for each Placemark node with the highest accuracy. One of the source points gives two placemarks. As result, three source points gave four geocoded points. The query used in the Exploder is very simple:

 declare namespace x="http://earth.google.com/kml/2.0";
 
 //x:Placemark

This means, that the transformer will make a separate feature from every Placemark element (and after the previous step, we've got only 4 elements).

NOTE, that the address Google gave us (7455 132 St, Surrey) is not exactly matches the building where Safe Software is located. Our real building number 7445. Initially, I thought, that this was bad, but then I decided to leave it as is - just as a warning that Reverse Geocoding may be not very reliable - do not use it for your mailing needs.

I hope this page will help you to get the idea how great the combined power of FME and XQuery might be, and you will find it useful in your data transformations.

HTML Creation and User Defined Functions

This demo is showing how to extract from an XML all necessary data and put it nicely formatted into HTML with just one XQuery transformer.

Here is our source XML taken from Earth Tools Webservices example:

<?xml version="1.0" encoding="UTF-8" ?> 
  <sun xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.earthtools.org/sun.xsd">
    <version>1.0</version> 
      <location>
        <latitude>49</latitude> 
        <longitude>-123</longitude> 
      </location>
    <date>
      <day>1</day> 
      <month>1</month> 
      <timezone>-8</timezone> 
      <dst>1</dst> 
    </date>
    <morning>
      <sunrise>09:08:20</sunrise> 
      <twilight>
        <civil>08:30:58</civil> 
        <nautical>07:50:26</nautical> 
        <astronomical>07:11:50</astronomical> 
      </twilight>
    </morning>
    <evening>
      <sunset>17:23:31</sunset> 
      <twilight>
        <civil>18:00:53</civil> 
        <nautical>18:41:25</nautical> 
        <astronomical>19:20:02</astronomical> 
      </twilight>
    </evening>
  </sun>


We can write any tags directly in queries. In this case, the XQuery expressions should be placed into curly braces {}:

<table border="1">
 <tr>
  <th>Sunrise</th><th>Sunset</th>
 </tr>

 <tr>
  <td>{data(//sunrise)}</td><td>{data(//sunset)}</td>
 </tr>

This code will generate a valid portion of HTML, which can be viewed with any browser. Here I place the resulting code without any modification, and it is rendered according to all HTML rules, and instead of QXuery expressions we get the actual values:

Sunrise Sunset
09:08:20 17:23:31


Now, as we have three different types of twilight, we may end up with a lot of similar portions of the code:

<tr>
 <th>Civil</th>
 <td>08:30:58</td>
 <td>18:00:53</td>
</tr>

The structure is always the same, the values are different. Can we define functions in XQuery? Yes, we can:

declare function local:twilights($twl)
 {
  <tr>
   <th>{concat(upper-case(substring(name($twl[1]), 1, 1)),substring(name($twl[1]), 2))}</th>
   <td>{data($twl[1])}</td>
   <td>{data($twl[2])}</td>
  </tr>
 };

and the function calls looks as follows:

{local:twilights(//civil)}
{local:twilights(//nautical)}
{local:twilights(//astronomical)}

Note, that we have two elements called civil, nautical, and astronomical - for mornings and evenings. It means that what we send to the function is a sequence <civil>08:30:58</civil>,<civil>18:00:53</civil>. This is why we have to address them with square brackets [ ] - this way we extract the atomic values.

The first expression looks complex, but it simply changes the case of the first character in the word.

The final query is a bit long, but really, it does a lot (use xquery2html_kml.fmw workspace and place the source in the same folder):

(: Query #14 :)

declare function local:twilights($twl)
 {
  <tr>
   <th>{concat(upper-case(substring(name($twl[1]), 1, 1)),substring(name($twl[1]), 2))}</th>
   <td>{data($twl[1])}</td>
   <td>{data($twl[2])}</td>
  </tr>
 };
 
<body>
<h3>
 {let $m:= data(//month)
  return 
         if ($m = "1")  then "January"
    else if ($m = "2")  then "February"
    else if ($m = "3")  then "March"
    else if ($m = "4")  then "April"
    else if ($m = "5")  then "May"
    else if ($m = "6")  then "June"
    else if ($m = "7")  then "July"
    else if ($m = "8")  then "August"
    else if ($m = "9")  then "September"
    else if ($m = "10") then "October"
    else if ($m = "11") then "November"
    else "December"}, 
 {data(//day)}  
</h3>
<h4>at 
 {let $lat:= number(//latitude)
 return 
   if ($lat < 0) 
   then concat(abs($lat), " deg S") 
   else concat($lat, " deg N")},
 {let $lng:= number(//longitude)
 return 
   if ($lng < 0) 
   then concat(abs($lng), " deg W") 
   else concat($lng, " deg E")} 
</h4>
<h5>timezone: 
 {data(//timezone)} 
( 
 {let $dst:= number(//dst)
  return 
   if ($dst = 1) then "dst" else ""}
)
</h5>

<table>
 <tr>
  <th></th><th>Sunrise</th><th>Sunset</th>
 </tr>

 <tr>
  <td></td><td>{data(//sunrise)}</td><td>{data(//sunset)}</td>
 </tr>

 <tr>
  <th></th><th colspan="2" align="center">Twilights</th>
  </tr>

 <tr>
  <th></th><th>Morning</th><th>Evening</th>
 </tr>

{local:twilights(//civil)}
{local:twilights(//nautical)}
{local:twilights(//astronomical)}

</table>
</body>

I called my resulting attribute kml_description_raw_text. Now if I send my output to KML, and click on the point, I get the following placemark balloon with our HTML:

FME XQuery functions

In FME 2010, XQuery transformers got three FME related xquery functions:

fme:has-attribute(<string>)
  This function takes an attribute name, and returns a boolean (true/false)
value if the current feature has an attribute with the given name.
fme:get-attribute( <string> [, <default value? ] )
  This function returns the value of an attribute.  The first parameter must be
a string containing the attribute name.  The second parameter is optional, and
is a default value.  If the feature does not have an attribute with the name
given by the first parameter, this default value will be returned.  If no
default value is given, an empty value is returned.
fme:set-attribute( <string>, <value> )
  This function sets the value of a feature attribute.  The first parameter
must be a string containing the attribute name.  The second value can be any
type and contains the value with which the attribute will be set.  This
function always returns an empty value.

These functions can really simplify many XML-related tasks.

We already have an example that shows how to use XQuery for making an HTML code.

Here I am going to show how to use fme:get-attribute function in XQuery.

The sample workspace is based on the workspace used in Transformer use statistics example - it scans a specified path and finds workspaces, reads their contents and searches for transformer sections:

Here, we change the ending of the workspace - accumulate attributes of all features into one list. After that, we use XQuery for looping through list elements and our new fme xquery functions within it - to populate an HTML table:

-- prepare html including table header (delete comments if you would like to use this XQuery)

<html>
 <body>
  <table border="1">
   <tr><th>Rank</th><th>Transformer</th><th># of Transformers</th><th>%</th></tr>

-- make a loop from 0 up to the number of list elements-1 
   {for $i in (0 to xs:int(fme:get-attribute("_element_count"))-1)
     return 
      <tr><td>
-- add the line number (counter) 
       {$i+1} 
      </td><td>        
-- get transformer name 
       {fme:get-attribute(concat("list{", $i, "}._transformer_name"))}
      </td><td> 
-- get the number of transformer uses 
       {fme:get-attribute(concat("list{", string($i), "}._number_of_transformers"))}
      </td><td>      
-- get percentage of transformer use from total number of transformers 
       {fme:get-attribute(concat("list{", string($i), "}._percentage"))}
-- end of loop 
      </td></tr>}

-- finish the table and html 
  </table>
 </body>
</html>
 

The output is an attribute containing HTML code (click the link to see it).

Dmitri (mailto:dmitri.bagh@safe.com)

Attached Files
filesizedate
FLWOR.fmw36.9 kB11/26/08
If.fmw36.7 kB11/25/08
Join.fmw47.1 kB11/26/08
PointsToReverseGeocode.ffs<1kB11/26/08
TransformerSearch_xQuery.fmw65.3 kB10/21/09
XQUERYReverseGeocoding.fmw110.0 kB11/25/08
XQueryFMEFunctions.png128.3 kB10/21/09
address.xml<1kB11/26/08
address_full.xml4.3 kB11/19/08
beforeXQuery.png3.8 kB11/19/08
gallery.php------
index.php------
path.php------
report.htm24.3 kB10/21/09
simpleQueries.fmw27.6 kB11/21/08
sundata.kml<1kB12/02/08
sundata.xml<1kB12/02/08
xquery2html_kml.fmw50.7 kB12/02/08
xquery2kml.jpg61.4 kB12/02/08
xqueryDialog.png34.2 kB11/20/08
xqueryexpression.png20.6 kB11/20/08
User Comments Add a new comment