Understanding the XMLQuery

Apache OODT's profile servers, product servers, and other components all use the same format for a query. It's encapsulated by the class org.apache.oodt.xmlquery.XMLQuery. In this tutorial, we'll look at this class and see how it represents queries. You'll need this knowledge both to make queries to OODT servers and to understand queries coming into OODT servers.

Basic Query Concepts

Capturing various aspects of a query is difficult to do in general, and OODT's implementation is not stellar or complete. But it has proved successful in a variety of applications, so let's see what concepts it encapsulates.

XML?

First, forget the fact that the XMLQuery has "XML" in its name. It doesn't mean you can query only XML resources. It's called XMLQuery probably because the person who came up with it thought XML was pretty cool, or that you can represent an OODT query in XML format.

While you can represent an XMLQuery in XML, you usually only use the Java representation, that is, you create and manipulate Java objects of the class org.apache.oodt.xmlquery.XMLQuery.

Generic Queries

In theory, the XMLQuery can represent any query for information. It captures generic aspects of a query, such as the domain of the question being posed, the range in which the desired response should be formulated, and constraints on what selects the response. In XMLQuery parlance, we call these the "from element set" (domain), the "select element set" (range), and the "where element set" (constraints).

In practice, none of the current OODT implementations use any but the "where element set." And indeed, for most problems presented to OODT, that is sufficient. However, the framework is there to support more aspects of a query, and you're welcome to use them in your own deployments.

Query Metadata

The XMLQuery concept captures metadata about a query as well, such as the title for the query, whether the query itself is secret or classified, how many results to return at most, how to propagate the query through a network, and so forth. In practice, though, none of these additional attributes are used in current deployments of OODT. Moreover, none of the current OODT components obey settings such as the maximum number of results or propagation types.

As a result, you should ignore these aspects of the XMLQuery and merely use its default values. We'll see these shortly.

XMLQuery Structure

The following diagram shows the XMLQuery and related classes (note the diagram is outdated; "jpl.eda.xmlquery" should read "org.apache.oodt.xmlquery"):

Class diagram of XMLQuery

A single XMLQuery object has three separate lists of QueryElement objects, representing the "from", "select", and "where" element sets. In practice, only the "where" set is typically used, as mentioned, so the "from" and "select" sets are usually empty. There's also a single QueryHeader object capturing query metadata. Within the XMLQuery itself is additional query metadata. Finally, there's exactly one QueryResult object which captures the results of the query so far.

Boolean Expressions

The XMLQuery class uses lists of QueryElement objects to represent its "from", "select", and "where" element sets. The lists form a postfix boolean stack, with the zeroth element of the list being the top of the stack. Although you can populate these stacks by manipulating their corresponding java.util.Lists, the XMLQuery class provides a boolean expression language that lets you directly populate them.

The XMLQuery class also respects that some queries just cannot be formulated as a boolean expression. In these cases, you can pass in a string that the XMLQuery will otherwise carry unparsed. Note that your profile and product servers will then have the responsibility of handling that string in some appropriate way.

Query Language

The query language that XMLQuery uses to generate postfix boolean stacks is a series of infix, not postfix, element-and-value expressions linked by boolean operators. Here's an example:

temperature > 36 AND latitude < 45

As you can see, these are triples linked in a logical expression. Each triple has the form (element, relation, literal). For example, the first triple has element = temperature, relation = GT (greater-than), and literal = 36. That triple is linked to the next one with the boolean AND operator.

The full set of relational operators includes: = (EQ), != (NE), < (LT), <= (LE), > (GT), >= (GE), LIKE, and NOTLIKE. The logical operators include AND, &, OR, |, NOT, and !. You can use parentheses to group things too.

Here are a few more examples:

specimen = Blood
bac > 0.05 AND priors = 3
surname LIKE 'Simpson%' OR numChildren <= 3 AND RETURN = numEpisodes
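
To make the triple structure concrete, here is a minimal sketch that splits one whitespace-separated triple into its element, relation name, and literal. The TripleSketch class and its parseTriple method are hypothetical and not part of OODT; the parser inside XMLQuery also handles quoting, parentheses, and logical operators, all of which this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of OODT: illustrates the (element, relation, literal) shape.
class TripleSketch {
    // Relation symbols and their names, as listed in the text above.
    private static final Map<String, String> RELATIONS = new HashMap<>();
    static {
        RELATIONS.put("=", "EQ");
        RELATIONS.put("!=", "NE");
        RELATIONS.put("<", "LT");
        RELATIONS.put("<=", "LE");
        RELATIONS.put(">", "GT");
        RELATIONS.put(">=", "GE");
        RELATIONS.put("LIKE", "LIKE");
        RELATIONS.put("NOTLIKE", "NOTLIKE");
    }

    /** Splits "element relop literal" on whitespace; returns {element, relation name, literal}. */
    static String[] parseTriple(String expr) {
        String[] parts = expr.trim().split("\\s+", 3);
        if (parts.length != 3 || !RELATIONS.containsKey(parts[1])) {
            throw new IllegalArgumentException("Not a simple triple: " + expr);
        }
        return new String[] { parts[0], RELATIONS.get(parts[1]), parts[2] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseTriple("temperature > 36"))); // [temperature, GT, 36]
        System.out.println(Arrays.toString(parseTriple("specimen = Blood"))); // [specimen, EQ, Blood]
    }
}
```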

Expression Stacks

The "where" element set is actually a java.util.List of org.apache.oodt.xmlquery.QueryElement objects, arranged in a boolean stack with the top of the stack as the zeroth element in the list. QueryElement objects themselves have two attributes, a role and a value.

The role tells what role the QueryElement is playing. It can be elemName for the element part of a triple, RELOP for the relation part of a triple, LITERAL for the literal part of a triple, or LOGOP for a logical operator linking triples together. The value tells what the element is, what the relational operator is, what literal value is being related, or what the logical operator is.

The XMLQuery parses a query expression and generates a corresponding stack of QueryElements. Let's look at a couple of examples. The expression

latitude > 45

generates the "where" stack

Stack of three query elements

While the expression

artist = Bach AND NOT album = Poem OR track != Aria

generates the "where" stack

Stack of a lot of query elements
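
As a rough illustration of how a query handler might consume such a stack, the sketch below walks a list of role/value pairs from front to back and evaluates it against one record. Everything here is an assumption made for illustration: the Element class is a hypothetical stand-in for QueryElement, the list is assumed to be in conventional postfix order (element name, literal, relational operator, with logical operators after their operands), the logical operator values are assumed to be the strings AND, OR, and NOT, and only numeric comparisons are handled.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not OODT code: evaluates a postfix "where" list against one record.
class WhereStackSketch {
    /** Stand-in for QueryElement: just a role and a value. */
    static final class Element {
        final String role;
        final String value;
        Element(String role, String value) { this.role = role; this.value = value; }
    }

    static Element el(String role, String value) { return new Element(role, value); }

    /** temperature > 36 AND latitude < 45, written in the assumed postfix order. */
    static List<Element> sampleWhere() {
        return Arrays.asList(
            el("elemName", "temperature"), el("LITERAL", "36"), el("RELOP", "GT"),
            el("elemName", "latitude"), el("LITERAL", "45"), el("RELOP", "LT"),
            el("LOGOP", "AND"));
    }

    /** Evaluates the list front to back; the record must contain every named element. */
    static boolean matches(List<Element> where, Map<String, Double> record) {
        Deque<String> operands = new ArrayDeque<>(); // pending element names and literals
        Deque<Boolean> truths = new ArrayDeque<>();  // results of completed triples
        for (Element e : where) {
            switch (e.role) {
                case "elemName":
                case "LITERAL":
                    operands.push(e.value);
                    break;
                case "RELOP": {
                    double literal = Double.parseDouble(operands.pop());
                    double actual = record.get(operands.pop());
                    truths.push(compare(actual, e.value, literal));
                    break;
                }
                case "LOGOP":
                    if (e.value.equals("NOT")) {
                        truths.push(!truths.pop());
                    } else {
                        boolean right = truths.pop();
                        boolean left = truths.pop();
                        truths.push(e.value.equals("AND") ? left && right : left || right);
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown role: " + e.role);
            }
        }
        return truths.pop();
    }

    static boolean compare(double actual, String relop, double literal) {
        switch (relop) {
            case "EQ": return actual == literal;
            case "NE": return actual != literal;
            case "LT": return actual < literal;
            case "LE": return actual <= literal;
            case "GT": return actual > literal;
            case "GE": return actual >= literal;
            default: throw new IllegalArgumentException("Unsupported relation: " + relop);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> record = new HashMap<>();
        record.put("temperature", 40.0);
        record.put("latitude", 30.0);
        System.out.println(matches(sampleWhere(), record)); // prints "true"
    }
}
```

A real handler would more likely translate the stack into its own back-end query (SQL, a search index, and so on) than evaluate it record by record; the sketch only shows why postfix order makes that translation a simple single pass.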

The RETURN Element

A special element is reserved by XMLQuery: RETURN. It's used to indicate what to select, and so any value specified with RETURN goes into the "select" set, not the "where" set.

Moreover, the RETURN element doesn't pay attention to how it's linked with boolean expressions in the rest of the query, or what relational operator is used with the literal value being returned. For example, that means all of the following expressions would generate identical XMLQueries:

specimen = Blood AND RETURN = volume
specimen = Blood OR RETURN = volume
specimen = Blood AND RETURN != volume
specimen = Blood AND RETURN < volume
specimen = Blood AND RETURN LIKE volume

All QueryElements from RETURN triples would go into the "select" instead of the "where" set.
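
The behavior described above can be mimicked with a few lines of string handling. This is only a sketch under simplifying assumptions (tokens separated by whitespace, no quoting or parentheses); ReturnSketch and selectNames are hypothetical names, not OODT API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not OODT code: finds the names that RETURN triples would select.
class ReturnSketch {
    /** Returns the literal of every "RETURN relop literal" triple, ignoring the relop. */
    static List<String> selectNames(String expr) {
        String[] tokens = expr.trim().split("\\s+");
        List<String> names = new ArrayList<>();
        for (int i = 0; i + 2 < tokens.length; i++) {
            if (tokens[i].equals("RETURN")) {
                names.add(tokens[i + 2]); // tokens[i + 1] is the relop, which is ignored
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(selectNames("specimen = Blood AND RETURN = volume"));   // [volume]
        System.out.println(selectNames("specimen = Blood OR RETURN LIKE volume")); // [volume]
    }
}
```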

Constructing a Query

To construct a query, you'll use a Java constructor of the following form:

XMLQuery(String keywordQuery, String id, String title,
  String desc, String ddId, String resultModeId, String propType,
  String propLevels, int maxResults, java.util.List mimeAccept,
  boolean parseQuery)

The parameters are summarized below:

Parameter | Purpose | Sample values
keywordQuery | A string representing your query expression, in the query language described above, or in some other application-specific language. | numDonuts = 3, select volume_remaining from specimens where specimen_type = 4
id | An identifier for your query | query-1, 1.3.6.1.1316.4.1, myQuery, urn:ibm:sys:0x39ad930a
title | A title for your query | My First Query, Query for Blood Specimens, Simpson's Query
desc | Description of the query | H.J. Simpson is looking for donut shops
ddId | Data dictionary ID. This identifies the data dictionary that provides definitions for the elements used in the query like "specimen" or "numDonuts". It's not used by any current OODT deployment or the OODT framework. | null
resultModeId | Identifies what to return from the query. Defaults to ATTRIBUTE. Not used by any current OODT deployment or the OODT framework. | null
propType | How to propagate the query, defaults to BROADCAST. It's not used by any current OODT deployment or the OODT framework. | null
propLevels | How far to propagate the query, defaults to N/A. Not used by any current OODT deployment or the OODT framework. | null
maxResults | At most how many results to return; not enforced by OODT framework. | 1, 100, Integer.MAX_VALUE, -6
mimeAccept | List of acceptable MIME types for returned products, defaults to */* | List types = new ArrayList(); types.add("text/xml"); types.add("text/html"); types.add("text/*");
parseQuery | Should the class parse the query as a boolean expression? True says to generate the boolean expression stacks. False says to just save the expression string. | true, false

All of the values above can be set to null to use a default or non-specific value (except for maxResults and parseQuery, which are int and boolean types and can't be assigned null). For most applications, using null is perfectly acceptable. Since the OODT framework doesn't use maxResults, you can use any value. However, specific profile servers' and product servers' query handlers may pay attention to the value if so programmed.

Parsed or Unparsed Queries

The last parameter, parseQuery, tells if you want the XMLQuery class to parse your query and generate boolean expression stacks (discussed above) or not. Set to true, the class will parse the string as if in the XMLQuery language described above, and will generate the "from", "select", and "where" element boolean stacks. Set it to false and the class won't parse the string or generate the stacks. It will instead store the string for later use by a profile server's or product server's query handler.

For example, if you pass in the XMLQuery language expression,

donutsEaten > 5 AND RETURN = episodeNumber

then set the parseQuery flag to true. As another example, suppose the query expression is

select episodeNumber from episodes where donutsEaten > 5

This is an SQL expression, probably targeted to a product server that can handle SQL expressions. In this case, set parseQuery to false.

The current OODT deployments for the Planetary Data System and the Early Detection Research Network both use parsed queries.

Acceptable MIME Types

Internet standards for mail, web, and other applications use MIME types (described in RFC-2046 amongst other documents) to describe the content and media type of data. So does OODT. When you construct an XMLQuery, you can also pass in a list of MIME types that are acceptable to you for the format of any returned products, much in the same way your web browser tells a web server what media types it can display.

The list of acceptable MIME types is only used for product queries since products can come in any shape and flavor. Profile queries ignore the list; profiles are always returned as a list of Java org.apache.oodt.profile.Profile objects.

You've probably seen MIME types before, but here are some examples in case you haven't:

  • text/plain - a plain old text file
  • text/html - a hypertext document
  • image/jpeg - a picture in the JPEG/JFIF format
  • image/gif - a picture in the GIF format
  • audio/mpeg - an audio file, probably in the MP3 format
  • video/mpeg - a video file, probably in the MPEG-2 format
  • application/msword - a Micro$oft Word document
  • application/octet-stream - binary data

In the XMLQuery constructor, you can pass in a list of MIME types that shows your preference for returned products. Product servers' query handlers examine the query to see if they can provide a matching product, and they examine the list of MIME types to see if they can provide matching products in the format you desire.

As an example, suppose you create a MIME type list as follows:

// Most preferred type first, least preferred last.
List<String> acceptableTypes = new ArrayList<String>();
acceptableTypes.add("image/tiff");
acceptableTypes.add("image/png");
acceptableTypes.add("image/jpeg");

and you pass acceptableTypes as the mimeAccept parameter of the XMLQuery constructor. This tells query handlers receiving your query that you'd really prefer a TIFF format image. However, failing that, you'll accept a PNG format image. And, as a last resort, a JPEG will do.

You can also use wildcards in your MIME types. Suppose we did the following:

// The wildcard entry goes last so it acts as a fallback.
List<String> acceptableTypes = new ArrayList<String>();
acceptableTypes.add("image/tiff");
acceptableTypes.add("image/png");
acceptableTypes.add("image/*");

Now we tell query handlers in product servers that we really prefer TIFF format images. If a query handler can't do that, then a PNG format will be OK. And if a query handler can't do PNG, then any image format will be fine, even loathsome GIF.

If you pass a null or an empty list in the mimeAccept parameter, the OODT framework will convert it into a single-item list: */*, meaning any format is acceptable.
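
To illustrate the preference order and wildcards from a query handler's point of view, here is a minimal sketch of one way a handler could pick a format. It is not how any particular OODT product server does it: MimeChoiceSketch, matches, and choose are hypothetical names, and the list of available formats is something the handler would have to supply itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not OODT code: honors the order of the mimeAccept list.
class MimeChoiceSketch {
    /** True if type (e.g. "image/png") satisfies pattern (e.g. "image/png", "image/*", or the any-type wildcard). */
    static boolean matches(String pattern, String type) {
        if (pattern.equals("*/*")) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            // Keep the trailing slash so "image/*" matches "image/gif" but not "imagex/gif".
            return type.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(type);
    }

    /** Returns the first available type matching the earliest acceptable pattern, or null if none match. */
    static String choose(List<String> accept, List<String> available) {
        if (accept == null || accept.isEmpty()) {
            accept = Collections.singletonList("*/*"); // the default described in the text
        }
        for (String pattern : accept) {
            for (String type : available) {
                if (matches(pattern, type)) {
                    return type;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> accept = Arrays.asList("image/tiff", "image/png", "image/*");
        System.out.println(choose(accept, Arrays.asList("image/gif", "image/png"))); // image/png
        System.out.println(choose(accept, Arrays.asList("image/gif", "text/html"))); // image/gif
    }
}
```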

"Running" XMLQuery

The XMLQuery class is also an executable class. By running it from the command-line, you can see how it generates its XML representation. It also lets you pass in a file containing an XML representation of an XMLQuery and parses it for validity.

Let's try just seeing that XML representation. (In these examples, we'll be using a Unix csh-like command environment. Users of other shells and non-Unix systems will have to adjust.)

Collecting the Components

First up, we'll need two components:

  • OODT Common Components. This is needed by all OODT software; it contains general utilities for starting servers, parsing XML, logging, and more.
  • OODT Query Expression. This contains the XMLQuery and related classes.

Download the binary distribution of each of these packages and extract their contents. Then, create a single directory and collect the jar files together in one place.

Generating the Query

To generate the query, pass the command-line argument -expr. That tells the XMLQuery that the rest of the command line is the query expression. It will expect it to be in the XMLQuery query language (meaning that it will create an XMLQuery object with parseQuery set to true).

Here's an example:

% java -Djava.ext.dirs=. \
  org.apache.oodt.xmlquery.XMLQuery \
  -expr donutsEaten \> 5 AND RETURN = episodeNumber
kwdQueryString: donutsEaten > 5 AND RETURN = episodeNumber
fromElementSet: []
results: org.apache.oodt.xmlquery.QueryResult[list=[]]
whereElementSet:
[org.apache.oodt.xmlquery.QueryElement[role=elemName,value=donutsEaten],
org.apache.oodt.xmlquery.QueryElement[role=LITERAL,value=5],
org.apache.oodt.xmlquery.QueryElement[role=RELOP,value=GT]]
selectElementSet:
[org.apache.oodt.xmlquery.QueryElement[role=elemName,value=episodeNumber]]
======doc string=======
<?xml version="1.0" encoding="UTF-8"?>
<query> . . .

The program prints out some fields of the XMLQuery such as the "from" element set, the current results (which should always be empty since we haven't passed this query to any product servers), the "where" element set, and the "select" element set. It then prints out the XML representation.

If you examine the XML representation closely, you'll see things like the list of acceptable MIME types:

<queryMimeAccept>*/*</queryMimeAccept>

This says that any type is acceptable. You'll also see the passed in query string:

<queryKWQString>donutsEaten &gt; 5 AND
  RETURN = episodeNumber</queryKWQString>

Regardless of whether you passed true or false in the parseQuery parameter, the XMLQuery always saves the original query string. For unparsed queries, this is how the string is packaged on its way to a product server. For parsed queries, product servers will use the boolean stacks. (Since this was a parsed query, you'll also see the boolean stacks in XML format if you look closely. They're there.)

Getting Results

Alert readers will have noticed that the results of a query have a place in XMLQuery objects. This actually applies to product queries only. After sending an XMLQuery to a product server, the query object comes back adorned with zero or more matching results. You then access the XMLQuery object methods to retrieve those results.

The following class diagram demonstrates the relationship (again, the diagram is outdated; "jpl.eda.xmlquery" should read "org.apache.oodt.xmlquery"):

Result class diagram

As you can see, a single query has a single org.apache.oodt.xmlquery.QueryResult, which contains a java.util.List of org.apache.oodt.xmlquery.Result objects. Result objects may have zero or more Headers, and Result objects may actually be LargeResult objects.

To retrieve the list of Result objects, call the XMLQuery's getResults method, which returns the java.util.List directly.

Each result also includes

  • An identifier. If there's more than one matching result, this identifier (a string) should be unique amongst results.
  • A MIME type. This tells you what format the matching product is in.
  • A profile ID. This is currently unused.
  • A resource ID. This is also unused.
  • A validity period. This is the number of milliseconds for which the product is considered valid. You can use this information to decide how long to cache the product within your own program before having to retrieve it again.
  • A flag indicating whether the product is classified. Classified or secret products shouldn't be cached or should otherwise be handled carefully by your application program.
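
As one possible use of the validity period and the classified flag, a client-side cache could apply a check like the sketch below. ResultCacheSketch and mayReuse are hypothetical names, the timestamps are plain millisecond values that your own code would record, and the rule that classified products are never reused is just one cautious policy.

```java
// Hypothetical sketch, not OODT code: a cache check driven by a result's metadata.
class ResultCacheSketch {
    /** True if a copy fetched at fetchedAtMillis may still be reused at nowMillis. */
    static boolean mayReuse(boolean classified, long fetchedAtMillis,
                            long validityMillis, long nowMillis) {
        if (classified) {
            return false; // this sketch never reuses classified products
        }
        return nowMillis - fetchedAtMillis < validityMillis;
    }

    public static void main(String[] args) {
        long fetchedAt = 1_000L;
        System.out.println(mayReuse(false, fetchedAt, 60_000L, fetchedAt + 30_000L)); // true
        System.out.println(mayReuse(false, fetchedAt, 60_000L, fetchedAt + 90_000L)); // false
        System.out.println(mayReuse(true, fetchedAt, 60_000L, fetchedAt + 30_000L));  // false
    }
}
```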

Result Headers

The headers of a result are optional. They're used for tabular style results to indicate column headings. Each Header object captures three strings, a name, a data type, and units.

For example, suppose you retrieved a product that was a table of temperatures at various locations on the Earth. There might be three headers in the headers list:

List Index | Name | Data Type | Units
0 | latitude | float | degrees
1 | longitude | float | degrees
2 | temperature | float | kelvins

Suppose the product you get back is a picture of a tissue specimen. In this case, there would be no headers.

Getting the Product Data

To retrieve the actual data comprising your product, call the Result object's getInputStream method. This returns a standard java.io.InputStream that lets you access the data. How you interpret that data, though, depends on the MIME type of the product, which you can get by calling the Result's getMIMEType method.

For example, if the MIME type was text/plain, then the byte stream would be a sequence of Unicode characters. If it were image/jpeg, then the bytes would be image data in JPEG/JFIF format.
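
For a text product, reading the stream might look like the sketch below. It uses a ByteArrayInputStream as a stand-in for the stream a Result would hand back, so it runs without any OODT classes; ProductDataSketch and readText are hypothetical names, and the UTF-8 charset is an assumption, since the text above doesn't say which encoding a given product server uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not OODT code: drains a product's input stream into a string.
class ProductDataSketch {
    /** Reads the whole stream and decodes it with the given charset. */
    static String readText(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream returned by a Result's getInputStream method.
        InputStream in = new ByteArrayInputStream(
            "latitude,longitude,temperature".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(in, StandardCharsets.UTF_8));
    }
}
```

For binary types such as image/jpeg, you would skip the decoding step and pass the bytes to whatever image library your application uses.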

Conclusion

In this tutorial, we learned about the structure of the standard query component in OODT, the XMLQuery. We saw the query language that XMLQuery supports and how it generates postfix boolean expression stacks. You can also encode any query expression by using a special constructor argument that tells XMLQuery to not parse the query string. We also executed the XMLQuery class directly from the command line. Finally, we saw how product data is embedded in the XMLQuery and how to deal with such results.

As a client of the OODT framework, you can now create XMLQuery objects to query product servers from within your Java applications. As a server in the framework, you know how to deal with incoming query objects.