NCSA
emerge@ncsa.uiuc.edu

Tutorial: Building a Gazebo Client

The Gazebo Client Toolkit provides a set of classes and interfaces which represent an object-oriented interface to Gazebo's XML-based protocol. Clients, queries, search attributes, result sets, records, and error conditions are represented by classes which provide methods for building queries, receiving results, and retrieving record data.

Overview

The Gazebo gateway is a network service which enables its clients to search and retrieve data from multiple, distributed, heterogeneous data sources. Administrator-defined abstract search attributes make this possible. The Gazebo configuration file defines these attributes and their mappings to search attributes for particular databases. Once a Gazebo client discovers the attributes that are defined by a Gazebo server, it can construct queries using those attributes and receive results which it can partially or wholly retrieve.

Simple search queries consist of terms, optionally with attributes. Complex queries combine subqueries with operators such as "and" and "or".

Search results are represented by result sets. A result set is an ordered collection of records. Each record contains data, potentially in several forms (a short form and a long form, for instance) and has a content-type (usually a MIME type).

The Gazebo Client Toolkit uses an asynchronous, lazy-evaluation retrieval scheme. Instead of blocking while results are retrieved, a Gazebo client registers handlers for result sets and records before submitting search requests. Those handlers are invoked when results are returned. A result set may initially be only partially retrieved, but accessors for records not yet retrieved can be used transparently alongside accessors for records already retrieved; the client toolkit makes additional requests to Gazebo when necessary to retrieve them. The toolkit also performs rudimentary caching, again transparently. The effect is to minimize network traffic in the typical case where many records match but the user needs to peruse only a few of them to decide how to refine the query.
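
The lazy retrieval and caching behavior described above can be illustrated with a small, self-contained sketch. It does not use the toolkit's classes: LazyRecordsSketch, its fetcher function, the simulated server, and the batch size of 25 are all hypothetical stand-ins chosen for illustration. The point is only that asking for a record which has not been retrieved yet triggers a request, while asking for a cached record does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for a lazily retrieved result set; not a toolkit class.
class LazyRecordsSketch {
    private final int hits;        // total number of matching records
    private final int batchSize;   // records requested per round trip (assumed)
    private final BiFunction<Integer, Integer, List<String>> fetcher; // (start, count) -> records
    private final Map<Integer, String> cache = new HashMap<>();
    private int requests = 0;

    LazyRecordsSketch(int hits, int batchSize,
                      BiFunction<Integer, Integer, List<String>> fetcher) {
        this.hits = hits;
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    // Returns record number index (1-based), requesting its batch only if needed.
    String fetch(int index) {
        if (index < 1 || index > hits) {
            throw new IndexOutOfBoundsException("no record " + index);
        }
        if (!cache.containsKey(index)) {
            int start = ((index - 1) / batchSize) * batchSize + 1;
            int count = Math.min(batchSize, hits - start + 1);
            List<String> batch = fetcher.apply(start, count);
            requests++;
            for (int i = 0; i < batch.size(); i++) {
                cache.put(start + i, batch.get(i));
            }
        }
        return cache.get(index);
    }

    int requestCount() { return requests; }

    // Simulated server: makes up count record strings starting at start.
    static List<String> simulatedServer(int start, int count) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add("record " + (start + i));
        }
        return out;
    }

    public static void main(String[] args) {
        LazyRecordsSketch rs =
            new LazyRecordsSketch(1000, 25, LazyRecordsSketch::simulatedServer);
        System.out.println(rs.fetch(1));   // triggers one request for records 1-25
        System.out.println(rs.fetch(20));  // served from the cache
        System.out.println(rs.fetch(26));  // triggers a second request
        System.out.println(rs.requestCount() + " requests");
    }
}
```

In this sketch a user who reads only the first screen of results out of 1000 hits costs a single round trip, which is the situation the toolkit's scheme is meant to optimize.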

Attributes

Gazebo search attributes describe search terms, either semantically or structurally. Typically they indicate which aspect of the data from a data source to match the search term against, or how to perform the matching. In the broadest sense, however, they are used to construct arbitrary portions of the search query which is executed at the remote data source.

Each search attribute is a name/value pair in which the name represents the type or class of search attribute and the value identifies a member of that class. For instance, a search attribute with the type "field" and the value "title" might indicate that the search term should be matched against the title field. A search attribute with the type "frequency" and the value "megahertz" might indicate that the search term is a frequency expressed in megahertz. Finally, an attribute with the type "structure" and the value "beginning" might indicate that the search should be performed with right truncation.

Attributes nest, and each path to a leaf node represents a combination of attributes which can be applied to a search term. Each data source in the Gazebo configuration has its own attribute tree, but when two data sources share part of an attribute tree, it indicates a semantic equivalence between the data sources. A Gazebo client can retrieve a merged attribute tree from the Gazebo server, and then search using that attribute tree as if the Gazebo server represented a single data source.
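
As a rough illustration of attribute trees, the sketch below builds two small trees, merges them, and lists every path to a leaf node. AttrTreeSketch is a hypothetical stand-in written for this tutorial, not the toolkit's Attribute class, and the "type=value" labels and the two example data sources are assumptions. How the Gazebo server actually merges trees is not specified here; the sketch simply collapses shared branches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a node in an attribute tree; not the toolkit's Attribute.
class AttrTreeSketch {
    final String label;  // "type=value", or "" for the root
    final Map<String, AttrTreeSketch> children = new LinkedHashMap<>();

    AttrTreeSketch(String label) { this.label = label; }

    // Adds a path (a sequence of "type=value" labels) below this node.
    AttrTreeSketch addPath(String... labels) {
        AttrTreeSketch node = this;
        for (String l : labels) {
            node = node.children.computeIfAbsent(l, AttrTreeSketch::new);
        }
        return this;
    }

    // Merges another tree into this one; shared branches collapse into one.
    AttrTreeSketch merge(AttrTreeSketch other) {
        for (AttrTreeSketch child : other.children.values()) {
            children.computeIfAbsent(child.label, AttrTreeSketch::new).merge(child);
        }
        return this;
    }

    // Lists every root-to-leaf path; each is one usable attribute combination.
    List<String> leafPaths() {
        List<String> out = new ArrayList<>();
        collect("", out);
        return out;
    }

    private void collect(String prefix, List<String> out) {
        String here = prefix.isEmpty() ? label : prefix + "/" + label;
        if (children.isEmpty()) {
            if (!here.isEmpty()) {
                out.add(here);
            }
            return;
        }
        for (AttrTreeSketch child : children.values()) {
            child.collect(here, out);
        }
    }

    public static void main(String[] args) {
        AttrTreeSketch library = new AttrTreeSketch("")
            .addPath("field=title", "structure=beginning")
            .addPath("field=author");
        AttrTreeSketch images = new AttrTreeSketch("")
            .addPath("field=title", "structure=beginning")
            .addPath("frequency=megahertz");
        // The shared field=title/structure=beginning branch appears only once.
        for (String path : library.merge(images).leafPaths()) {
            System.out.println(path);
        }
    }
}
```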

Attributes are represented by the class ncsa.gazebo.protocol.Attribute. Simple attributes can be constructed using a single name/value pair, as follows:


import ncsa.gazebo.protocol.Attribute;

...

Attribute myAttribute = new Attribute("field", "title");

Nested attribute combinations can be constructed by adding the sub-attribute as an additional argument to the above constructor, in the following manner:


import ncsa.gazebo.protocol.Attribute;

...

Attribute myAttributeCombo =
    new Attribute ("field", "title",
                   new Attribute("structure", "beginning"));

This nesting can be repeated to any depth.

Queries

The class ncsa.gazebo.protocol.Query represents a search query. The simplest query consists only of a term:


import ncsa.gazebo.protocol.Query;

...

Query myQuery = Query.newInstance("fish");

Or a query can contain an attribute:


import ncsa.gazebo.protocol.Query;
import ncsa.gazebo.protocol.Attribute;

...

Query myQuery = Query.newInstance
    ("fish", new Attribute("field", "subject"));

Complex queries are constructed by combining subqueries with an operator:


import ncsa.gazebo.protocol.Query;
import ncsa.gazebo.protocol.Attribute;

...

Query q1 = Query.newInstance("fish",new Attribute("field","subject"));
Query q2 = Query.newInstance("duck");

Query myQuery = Query.newInstance(q1,"and",q2);
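
Because each operator combines exactly two subqueries, a search on several terms has to be built up as nested pairs. The sketch below shows one way to do that folding. QueryTreeSketch is a hypothetical stand-in, not the toolkit's Query class, and the parenthesized string form is only for display; it is not Gazebo's protocol syntax.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a query tree; not the toolkit's Query class.
class QueryTreeSketch {
    final String term;                  // non-null for a simple query
    final String operator;              // non-null for a complex query
    final QueryTreeSketch left, right;  // subqueries of a complex query

    private QueryTreeSketch(String term, String operator,
                            QueryTreeSketch left, QueryTreeSketch right) {
        this.term = term;
        this.operator = operator;
        this.left = left;
        this.right = right;
    }

    static QueryTreeSketch of(String term) {
        return new QueryTreeSketch(term, null, null, null);
    }

    static QueryTreeSketch of(QueryTreeSketch left, String operator,
                              QueryTreeSketch right) {
        return new QueryTreeSketch(null, operator, left, right);
    }

    // Folds several terms into a left-nested chain of binary subqueries.
    static QueryTreeSketch all(String operator, List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term is required");
        }
        QueryTreeSketch q = of(terms.get(0));
        for (String t : terms.subList(1, terms.size())) {
            q = of(q, operator, of(t));
        }
        return q;
    }

    @Override
    public String toString() {
        return term != null ? term : "(" + left + " " + operator + " " + right + ")";
    }

    public static void main(String[] args) {
        // prints ((fish and duck) and goose)
        System.out.println(all("and", Arrays.asList("fish", "duck", "goose")));
    }
}
```

With the real toolkit, the same loop would presumably call Query.newInstance(q, "and", next) at each step, as in the two-term example above.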

Search Requests

Requests are represented by the class ncsa.gazebo.protocol.Request. Search requests are represented by ncsa.gazebo.protocol.Search, which is a subclass of Request. Constructing a search request is trivial: simply pass a Query to Search's constructor:


import ncsa.gazebo.protocol.Attribute;
import ncsa.gazebo.protocol.Query;
import ncsa.gazebo.protocol.Search;

Query myQuery = Query.newInstance("Smith",new Attribute("field","author"));
Search mySearch = new Search(myQuery);

Search requests have several other optional parameters. These include the number of records to (initially) retrieve (zero by default), and a set of data sources to search (by default, all data sources available through the Gazebo server). The number of records to retrieve is set with setCount and data sources are added with addDB:


...

mySearch.setCount(25);
mySearch.addDB("Library of Congress");
mySearch.addDB("Astronomy Digital Image Library");

Data source names are discovered by the client using a Meta request (more on that later). By default, a Search request searches all data sources. Once one data source has been added, only that data source is searched; each additional call to addDB adds another data source to the set that is searched.
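
The rule for which data sources get searched can be stated compactly in code. The sketch below is a hypothetical model of that rule only: SourceSelectionSketch is not a toolkit class, and its treatment of names the server does not know (they are silently ignored) is an assumption, not documented Gazebo behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of data source selection; not a toolkit class.
class SourceSelectionSketch {
    private final Set<String> selected = new LinkedHashSet<>();

    void addDB(String name) { selected.add(name); }

    // No sources added: search everything. Otherwise: search only those added.
    List<String> effectiveSources(List<String> available) {
        if (selected.isEmpty()) {
            return new ArrayList<>(available);
        }
        List<String> out = new ArrayList<>();
        for (String name : available) {
            if (selected.contains(name)) {  // unknown names are ignored (assumed)
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList(
            "Library of Congress", "Astronomy Digital Image Library");
        SourceSelectionSketch request = new SourceSelectionSketch();
        System.out.println(request.effectiveSources(available)); // both sources
        request.addDB("Library of Congress");
        System.out.println(request.effectiveSources(available)); // only the one added
    }
}
```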

Results

The results of a search are represented by the class ncsa.gazebo.protocol.ResultSet. They are received asynchronously by implementations of the ncsa.gazebo.ctk.ResultSetListener interface.

A result set is an ordered collection of records which match a query. The number of records in a set can be retrieved with ResultSet.getHits, and the name of the data source can be retrieved with ResultSet.getDBName:


import ncsa.gazebo.ctk.*;
import ncsa.gazebo.protocol.ResultSet;

...

public class MyResultSetListener implements ResultSetListener {
    public void resultSetReceived(ResultSetEvent rse) {
        ResultSet rs = rse.getResultSet();
        System.out.println(rs.getHits() + " hits on " + rs.getDBName());
    }
}

Records

Record data can be accessed with ResultSet.fetchBrief and ResultSet.fetchFull, each of which takes a 1-based ordinal index as its argument.

The two predefined record types, ResultSet.BRIEF and ResultSet.FULL, represent summary and complete record forms, respectively. Brief records are usually no longer than a title, and full records are often abstracts or entire documents. Record content types are represented as MIME types. The content type of any record can be determined by calling ResultSet.getContentType. In the example below, the first record of a result set is passed to an HTML renderer if it is an HTML document:


...

ResultSet rs;

...

String doc = rs.fetchFull(1);
if(rs.getContentType(rs).equals("text/html")) {
    myHTMLRenderer.render(doc);
}
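
When a client handles more than one content type, a lookup table keyed by MIME type tends to scale better than a chain of if statements. The sketch below is one hedged way to organize that. ContentDispatchSketch and its renderers are hypothetical and not part of the toolkit; the normalization step relies only on general MIME rules (type names are case-insensitive and may be followed by parameters such as a charset).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher keyed by MIME type; not part of the toolkit.
class ContentDispatchSketch {
    private final Map<String, Function<String, String>> renderers = new HashMap<>();

    void register(String mimeType, Function<String, String> renderer) {
        renderers.put(normalize(mimeType), renderer);
    }

    // Strips parameters such as "; charset=..." and lowercases the type name.
    static String normalize(String contentType) {
        int semi = contentType.indexOf(';');
        String bare = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the document unchanged when no renderer is registered for its type.
    String render(String contentType, String doc) {
        return renderers
            .getOrDefault(normalize(contentType), Function.identity())
            .apply(doc);
    }

    public static void main(String[] args) {
        ContentDispatchSketch dispatch = new ContentDispatchSketch();
        dispatch.register("text/html", doc -> "[HTML] " + doc);
        System.out.println(dispatch.render("Text/HTML; charset=ISO-8859-1", "<p>hi</p>"));
        System.out.println(dispatch.render("text/plain", "hi"));
    }
}
```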

Sessions

The class ncsa.gazebo.ctk.Session represents a client interaction with a Gazebo server. It provides mechanisms for connecting, disconnecting, submitting search requests, adding listeners, and delivering results to those listeners asynchronously.

To construct a Session, just pass the host and port of the Gazebo server to its constructor:


import ncsa.gazebo.ctk.*;

...

Session mySession = new Session("host.domain.edu",2323);

To add a ResultSetListener, call Session.addResultSetListener:


...

mySession.addResultSetListener(new ResultSetListener() {
    public void resultSetReceived(ResultSetEvent rse) {
        ResultSet rs = rse.getResultSet();
        System.out.println ("got results for " + rs.getDBName());
    }
});

To submit a request to Gazebo, pass it to Session.search:


...

Query myQuery = Query.newInstance("trichotillomania");

mySession.search(new Search(myQuery));
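
The asynchronous delivery model has one practical consequence worth seeing in isolation: the call that submits a search returns immediately, and results arrive later on another thread. The sketch below imitates that with a stand-in class. AsyncSessionSketch, its HitListener interface, and the made-up hit count are all hypothetical; the real Session's threading details are not described in this tutorial.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an asynchronous session; not the toolkit's Session.
class AsyncSessionSketch {
    interface HitListener {  // stand-in for a result set listener
        void hitsReceived(String dbName, int hits);
    }

    private final List<HitListener> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void addListener(HitListener l) { listeners.add(l); }

    // Returns immediately; the simulated result is delivered on the worker thread.
    void search(String term) {
        worker.submit(() -> {
            int hits = term.length();  // made-up hit count for the simulation
            for (HitListener l : listeners) {
                l.hitsReceived("simulated source", hits);
            }
        });
    }

    // Lets the worker thread exit once pending deliveries are done.
    void close() { worker.shutdown(); }

    public static void main(String[] args) throws InterruptedException {
        AsyncSessionSketch session = new AsyncSessionSketch();
        CountDownLatch done = new CountDownLatch(1);
        session.addListener((db, hits) -> {
            System.out.println(hits + " hits on " + db);
            done.countDown();
        });
        session.search("fish");           // does not block
        done.await(5, TimeUnit.SECONDS);  // wait for delivery before shutting down
        session.close();
    }
}
```

Listeners are registered before the search is submitted, for the same reason as in the toolkit: a listener added afterwards could miss a result that has already been delivered.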

Putting it All Together

Below is a sample client which searches for an author's name on a Gazebo server containing bibliographic information, and reports the number of records found in each database. This rudimentary example assumes that the Gazebo server supports an attribute of type "field" with the value "Author".


import java.io.*;

import ncsa.gazebo.ctk.*;
import ncsa.gazebo.protocol.*;

public class SimpleClient {

    Session theSession;

    public void searchFor(String word, Attribute attr) {
        try {
            // Session.search takes a Search request, so wrap the query in one
            theSession.search(new Search(Query.newInstance(word, attr)));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public SimpleClient (String host, int port) {
        try {
            theSession = new Session(host, port);
        } catch (CTKException e) {
            e.printStackTrace();
            return; // no session, so there is nothing to attach a listener to
        }

        // report the hit count for each data source as its result set arrives
        theSession.addResultSetListener(new ResultSetListener() {
            public void resultSetReceived(ResultSetEvent rse) {
                ResultSet rs = rse.getResultSet();
                System.out.println(rs.getHits() + " hits from " + rs.getDBName());
            }
        });
    }

    public static void main (String args[]) {
        if (args.length < 1) {
            System.err.println("usage: java SimpleClient <author>");
            return;
        }

        SimpleClient sc = new SimpleClient("ospsun1.nci.nih.gov",9270);

        // first arg is author's name to search for
        sc.searchFor(args[0], new Attribute("field","Author"));
    }
}

Status Messages

Gazebo sends back status messages as well as result sets. These messages can indicate error conditions and can also be used to provide feedback to the user about whether or not a data source is connected, the status of a search request, etc.

To catch status messages, implement ncsa.gazebo.ctk.ResponseListener and add the listener to a Session object with Session.addResponseListener. The ResponseListener is passed ncsa.gazebo.ctk.ResponseEvent objects, each containing an ncsa.gazebo.protocol.Response. Each Response object has a status code, which can be determined by calling Response.getStatus. A human-readable status message is available from Response.getStatusMessage. Status code constants are defined in Response. A convenience method, Response.isStatusGood, can be used to determine whether the response indicates success or an error.

Note that result sets generate ResponseEvents as well as ResultSetEvents, so make sure to handle each ResultSet in only one place. This is typically not a problem, because the only information of interest in a ResponseEvent is usually its status code.

The following ResponseListener prints a message when an error occurs:


import ncsa.gazebo.ctk.*;
import ncsa.gazebo.protocol.Response;

public class MyResponseListener implements ResponseListener {
    public void responseReceived(ResponseEvent rse) {
        Response rs = rse.getResponse();
        // only report responses whose status indicates an error
        if (!rs.isStatusGood()) {
            String dbName = rs.getDB();
            String message = rs.getStatusMessage();
            System.out.println(dbName + ": error: " + message);
        }
    }
}