NCSA
emerge@ncsa.uiuc.edu

Using the PubMed Target

As a demonstration of the target API, gazelle provides a Z39.50 interface to the PubMed database at the National Library of Medicine.

Requirements

Running gazelle to search the PubMed database requires:

  1. Building gazelle with the pubmed target
  2. Running a Perl script to process the information retrieved from the PubMed website

To build gazelle with the pubmed target, go to the top level of the gazelle distribution and type:

% make pubmed

The Perl script, pm_gateway.pl, is located in the targets/pubmed/ subdirectory of the gazelle source tree. The script requires the following Perl modules in order to run:

  Socket
  FileHandle
  LWP (LWP::UserAgent)
  HTTP (HTTP::Request)

Checking for required packages

The Socket and FileHandle modules are normally part of a standard Perl installation. The LWP and HTTP modules may need to be installed first. To check whether a module is available on your system, start the CPAN shell from the command prompt:

# perl -MCPAN -e shell

This will start an interactive shell which allows you to query, download, and update Perl packages. To see if you have the required libraries, type the commands:

cpan> m LWP::UserAgent
cpan> m HTTP::Request

at the cpan> prompt. If a package is not installed, the shell reports that. To install the missing packages, issue the following commands at the same cpan> prompt:

cpan> install LWP::UserAgent
cpan> install HTTP::Request

See your Perl documentation for more detail on the -MCPAN command line option.
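
The same check can also be done without entering the interactive shell. The following Python sketch is not part of gazelle; it assumes a perl executable is on the PATH and relies on `perl -MModule -e 1` exiting with a non-zero status when the module cannot be loaded. The function names are hypothetical and chosen only for this illustration.

```python
import subprocess

# Modules named in this document as requirements of pm_gateway.pl.
REQUIRED_MODULES = ["Socket", "FileHandle", "LWP::UserAgent", "HTTP::Request"]


def module_check_command(module, perl="perl"):
    """Build the command line that asks Perl to load one module and exit."""
    return [perl, "-M" + module, "-e", "1"]


def has_perl_module(module, perl="perl"):
    """Return True if Perl can load the module, False otherwise."""
    try:
        result = subprocess.run(
            module_check_command(module, perl),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The perl executable itself could not be started.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name in REQUIRED_MODULES:
        status = "ok" if has_perl_module(name) else "MISSING"
        print(f"{name}: {status}")
```

Any module reported as MISSING can then be installed from the cpan> prompt as described above.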

Searching PubMed

As mentioned above, presenting records from the PubMed database requires an external Perl script to be running so that the retrieved information can be processed correctly. By default, the pm_gateway.pl script listens on port 3737 of the host machine. To use a different port, pass the port number on the command line when starting the script:

# ./pm_gateway.pl 7373

Gazelle must be told where to send records retrieved from PubMed for processing. This is done in the gazelle configuration file: two param tags directly under the config tag specify the cgiHost and cgiPort where the script is running. For example, to connect to a script running on foo.ncsa.uiuc.edu at the default port of 3737, the configuration file would look like:

	<config>
		<param name="cgiHost" value="foo.ncsa.uiuc.edu"/>
		<param name="cgiPort" value="3737"/>

			. . .

	</config>

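Before starting gazelle, it can help to confirm that the gateway script is actually reachable at the configured address. The following Python sketch is an illustration only and is not part of gazelle: it assumes the configuration is well-formed XML with the two param tags directly under config, as in the example above, and the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET

# The example configuration from this section, without the elided parts.
EXAMPLE_CONFIG = """
<config>
    <param name="cgiHost" value="foo.ncsa.uiuc.edu"/>
    <param name="cgiPort" value="3737"/>
</config>
"""


def gateway_address(config_xml):
    """Return (host, port) from the param tags directly under <config>."""
    root = ET.fromstring(config_xml)
    params = {p.get("name"): p.get("value") for p in root.findall("param")}
    return params["cgiHost"], int(params["cgiPort"])


def gateway_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling gateway_is_listening(*gateway_address(EXAMPLE_CONFIG)) returns False if pm_gateway.pl has not been started on that host and port, or if the host cannot be reached.
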
Searching Fields

PubMed supports searching over the following attributes:

Name  Meaning
AFFL  Affiliation
ALL   All Fields
AUTH  Author Name
ECNO  E. C. Number
JOUR  Journal Name
MESH  MeSH (Medical Subject Headings) Term
MAJR  MeSH Major Topic
PAGE  First Page
PDAT  Publication/Creation Date
PTYP  Publication Type
KYWD  Keywords
WORD  Text Word
TITL  Title Word
VOL   Volume

Complex queries are built from simple one-term, one-attribute queries by combining them with AND or OR and using parentheses to denote precedence. The attribute of a search term is appended to that term in square brackets "[]". For instance, for the query "Cancer AND Mead", where Cancer is a keyword and Mead is an author, gazelle would construct the query:

(Cancer[KYWD] AND Mead[AUTH])

This is done by specifying, in the configuration file, templates for a "keyword" term, an "author" term, and an "and_expression". Namely:

	<config>

		. . .

		<database name="PubMed">
			<attr type="1" value="1003">	<!-- author -->
				<term>$value[AUTH]</term>
			</attr>
			<attr type="1" value="21">	<!-- subject -->
				<term>$value[KYWD]</term>
			</attr>
			<template name="and_expression">
				($lhs AND $rhs) 
			</template>

			. . .

		</database>

		. . .

	</config>

The above example also demonstrates the concept of "mapping" BIB-1 attributes to PubMed fields. BIB-1 is a set of attributes that Z39.50 can use for searching (other attribute sets exist; Z39.50 does not mandate a particular one). BIB-1's Use attribute number 1003 corresponds to a search on an "author" attribute, while number 21 corresponds to a search on a "subject" attribute. Note that BIB-1's attributes do not always map one-to-one onto PubMed's attribute set. In such cases, an approximation should suffice (e.g., "subject" is close to "keyword" in meaning).
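
To make the template expansion concrete, the following Python sketch reproduces the query shown earlier from the three templates in the example configuration. This document does not describe how gazelle performs the substitution internally, so the code is only an illustration: it uses Python's string.Template, which happens to share the $name placeholder syntax, and the function and variable names are hypothetical.

```python
from string import Template

# Term templates keyed by BIB-1 Use attribute value, as in the example config.
TERM_TEMPLATES = {
    1003: Template("$value[AUTH]"),  # author
    21: Template("$value[KYWD]"),    # subject, approximated by keyword
}
AND_EXPRESSION = Template("($lhs AND $rhs)")


def render_term(use_attribute, value):
    """Expand the term template mapped to a BIB-1 Use attribute."""
    return TERM_TEMPLATES[use_attribute].substitute(value=value)


def render_and(lhs, rhs):
    """Combine two rendered sub-queries with the and_expression template."""
    return AND_EXPRESSION.substitute(lhs=lhs, rhs=rhs)


query = render_and(render_term(21, "Cancer"), render_term(1003, "Mead"))
print(query)  # (Cancer[KYWD] AND Mead[AUTH])
```

Because render_and accepts any already-rendered sub-query, nested expressions are built by passing the result of one call as the lhs or rhs of another, and the parentheses in the template preserve precedence.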