Tupelo 2 Cookbook
From Tupelo Wiki
The (currently unfinished) Tupelo 2 Cookbook is a set of examples showing how to use different parts of the Tupelo 2 API.
Comma-separated values (CSV)
Note: this cookbook entry is untested.
Comma-separated values (CSV) is a de facto standard format for exchanging tabular data. Tupelo 2 provides support for CSV data based on the format specification in RFC 4180 (http://tools.ietf.org/html/rfc4180).
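To make the RFC 4180 quoting rules concrete, here is a minimal plain-Java sketch of splitting one record into fields. It is not Tupelo's CsvTable: the class CsvRecordSketch and its parseRecord method are hypothetical names introduced for illustration, and the sketch assumes a record never spans more than one line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of RFC 4180 field splitting; this is not Tupelo code.
class CsvRecordSketch {
    // Splits a single-line record into fields, honoring double-quoted fields.
    static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // a doubled quote is an escaped quote
                        i++;
                    } else {
                        quoted = false;    // closing quote
                    }
                } else {
                    field.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                quoted = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("2006-12-07T17:23:00.000Z,16,1037,NW,23"));
    }
}
```

A production parser would also need to handle quoted fields that contain line breaks, which this sketch deliberately leaves out.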
Before converting tabular data into RDF triples, you must decide how to distribute the columns of each row across a set of triple patterns. Then, you must provide a mapping between the data in each column and the RDF URI reference or literal that will represent that data in your triple patterns. In many cases, these two requirements can be met using PatternProjector (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/rdf/query/PatternProjector.html) and TableProjector (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/rdf/query/TableProjector.html).
Suppose you have the following CSV data representing weather observations:
time,temp_f,barometer,wind_dir,wind_speed
2006-12-07T17:23:00.000Z,16,1037,NW,23
2006-12-07T16:53:00.000Z,14,1036,NW,24
2006-12-07T15:53:00.000Z,14,1035,N,24
We want the resulting RDF to look like this for each row (shown here for the first row):
_:a <rdf:type> <ns:temperatureObservation> .
_:a <ns:dateTime> "2006-12-07T17:23:00.000Z" .
_:a <ns:temperature> "16" .
_:b <rdf:type> <ns:barometerObservation> .
_:b <ns:dateTime> "2006-12-07T17:23:00.000Z" .
_:b <ns:barometricPressure> "1037" .
_:c <rdf:type> <ns:windObservation> .
_:c <ns:dateTime> "2006-12-07T17:23:00.000Z" .
_:c <ns:windDirection> "NW" .
_:c <ns:windSpeed> "23" .
We can establish a mapping to this set of triple patterns using a PatternProjector as shown:
import static org.tupeloproject.rdf.UriRefFactory.rdf;
...
protected UriRef ns(String suffix) {
    return Resource.uriRef("http://my.namespace.uri/etc/blah/"+suffix);
}
...
PatternProjector wp = new PatternProjector();
wp.addPattern("a",rdf("type"),ns("temperatureObservation"));
wp.addPattern("a",ns("dateTime"),"time");
wp.addPattern("a",ns("temperature"),"temp_f");
wp.addPattern("b",rdf("type"),ns("barometerObservation"));
wp.addPattern("b",ns("dateTime"),"time");
wp.addPattern("b",ns("barometricPressure"),"barometer");
wp.addPattern("c",rdf("type"),ns("windObservation"));
wp.addPattern("c",ns("dateTime"),"time");
wp.addPattern("c",ns("windDirection"),"wind_dir");
wp.addPattern("c",ns("windSpeed"),"wind_speed");
We'll use this projector for each row in the table. The projector attempts to bind each variable in each pattern to a column in the table by matching the variable's name to a column name; any variables that remain unbound are assigned a new, globally unique URI reference. For example, projecting the first row against the second pattern would generate a triple like this:
<http://some/uri/prefix/58476> <ns:dateTime> "2006-12-07T17:23:00.000Z"
Within a row, every later use of an unbound variable with the same name results in the same URI reference. For instance, given the triple above, projecting the first row against the third pattern would generate this triple:
<http://some/uri/prefix/58476> <ns:temperature> "16"
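The binding rule described above can be illustrated without Tupelo at all. The following is a minimal plain-Java sketch, not PatternProjector's actual implementation: the class RowProjectionSketch, its project method, and the urn:example:node: prefix are hypothetical names introduced for illustration, and for simplicity every subject and object is treated as a variable name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of per-row variable binding; this is not Tupelo code.
class RowProjectionSketch {
    private int nextId = 1; // source of fresh identifiers for unbound variables

    // Each pattern is {subject, predicate, object}; subject and object are variable names.
    List<String> project(List<String[]> patterns, Map<String, String> row) {
        // Unbound variable -> generated identifier, shared by all patterns for this row.
        Map<String, String> fresh = new HashMap<>();
        List<String> triples = new ArrayList<>();
        for (String[] p : patterns) {
            triples.add(bind(p[0], row, fresh) + " <" + p[1] + "> " + bind(p[2], row, fresh));
        }
        return triples;
    }

    private String bind(String variable, Map<String, String> row, Map<String, String> fresh) {
        if (row.containsKey(variable)) {
            return "\"" + row.get(variable) + "\""; // matches a column name: emit a literal
        }
        // Not a column name: reuse this row's identifier or mint a new one.
        String id = fresh.get(variable);
        if (id == null) {
            id = "<urn:example:node:" + (nextId++) + ">";
            fresh.put(variable, id);
        }
        return id;
    }

    public static void main(String[] args) {
        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] {"a", "ns:dateTime", "time"});
        patterns.add(new String[] {"a", "ns:temperature", "temp_f"});
        Map<String, String> row = new HashMap<>();
        row.put("time", "2006-12-07T17:23:00.000Z");
        row.put("temp_f", "16");
        for (String triple : new RowProjectionSketch().project(patterns, row)) {
            System.out.println(triple);
        }
    }
}
```

Because the map of fresh identifiers is created anew on each call to project, the variable "a" gets the same subject in both triples for one row and a different subject for the next row.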
By default, PatternProjector converts the CSV data in each column into RDF literals. If this is not desired (for instance, if a CSV column contains an identifier that you want to convert to a URI reference), you can set an ObjectVisitor (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/rdf/ObjectVisitor.html) per column with custom behavior. For instance, to convert a wind direction from the string "NW" to the URI reference "http://my.namespace.uri/etc/blah/winddir/NW", you could configure the PatternProjector as follows:
wp.setObjectVisitor("wind_dir", new ObjectVisitor() {
    public Resource visit(Object o) {
        return Resource.uriRef("http://my.namespace.uri/etc/blah/winddir/"+o);
    }
});
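The idea of a per-column conversion with a literal fallback can be sketched in plain Java. This is only an illustration of the concept, not the ObjectVisitor API: ColumnConverterSketch, setConverter, and convert are hypothetical names, and values are represented as plain strings rather than Tupelo Resources.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-column converters; this mirrors the idea, not Tupelo's ObjectVisitor API.
class ColumnConverterSketch {
    private final Map<String, Function<String, String>> converters = new HashMap<>();

    void setConverter(String column, Function<String, String> converter) {
        converters.put(column, converter);
    }

    // Columns without a registered converter fall back to a quoted literal.
    String convert(String column, String value) {
        Function<String, String> converter = converters.get(column);
        return converter != null ? converter.apply(value) : "\"" + value + "\"";
    }

    public static void main(String[] args) {
        ColumnConverterSketch sketch = new ColumnConverterSketch();
        sketch.setConverter("wind_dir",
            v -> "<http://my.namespace.uri/etc/blah/winddir/" + v + ">");
        System.out.println(sketch.convert("wind_dir", "NW"));   // converted to a URI reference
        System.out.println(sketch.convert("wind_speed", "23")); // default: literal
    }
}
```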
Once a PatternProjector has been configured, you can use it to convert CSV data to triples by configuring a CsvTable (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/util/CsvTable.html) appropriately and calling TableProjector.project (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/rdf/query/TableProjector.html#project(org.tupeloproject.util.Table)):
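// Java does not expand "~", so build the path from the user.home system property.
CsvTable csvData = new CsvTable(new FileReader(System.getProperty("user.home") + "/weather.csv"));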
TableProjector tp = new TableProjector();
tp.setRowProjector(wp);
Set<Triple> triples = tp.project(csvData);
Contexts and Operators
Tupelo's core abstraction is called Context (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/Context.html). A Context acts as a bridge between an application and an implementation via a set of generic Operators (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/Operator.html). Each operator is a description of an action, such as a query or modification, that can be performed on a Context. Performing an operation may result in the operator's state being modified, for instance by setting a property on the operator representing query results.
For instance, the TripleWriter (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/TripleWriter.html) operator, supported by Contexts representing RDF triple stores, allows the caller to add and remove RDF triples from the store represented by the Context:
Context someRdfDatabase = ...
TripleWriter writer = new TripleWriter();
writer.add(Triple.create(
    URI.create("http://tupeloproject.org"),
    URI.create(Namespaces.dc("description")),
    "Tupelo project website"));
someRdfDatabase.perform(writer);
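The way an operator carries both the request and its result can be shown with a miniature, self-contained model. This is a sketch of the pattern only: OperatorSketch and all of its nested types are hypothetical and do not correspond to Tupelo's actual classes or method signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the operator pattern; names do not match Tupelo's API.
class OperatorSketch {
    interface Operator {}

    // A modification: describes a triple to add.
    static class AddTriple implements Operator {
        final String triple;
        AddTriple(String triple) { this.triple = triple; }
    }

    // A query: the context fills in the result when the operator is performed.
    static class CountTriples implements Operator {
        int result = -1;
    }

    static class InMemoryContext {
        private final List<String> triples = new ArrayList<>();

        void perform(Operator op) {
            if (op instanceof AddTriple) {
                triples.add(((AddTriple) op).triple);
            } else if (op instanceof CountTriples) {
                ((CountTriples) op).result = triples.size(); // state is set on the operator
            } else {
                throw new UnsupportedOperationException(op.getClass().getName());
            }
        }
    }

    public static void main(String[] args) {
        InMemoryContext context = new InMemoryContext();
        context.perform(new AddTriple("<http://tupeloproject.org> <dc:description> \"Tupelo project website\""));
        CountTriples count = new CountTriples();
        context.perform(count);
        System.out.println(count.result);
    }
}
```

The application only ever calls perform, so a different context implementation can be swapped in as long as it supports the same operators.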
Unifier
Unifier (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/Unifier.html) is an Operator (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/Operator.html) that makes it possible to execute complex, declarative, conjunctive queries against any Context implementation that supports TripleMatcher (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/TripleMatcher.html).
Conjunctive queries in practice
Declarative, conjunctive queries are useful for extracting tabular information from complex RDF graphs. For instance, suppose I have the following RDF data (in this example "ns:" is shorthand for a namespace prefix, let's say "http://ns#"):
<ns:person1> <ns:hasDog> <ns:dog1> .
<ns:person2> <ns:hasDog> <ns:dog2> .
<ns:person3> <ns:hasDog> <ns:dog3> .
<ns:person4> <ns:hasDog> <ns:dog3> .
<ns:person5> <ns:hasDog> <ns:dog4> .
<ns:person1> <ns:hasPhoneNumber> "123-4567" .
<ns:person2> <ns:hasPhoneNumber> "234-5678" .
<ns:person3> <ns:hasPhoneNumber> "345-6789" .
<ns:person4> <ns:hasPhoneNumber> "456-7890" .
<ns:person5> <ns:hasPhoneNumber> "567-8901" .
<ns:dog1> <ns:hasName> "fido" .
<ns:dog2> <ns:hasName> "rover" .
<ns:dog3> <ns:hasName> "spot" .
<ns:dog4> <ns:hasName> "spot" .
This data describes people and their dogs. Note that person3 and person4 share dog3, and that dog3 and dog4 are both named "spot".
Suppose I want to find out who has a dog named "spot", and what their phone numbers are. This is not possible with Tupelo 2's TripleMatcher (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/TripleMatcher.html) operator, but it is possible to express in a declarative query language such as SPARQL (http://www.w3.org/TR/rdf-sparql-query/):
SELECT ?person ?phoneNumber
WHERE {
  { ?person <ns:hasDog> ?dog . }
  { ?dog <ns:hasName> "spot" . }
  { ?person <ns:hasPhoneNumber> ?phoneNumber . }
}
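To show what "conjunctive" means operationally, the following plain-Java sketch evaluates a list of triple patterns by extending variable bindings one pattern at a time and discarding any binding that conflicts. It is a naive illustration, not how Unifier is implemented: ConjunctiveQuerySketch and solve are hypothetical names, terms are plain strings, and a leading "?" marks a variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, naive evaluator for conjunctive triple patterns; not Tupelo's Unifier.
class ConjunctiveQuerySketch {
    // Terms starting with '?' are variables; every other term must match exactly.
    static List<Map<String, String>> solve(List<String[]> triples, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start from a single empty binding
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : solutions) {
                for (String[] triple : triples) {
                    Map<String, String> extended = unify(pattern, triple, binding);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            solutions = next; // only bindings consistent with every pattern so far survive
        }
        return solutions;
    }

    // Returns the binding extended to cover this triple, or null if they conflict.
    private static Map<String, String> unify(String[] pattern, String[] triple,
                                             Map<String, String> binding) {
        Map<String, String> result = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = result.get(term);
                if (bound == null) {
                    result.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"person1", "hasDog", "dog1"});
        triples.add(new String[] {"person3", "hasDog", "dog3"});
        triples.add(new String[] {"person1", "hasPhoneNumber", "123-4567"});
        triples.add(new String[] {"person3", "hasPhoneNumber", "345-6789"});
        triples.add(new String[] {"dog1", "hasName", "fido"});
        triples.add(new String[] {"dog3", "hasName", "spot"});

        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] {"?person", "hasDog", "?dog"});
        patterns.add(new String[] {"?dog", "hasName", "spot"});
        patterns.add(new String[] {"?person", "hasPhoneNumber", "?phoneNumber"});

        for (Map<String, String> row : solve(triples, patterns)) {
            System.out.println(row.get("?person") + " " + row.get("?phoneNumber"));
        }
    }
}
```

A single-pattern match cannot express this query because the "?dog" and "?person" variables must take the same value across different patterns.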
Constructing a unifier query
Unifier provides an API to construct and execute conjunctive queries. For instance, the query shown above would be constructed in the following manner. First, for each node we want to match against a known value or identifier, we construct the appropriate Resource: a UriRef (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/rdf/UriRef.html) for an identifier, or a literal for a value:
UriRef hasDog = Resource.uriRef("http://ns#hasDog");
UriRef hasName = Resource.uriRef("http://ns#hasName");
UriRef hasPhoneNumber = Resource.uriRef("http://ns#hasPhoneNumber");
Literal spot = Resource.literal("spot");
Now we create a Unifier (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/Unifier.html). We then configure the Unifier with a list of column names, which corresponds to the SELECT clause in SPARQL. Because these correspond to variable names in the query patterns, we can call them whatever we want:
Unifier unifier = new Unifier();
unifier.setColumnNames("person", "phoneNumber");
To construct the rest of the query, we add patterns to the Unifier with the addPattern (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/kernel/Unifier.html#addPattern(java.lang.Object,%20java.lang.Object,%20java.lang.Object)) method:
unifier.addPattern("person", hasDog, "dog");
unifier.addPattern("dog", hasName, spot);
unifier.addPattern("person", hasPhoneNumber, "phoneNumber");
Each of the three terms in the addPattern method is either a String, which is taken to be the name of a variable; or a Resource, which is taken as a value to match against.
Processing query results
The execution of a Unifier query produces a set of results; each result is a set of variable bindings that taken together match all the patterns. Unifier returns a Table (http://dlt-dev.ncsa.uiuc.edu/javadoc/t2/org/tupeloproject/util/Table.html) of Resource where each row represents a result and each column represents the column variable's value for that result.
Context someContext = ...
someContext.perform(unifier);
Table<Resource> results = unifier.getResult();
Now we can process the results. The following code
for(Tuple<Resource> row : results) {
    System.out.println(row);
}
will produce the following output:
[http://ns#person4, 456-7890]
[http://ns#person5, 567-8901]
[http://ns#person3, 345-6789]
Note that depending on the Context implementation, the results might be in a different order. A Table can be sorted by putting its rows into a SortedSet (http://java.sun.com/j2se/1.5.0/docs/api/java/util/SortedSet.html). Tuples sort by terms, in order. For instance, we could print the rows from the previous example in order like this:
TreeSet<Tuple<Resource>> sortedResults = new TreeSet<Tuple<Resource>>();
for(Tuple<Resource> row : results) {
    sortedResults.add(row);
}
for(Tuple<Resource> row : sortedResults) {
    System.out.println(row);
}
Sessions (2.1)
Contexts provide the core abstractions and operations of the Tupelo kernel, but many applications need to deal with higher-level concepts such as domain objects (e.g., people, documents, images), attribute-like metadata, and binary data. Tupelo provides a set of "session" APIs that enable applications to create objects, set properties, and read and write data. Each session is backed by a context that acts as a persistence layer.
In Tupelo's session API, each domain entity is represented by a Thing (so-called because the top-level class in OWL is owl:Thing). Things can have any number of properties, each of which is identified with a UriRef. In this example, a ThingSession is created and used to create a Thing, set a property value, and make the change persistent:
import static org.tupeloproject.rdf.Vocabulary.Dc;
...
Context c = ...
ThingSession ts = new ThingSession(c);
Thing myDocument = ts.newThing();
myDocument.setValue(Vocabulary.Dc.TITLE, "This is the title of my document");
ts.save();
In this example, the ThingSession creates a subject UriRef for myDocument. You can also set the Thing's subject by passing it to newThing:
myDocument = ts.newThing(Resource.uriRef("http://foo.bar#someUriRef"));
myDocument.setValue(Vocabulary.Dc.TITLE, "This is the title of some other document");
ts.save();
You can also fetch a Thing representing any RDF subject, but only if it has been saved:
myDocument = ts.fetchThing(Resource.uriRef("http://foo.bar#someUriRef"));
Things have optional types and labels, which correspond to the rdf:type and rdfs:label properties. This example creates a foaf:Person with the label "Joe":
Thing joe = ts.newThing();
joe.addType(Vocabulary.Foaf.PERSON);
joe.addLabel("Joe");
ts.save();
A Thing can be the value of a property of another Thing. Continuing the previous example:
Thing tex = ts.newThing();
tex.addType(Vocabulary.Foaf.PERSON);
tex.addLabel("Tex");
tex.addValue(Vocabulary.Foaf.KNOWS, joe);
ts.save();
Properties are not typed, but their values are, and Thing provides a set of typesafe property value accessors:
final Resource TEMP = Resource.uriRef("urn:temperature");
Thing reading = ts.newThing();
reading.setValue(TEMP, 34.5);
reading.setValue(Vocabulary.Dc.DATE, new Date());
reading.setValue(Vocabulary.Dc.CREATOR, joe);
double v = reading.getDouble(TEMP);
Thing readingCreator = reading.getThing(Vocabulary.Dc.CREATOR);
ts.save();
The types supported are int, long, float, double, String, Date, URI, URL, Resource, and Thing.
Multi-valued properties can be either unordered (the default) or ordered. Setting the values of a property to a Collection makes the property ordered or unordered depending on whether the collection implements Set or List:
Resource DANCE_CARD = Resource.uriRef("urn:danceCard");
Thing gertrude = ts.newThing();
Set<Thing> danceCard = new HashSet<Thing>();
danceCard.add(joe);
danceCard.add(tex);
gertrude.setValues(DANCE_CARD, danceCard);
Set<Thing> gsDanceCard = (Set<Thing>) gertrude.getThings(DANCE_CARD);
Both ordered and unordered properties can have values added and removed from them:
gertrude.removeValue(DANCE_CARD,joe);
Things can be deleted. Like other modifications, deleting a Thing removes it from the session state, and the deletion is not persisted to the Context until the session is saved.
ts.delete(tex);
ts.save();
Because of RDF's open world assumption, there's no way to tell if a Thing "exists" or not; instead, deletion just removes all statements with the Thing's subject from the Context, so a Thing representing it will not have any properties or property values.
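The two ideas above, that changes stay in the session until save and that deletion simply removes every statement with a given subject, can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not ThingSession itself: SessionSketch, addValue, delete, and save are hypothetical here, and a set of three-element lists stands in for the Context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical write-behind session over an in-memory triple set; not Tupelo's ThingSession.
class SessionSketch {
    private final Set<List<String>> store;                    // stands in for the Context
    private final List<Runnable> pending = new ArrayList<>(); // changes not yet saved

    SessionSketch(Set<List<String>> store) {
        this.store = store;
    }

    void addValue(String subject, String predicate, String value) {
        pending.add(() -> store.add(Arrays.asList(subject, predicate, value)));
    }

    // Deleting means dropping every statement whose subject is the given one.
    void delete(String subject) {
        pending.add(() -> store.removeIf(t -> t.get(0).equals(subject)));
    }

    // Applies the queued changes to the store in the order they were made.
    void save() {
        for (Runnable change : pending) {
            change.run();
        }
        pending.clear();
    }

    public static void main(String[] args) {
        Set<List<String>> store = new HashSet<>();
        SessionSketch session = new SessionSketch(store);
        session.addValue("urn:tex", "rdfs:label", "Tex");
        System.out.println(store.size());    // still empty: nothing is persisted before save()
        session.save();
        System.out.println(store.size());
        session.delete("urn:tex");
        session.save();
        System.out.println(store.isEmpty()); // no statements about urn:tex remain
    }
}
```

After the final save the store holds no record that "urn:tex" was ever deleted; there are simply no statements about it, which is the open-world behavior described above.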
In addition to storing and retrieving Things, ThingSession provides ways of searching for Things based on the values of their properties:
Set<Thing> readings = ts.getThings(TEMP, 34.5);
for(Thing aReading : readings) {
    Date time = aReading.getDate(Vocabulary.Dc.DATE);
    Thing person = aReading.getThing(Vocabulary.Dc.CREATOR);
    System.out.println("A temperature reading of 34.5 was taken at "+time+" by "+person.getLabel());
}
Things are portable across Contexts using the register/deregister API. To copy a Thing from one Context to another, set up a ThingSession for each Context, fetch a Thing from the first one, deregister it from the first one, register it with the second one, and save. Deregistering a Thing from a session does not delete or alter the Thing in that session's session state.
Context c2 = ...
ThingSession ts2 = new ThingSession(c2);
ts.deregister(gertrude);
ts2.register(gertrude);
ts2.save();
A Bean Session Introduction
The old bean session introduction is here
