Overview

From Tupelo Wiki

Tupelo is a Semantic Content Repository framework. It provides means of storing, retrieving, annotating, and accessing information using Semantic Web technologies such as RDF (http://www.w3.org/RDF/), backed with standard storage technologies such as filesystems and databases, as well as RDF stores such as Sesame (http://www.openrdf.org/) and Mulgara (http://www.mulgara.org/).

RDF is a generic metadata framework capable of describing digital objects, real-world entities, and abstract concepts at multiple levels of specificity and granularity. Because it uses global IDs (URIs) and a named-link architecture to express both relationships and attributes, it provides a very simple means of assembling composite descriptions from independently generated parts, a common issue in distributed data management.
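
To make the point about composite descriptions concrete, here is a minimal sketch in plain Java (16 or later, for records). It does not use Tupelo or any RDF library; the Triple record and the merge and describe helpers are hypothetical names chosen for illustration. It only shows that, when two parties describe the same URI, combining their statements amounts to a set union:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these types belong to Tupelo.
final class TripleMergeSketch {
    // One statement: subject and predicate are global URIs,
    // the object is either a URI or a literal value.
    record Triple(String subject, String predicate, String object) {}

    // Merging descriptions is a set union; shared URIs line the statements up.
    static Set<Triple> merge(List<Set<Triple>> sources) {
        Set<Triple> merged = new LinkedHashSet<>();
        for (Set<Triple> source : sources) {
            merged.addAll(source);
        }
        return merged;
    }

    // Collect every statement whose subject is the given URI.
    static Set<Triple> describe(Set<Triple> triples, String subject) {
        Set<Triple> result = new LinkedHashSet<>();
        for (Triple t : triples) {
            if (t.subject().equals(subject)) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String blob = "http://some.uri.prefix/aBlob.txt";
        // Two descriptions produced independently of each other.
        Set<Triple> fromArchive = Set.of(
            new Triple(blob, "http://purl.org/dc/elements/1.1/title", "my blob"));
        Set<Triple> fromCurator = Set.of(
            new Triple(blob, "http://purl.org/dc/elements/1.1/creator",
                       "http://example.org/people/alice"));
        Set<Triple> merged = merge(List.of(fromArchive, fromCurator));
        System.out.println(describe(merged, blob).size() + " statements about " + blob);
    }
}
```

No schema negotiation is needed between the two sources in this sketch: agreement on the subject URI is enough for the statements to combine.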

However, RDF APIs and technologies have tended to be monolithic, closely tying APIs and query languages to specific storage architectures. Tupelo addresses this problem by providing a common abstraction that subsumes RDF descriptions, storage, retrieval, and basic stream operations, much like a JDBC for the Semantic Web (similar to Trippi (http://trippi.sourceforge.net/)). In addition to providing a means of interacting with metadata, Tupelo also provides a uniform means of storing, annotating, and accessing streams in a variety of heterogeneous storage architectures.

For example, Tupelo can act as a WebDAV client, extract RDF metadata from WebDAV resources, and copy them into a Sesame repository with a small set of generic operations. Tupelo can also aggregate multiple heterogeneous stores, so that a filesystem could hold stream information, but stream annotations could be stored in Mulgara.

Contexts and Operators

The core of Tupelo's abstraction is the Context. Contexts represent logical partitions of the global data/metadata space. Once a Context is configured, it can perform any of a number of Operators. Operators do things like search for RDF patterns or write data to streams. If a Context cannot perform an Operator, it may pass the Operator to a delegate Context, and networks of Contexts can be constructed to perform aggregate tasks such as mirroring and failover.
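
The delegation idea can be sketched in a few lines of plain Java (16 or later). This is a toy model, not Tupelo's actual interfaces: the Context class, its constructor arguments, and the BlobWrite and TripleWrite operator types below are all assumptions made up for illustration. Each Context handles one kind of Operator and hands anything else to its delegate, if it has one:

```java
// Hypothetical model of delegation; Tupelo's real interfaces may differ.
final class DelegationSketch {
    interface Operator {}
    record BlobWrite(String uri) implements Operator {}
    record TripleWrite(String subject) implements Operator {}

    static class Context {
        private final String name;
        private final Class<? extends Operator> supported;
        private final Context delegate; // null when there is nowhere to pass on to

        Context(String name, Class<? extends Operator> supported, Context delegate) {
            this.name = name;
            this.supported = supported;
            this.delegate = delegate;
        }

        // Returns the name of the Context that ended up handling the operator.
        String perform(Operator op) {
            if (supported.isInstance(op)) {
                return name; // a real Context would do the work here
            }
            if (delegate != null) {
                return delegate.perform(op); // pass the operator down the chain
            }
            throw new UnsupportedOperationException(name + " cannot perform " + op);
        }
    }

    public static void main(String[] args) {
        Context triples = new Context("tripleStore", TripleWrite.class, null);
        Context files = new Context("files", BlobWrite.class, triples);
        // The caller talks to one Context; routing happens behind it.
        System.out.println(files.perform(new BlobWrite("http://some.uri.prefix/aBlob.txt")));
        System.out.println(files.perform(new TripleWrite("http://some.uri.prefix/aBlob.txt")));
    }
}
```

In this sketch the caller only ever holds the files Context; the triple write reaches the second Context through the delegate link.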

For example, the following code constructs a Context that maps part of the space of all possible URIs to a directory in the local filesystem:

SimpleFileContext fileContext = new SimpleFileContext();
fileContext.setUriPrefix("http://some.uri.prefix/");
fileContext.setPathPrefix("/opt/somewhere/");

Now data can be read and written using "blob" operators:

BlobWriter bw = new BlobWriter();
bw.setUri(URI.create("http://some.uri.prefix/aBlob.txt"));
bw.setInputStream(... some input stream ...);
fileContext.perform(bw);

... which causes the data in the input stream to be written to /opt/somewhere/aBlob.txt.
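
How a URI prefix might be translated into a filesystem path can be sketched as follows. This is not SimpleFileContext's implementation, which is not shown here; the resolve helper and its safety check are assumptions for illustration, written with the standard java.nio.file API:

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it only illustrates the prefix-to-path mapping idea.
final class PrefixMappingSketch {
    // Maps a URI that starts with uriPrefix to a path underneath pathPrefix.
    static Path resolve(String uriPrefix, String pathPrefix, URI uri) {
        String text = uri.toString();
        if (!text.startsWith(uriPrefix)) {
            throw new IllegalArgumentException(uri + " is outside " + uriPrefix);
        }
        String relative = text.substring(uriPrefix.length());
        Path base = Paths.get(pathPrefix).normalize();
        Path target = base.resolve(relative).normalize();
        // Reject URIs such as ".../../etc/passwd" that would escape the directory.
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException(uri + " escapes " + base);
        }
        return target;
    }

    public static void main(String[] args) {
        Path path = resolve("http://some.uri.prefix/", "/opt/somewhere/",
                            URI.create("http://some.uri.prefix/aBlob.txt"));
        System.out.println(path);
    }
}
```

URIs outside the configured prefix are rejected in this sketch, which matches the idea that a Context covers only part of the URI space.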

If I want to annotate this stream of data with some attributes, I can use a Context that supports RDF operators:

Context metadataContext = SesameContextFactory.newHttpContext(... parameters ...);
TripleWriter tw = new TripleWriter();
tw.add(Triple.create(Resource.uriRef("http://some.uri.prefix/aBlob.txt"), Vocabulary.DC.TITLE, "my blob"));
metadataContext.perform(tw);

These are rather unremarkable Context and Operator implementations. But consider the following example, in which a WebDAV server is used to store both the data and metadata:

Context webdavContext = new WebdavContext();
BlobWriter bw = new BlobWriter();
bw.setUri(URI.create("http://my.server.edu/dav/aBlob.txt"));
bw.setInputStream(... some input stream ...);
webdavContext.perform(bw);
TripleWriter tw = new TripleWriter();
tw.add(Triple.create(Resource.uriRef("http://my.server.edu/dav/aBlob.txt"), Vocabulary.DC.TITLE, "my blob"));
webdavContext.perform(tw);

In this case, Tupelo provides enough abstraction that the filesystem, triple store, and WebDAV server are all accessible through the same set of generic operators. The only difference is that the first example requires two Contexts and the second needs only one. Tupelo addresses that too: we can construct a Context that provides both the blob operations of SimpleFileContext and the metadata operations of Sesame using a generic UnionContext:

UnionContext sesameFileContext = new UnionContext();
sesameFileContext.addChild(fileContext);
sesameFileContext.addChild(metadataContext);

This union context behaves just like the WebDAV context in the example above. The blob URI here falls under the prefix that fileContext maps:

BlobWriter bw = new BlobWriter();
bw.setUri(URI.create("http://some.uri.prefix/aBlob.txt"));
bw.setInputStream(... some input stream ...);
sesameFileContext.perform(bw);
TripleWriter tw = new TripleWriter();
tw.add(Triple.create(Resource.uriRef("http://some.uri.prefix/aBlob.txt"), Vocabulary.DC.TITLE, "my blob"));
sesameFileContext.perform(tw);

By combining heterogeneous Context implementations, data and metadata can be stored and managed in a variety of ways without affecting the application code that creates and uses the data and metadata.
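
One way to picture how a union of Contexts keeps application code unchanged is the following self-contained sketch (Java 16 or later). It is a toy model under stated assumptions, not UnionContext's real implementation: the Context interface with canPerform, the RecordingContext stand-in for a real store, and the store helper are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a union of Contexts; names are not Tupelo's API.
final class UnionSketch {
    interface Operator {}
    record BlobWrite(String uri) implements Operator {}
    record TripleWrite(String subject, String predicate, String value) implements Operator {}

    interface Context {
        boolean canPerform(Operator op);
        void perform(Operator op);
    }

    // Stands in for a real store: it just remembers what it was asked to do.
    static final class RecordingContext implements Context {
        private final Class<? extends Operator> supported;
        final List<Operator> performed = new ArrayList<>();

        RecordingContext(Class<? extends Operator> supported) {
            this.supported = supported;
        }
        public boolean canPerform(Operator op) { return supported.isInstance(op); }
        public void perform(Operator op) { performed.add(op); }
    }

    // Routes each operator to the first child able to perform it.
    static final class Union implements Context {
        private final List<Context> children = new ArrayList<>();

        void addChild(Context child) { children.add(child); }
        public boolean canPerform(Operator op) {
            return children.stream().anyMatch(c -> c.canPerform(op));
        }
        public void perform(Operator op) {
            for (Context child : children) {
                if (child.canPerform(op)) {
                    child.perform(op);
                    return;
                }
            }
            throw new UnsupportedOperationException("no child can perform " + op);
        }
    }

    // Application code depends only on the Context interface.
    static void store(Context context, String uri, String title) {
        context.perform(new BlobWrite(uri));
        context.perform(new TripleWrite(uri, "http://purl.org/dc/elements/1.1/title", title));
    }

    public static void main(String[] args) {
        RecordingContext files = new RecordingContext(BlobWrite.class);
        RecordingContext triples = new RecordingContext(TripleWrite.class);
        Union union = new Union();
        union.addChild(files);
        union.addChild(triples);
        store(union, "http://some.uri.prefix/aBlob.txt", "my blob");
        System.out.println("files: " + files.performed);
        System.out.println("triples: " + triples.performed);
    }
}
```

The store method never learns which child handled which operator, so swapping the children for other implementations would not require changing it.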

For detailed information about the Tupelo API, see the API Reference.