[Mulgara-general] graph types

Tue Sep 1 19:13:05 UTC 2009

On Sat, Aug 29, 2009 at 12:52 PM, Gregg Reynolds <dev at mobileink.com> wrote:
>
> On Fri, Aug 28, 2009 at 10:56 PM, Paul Gearon <gearon at ieee.org> wrote:
>>
>> Accessing a graph (for reading or writing) is done through a module
>> called a "Resolver". Resolvers are libraries that access data in some
>> form, and return it as triples. Read/write resolvers are also able to
>> write triples back out into the given format.
>>
>> Resolvers register themselves with the system as being able to handle
>> graphs of a certain type, or being able to handle a certain protocol.
>> So now the steps above can be more properly described as:
>
> ...
>
> Ok, read the docs on the wiki about resolvers.  Here's my take, using somewhat different language:
>
> 1.  A "resolver" consists of two functional components; one ("protocol resolver", per http://www.mulgara.org/trac/wiki/ResolverTutorials) resolves (=dereferences?) an address (URL), the other ("content handler") converts the data at the address into triples.

Content handlers aren't really a part of a resolver, but they do work
hand-in-hand, so I guess that perspective works. Just remember that
many (most) resolvers don't need a content handler. Content handlers
are for dealing with file types (like RDF/XML, N3, or even non-RDF
files, such as ID3).

> 2.  The "type" of a graph actually refers to "the registered resolver that handles that type" (per http://mulgara.org/pipermail/mulgara-general/2009-August/000900.html), rather than a type in the generally accepted sense of the term; i.e. from the client perspective graphs are just graphs, and Mulgara's notion of graph type is an implementation detail.  In other words, a difference in graph type does not imply a difference in semantics, as one might expect.  (Not quite true, since e.g. matching is fuzzy for LuceneModel graphs.)

On the right track. All graphs appear as sets of triples. Some are
based on concrete data, while others are generated from other data
(like Lucene). In general, the graphs are queryable with "triple
patterns", though some give those patterns semantics that go beyond
the standard (again, Lucene does this).

> 3.  One might argue that graph type refers to the format of the originating data source.  But we can't know what the original source format is, since we don't know how the data may have been transformed by other processes, so the most we can say is it refers to the data format that the "content handler" accepts.

I suppose you could see it that way. I really see the "graph type" as
the identifier for the code that will be creating the triples for me.
So if the graph type refers to the XA store, then I will be accessing
binary structures on disk. If the graph type is ID3v1, then I will be
parsing MP3 files for triples. In each case the code that does this is
different.

> 4.  On the other hand, we have types "Model" and "ViewModel", which don't expose the resolver/content handler, unlike "LuceneModel" and "ID3v1" (which I take to mean "MP3 resolver/handler", based on the tutorial), etc.

Content handlers are there to convert files (usually text files, but
not necessarily) into triples. Resolvers will help you find those
files. If you have triples that come from something other than a file
(like a database, or a query to a service, or filesystem metadata)
then you will only use a resolver, and will not use a content handler.
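
To make the split concrete, here is a rough sketch in Python (not
Mulgara's Java, and not its API). Every name in it, including the
made-up MIME type and the ex:size predicate, is hypothetical. It only
shows that a file-backed source needs both pieces, while a non-file
source produces triples on its own.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def tab_separated_handler(text: str) -> List[Triple]:
    """Hypothetical content handler: one tab-separated triple per line."""
    triples = []
    for line in text.splitlines():
        if line.strip():
            s, p, o = line.split("\t")
            triples.append((s, p, o))
    return triples


# Content handlers are keyed by file type (a made-up MIME type here).
CONTENT_HANDLERS: Dict[str, Callable[[str], List[Triple]]] = {
    "text/x-toy-triples": tab_separated_handler,
}


def file_source(fetch: Callable[[], Tuple[str, str]]) -> List[Triple]:
    """File-backed source: the resolver finds the file (the fetch callable
    stands in for that), and a content handler parses what came back."""
    mime_type, text = fetch()
    return CONTENT_HANDLERS[mime_type](text)


def metadata_source(sizes: Dict[str, int]) -> List[Triple]:
    """Non-file source: the resolver builds triples itself, no handler."""
    return [(name, "ex:size", str(size)) for name, size in sizes.items()]
```

Calling file_source with a fetch function that returns
("text/x-toy-triples", "a\tb\tc\n") yields one triple, and
metadata_source never touches CONTENT_HANDLERS at all.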

> 5.  The Resolvers wiki page (http://www.mulgara.org/trac/wiki/Resolvers) says that even e.g. RDF/XML and N3 files are dealt with by resolvers and content handlers; does this mean that the graphs of such files have corresponding graph types?  Or would they have the default type "Model"?

Graph types are for those occasions when you explicitly create a graph
with a "create" command. In this case, an entry for the graph will be
placed into the system table, along with the type you requested (or
"Model" if you didn't ask for one). Any queries on these graphs will
find them, along with their associated "type".

If you query for a graph that you never created, then we fall back to
using the URL to locate and download the graph you describe. In this
case the graph doesn't have a "type", though the response should have
a mime-type, and we use this to identify the content handler that will
turn the data into triples.
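
The two paths can be sketched roughly as follows. This is Python for
illustration only; SystemTable and plan_access are invented names, and
the real system table is itself a graph rather than a dictionary.

```python
from typing import Dict, Optional

DEFAULT_TYPE = "Model"  # the default type named in the text above


class SystemTable:
    """Hypothetical stand-in for the record of explicitly created graphs."""

    def __init__(self) -> None:
        self._types: Dict[str, str] = {}

    def create(self, graph_uri: str, graph_type: Optional[str] = None) -> None:
        # A graph created without a type gets the default one.
        self._types[graph_uri] = graph_type or DEFAULT_TYPE

    def type_of(self, graph_uri: str) -> Optional[str]:
        return self._types.get(graph_uri)


def plan_access(table: SystemTable, graph_uri: str,
                response_mime_type: Optional[str] = None) -> str:
    """Describe which code would produce triples for graph_uri."""
    graph_type = table.type_of(graph_uri)
    if graph_type is not None:
        # The graph was created explicitly: its type picks the resolver.
        return "resolver for graph type " + graph_type
    if response_mime_type is None:
        raise LookupError("unknown graph and nothing downloaded: " + graph_uri)
    # Never created: the URL was fetched, and the MIME type of the
    # response picks the content handler.
    return "content handler for " + response_mime_type
```

So a graph created with no type is reported as "Model", while a URL
that was never created is handled according to the MIME type of the
response.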

> Long story short, it still isn't clear to me what precisely the notion of graph type is intended to capture, and how it is intended to be used.

Well, the graphs do behave differently, according to type.

For instance, XA graphs let you insert, delete and query for triples
on them. Lucene graphs let you insert and query triples, and can do
queries for triples whose literals match a certain pattern. Prefix
graphs are read-only, and let you do queries on them for URIs or
literals in the main data store that start with a certain string. I
use this primarily for finding members of RDF containers (I look for
URIs that start with http://www.w3.org/1999/02/22-rdf-syntax-ns#_).
Node-Type graphs are read-only and let you determine if a value is a
URI, a blank node, or a literal (so you can say { <some_entity>
<refers_to> $val . $val rdf:type rdfs:Literal }).
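
As an illustration of the container-member case, here is a small
Python sketch of what a prefix lookup amounts to. It is a linear scan
over a toy list of triples, with invented names throughout, and says
nothing about how the Prefix resolver is actually implemented.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def nodes_with_prefix(triples: Iterable[Triple], prefix: str) -> List[str]:
    """Return every distinct node in the store that starts with prefix."""
    found = set()
    for triple in triples:
        for node in triple:
            if node.startswith(prefix):
                found.add(node)
    return sorted(found)


# Toy data: a container with two members (the ex: names are made up).
store = [
    ("ex:bag", RDF_NS + "type", RDF_NS + "Bag"),
    ("ex:bag", RDF_NS + "_1", "ex:apple"),
    ("ex:bag", RDF_NS + "_2", "ex:pear"),
]

# Find the membership predicates, then the members they point at.
membership = nodes_with_prefix(store, RDF_NS + "_")
members = [o for (_, p, o) in store if p in membership]
```

Here membership holds the rdf:_1 and rdf:_2 predicates, and members
holds the two objects they point at.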

So types are very significant, but again, they only apply to graphs you created.

> How does Mulgara decide which type to assign?

You make the choice when you create the graph. If you just create the
graph without a type, then it chooses the type advertised by the
system resolver (this is the XA resolver, unless you changed it).
Otherwise, if you choose a graph type, then Mulgara uses that. See how
to create a graph at:
  http://mulgara.org/trac/wiki/Create

> On the other hand, I can see a use for such a property when it comes to inferencing, where you might want to distinguish between the explicitly asserted graph (type "assertedGraph"?), the inferred graph ("inferredGraph"?), the TBox graph ("terminologyGraph"?), etc.

That's what a number of these types are used for.

The Prefix resolver was specifically created to have an efficient way
to identify container members, and to add the rdfs:member predicate
for them. The NodeType resolver was created to efficiently distinguish
between Literals and URIs during inferencing, since Literals cannot be
inserted into the subject or predicate positions of a triple, even
though a rule may say it should be added like this. For instance,
section 3.6 of the RDFS spec says:
  "The rdfs:range of rdfs:label is rdfs:Literal."
(see: http://www.w3.org/TR/rdf-schema/#ch_label)

So if I see something used as an rdfs:label, I should be able to say
that thing has rdf:type rdfs:Literal, but if I tried to infer that:
  "This is a comment" rdf:type rdfs:Literal .
then the statement could not be inserted, because the literal would be
in the subject position.
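
The kind of guard this calls for can be sketched in Python as below.
The URI and Literal classes and the apply_range_rule function are
invented for the example. In Mulgara the literal test would be a
constraint against a Node-Type graph rather than an isinstance check.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Node = Union[URI, Literal]
Triple = Tuple[Node, URI, Node]

RDF_TYPE = URI("rdf:type")
RDFS_LABEL = URI("rdfs:label")
RDFS_LITERAL = URI("rdfs:Literal")


def apply_range_rule(triples: Iterable[Triple], ranges: Dict[URI, URI]
                     ) -> Tuple[List[Triple], List[Literal]]:
    """For each (s p o) where p has range C, infer (o rdf:type C),
    unless o is a literal, which cannot be used as a subject."""
    inferred: List[Triple] = []
    skipped: List[Literal] = []
    for _, p, o in triples:
        if p not in ranges:
            continue
        if isinstance(o, Literal):
            skipped.append(o)  # would put a literal in subject position
        else:
            inferred.append((o, RDF_TYPE, ranges[p]))
    return inferred, skipped
```

Given a label triple whose object is the literal "This is a comment",
the rule skips it instead of producing an illegal statement, while
still firing for predicates whose objects are URIs.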

Other resolvers are used for different kinds of functionality. The
Remote Resolver lets you syndicate your queries between servers. The
View resolver lets you query the results of another query, and so on.

Regards,
Paul