[Mulgara-general] Models, inferencing, etc.

Alex Hall alexhall at revelytix.com
Wed Aug 26 14:57:19 UTC 2009


Gregg Reynolds wrote:
> One thing that is confusing for newcomers to the world of RDF and OWL
> etc. is the lack of a standard terminology for, uh, well I'll call it
> reasoning services.  We have at least the following terms:
> 
>  Reasoning services
>  Inferencing services
>  Entailment
>  Description logic
>   and etc.
> 
> To go with this, different implementors refer to RDF "stuff" in
> different terms.  SPARQL refers to "RDF Datasets" and "graphs"; Jena
> documentation refers to graphs but uses "model" to mean graph (in
> addition to various other things); Mulgara uses both "graph" and
> "model", but not "dataset", so far as I know.

Here's my take: "graph" refers to a collection of RDF triples; it is an
abstract data structure which describes relationships between resources
but has little inherent meaning.  The term "model" refers to the meaning
we assign to an RDF graph by relating resources in the graph to
higher-order concepts.  In this sense, a graph can be viewed as the
serialization of some model.  A dataset in SPARQL is just a collection
of one or more graphs that defines the target(s) for pattern matching.
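
To make the distinction concrete, here is a minimal Python sketch.  The
names (Triple, Graph, Dataset, match) and the ex: identifiers are made
up for illustration; this is not the Mulgara, Jena or SPARQL API.  It
only shows a graph as a set of triples and a dataset as a default graph
plus named graphs that a pattern can be matched against.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Graph = FrozenSet[Triple]       # a graph is just a set of triples


@dataclass
class Dataset:
    """Hypothetical stand-in for a SPARQL dataset: one default graph plus
    any number of graphs identified by name."""
    default_graph: Graph
    named_graphs: Dict[str, Graph] = field(default_factory=dict)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None,
              graph_name: Optional[str] = None) -> List[Triple]:
        """Return the triples matching a pattern; None acts as a wildcard."""
        target = (self.default_graph if graph_name is None
                  else self.named_graphs[graph_name])
        pattern = (s, p, o)
        return sorted(
            t for t in target
            if all(want is None or want == got
                   for want, got in zip(pattern, t))
        )


people = frozenset({
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
})
places = frozenset({("ex:alice", "ex:livesIn", "ex:Paris")})

dataset = Dataset(default_graph=people, named_graphs={"ex:places": places})

# Match against the default graph, then against one named graph.
print(dataset.match(p="ex:knows"))
print(dataset.match(s="ex:alice", graph_name="ex:places"))
```

Note that nothing in the code says what ex:knows means; that is the
part a "model" adds on top of the graph.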

> This proliferation of quasi-technical terminology is a big problem for
> noobs.  What's the difference between a reasoning service, for example,
> and a rule engine?  How does RDFS validation relate to OWL? Where does
> Mulgara fit?  Etc.

In general, "inference" and "reasoning" can be used interchangeably.
Both refer to the process of deriving new relationships between
resources in a graph from the statements already there.  The new
relationships are inferred from the semantics you assign to those
resources by describing them in terms of concepts from well-defined
languages.  A language is defined by a vocabulary (class and property
identifiers) and a formal description of what the concepts in that
vocabulary mean.

The two most common languages for inference in RDF are RDFS and OWL,
although SKOS is becoming popular as well.  RDFS gives you basic
capability for inferring class membership based on subclass
relationships and property domains and ranges.  OWL is an extension of
RDFS which gives you a lot more expressiveness but at the expense of
computational complexity.  In other words, you can infer many more (and
more interesting) relationships in an OWL model than in RDFS, but you
sacrifice performance and scalability.

A reasoning service is, as you would expect, a service that provides
reasoning capabilities.  It is a piece of software which, given a set of
base RDF assertions, will allow you to find inferred statements.  There
are two basic types of reasoning services: rule engines and, for lack of
a better term, description logic reasoners.

A rule engine operates by translating concepts in a language into simple
if-then rules.  For example: "if X is a member of class A, and class A
is a subclass of class B, then infer that X is a member of class B."
Evaluation of the logic in a model is reduced to basic graph pattern
matching operations.  This is, in general, very fast and scales very
well because the rule engine is not memory-bound.  Mulgara fits in here:
it provides an implementation of a rule engine.
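
As a rough illustration of rules reduced to pattern matching, here is a
small forward-chaining sketch in Python.  Everything in it (the
function names, the "?x" variable convention, the ex: data) is
hypothetical and kept in memory for brevity; it is not how Mulgara's
rule engine is implemented.  It encodes the subclass rule quoted above
plus subclass transitivity, and applies them until nothing new is
produced.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def is_var(term):
    """Terms starting with '?' are variables; anything else is a constant."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Returns the extended bindings, or None if they do not match."""
    new = dict(bindings)
    for want, got in zip(pattern, triple):
        if is_var(want):
            if new.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return new


def match_body(body, graph, bindings=None):
    """Yield every set of bindings that satisfies all patterns in body."""
    bindings = {} if bindings is None else bindings
    if not body:
        yield bindings
        return
    for triple in graph:
        extended = match_pattern(body[0], triple, bindings)
        if extended is not None:
            yield from match_body(body[1:], graph, extended)


def forward_chain(graph, rules):
    """Apply the if-then rules repeatedly until no new triples appear."""
    graph = set(graph)
    while True:
        new = set()
        for body, head in rules:
            for b in match_body(body, graph):
                triple = tuple(b.get(term, term) for term in head)
                if triple not in graph:
                    new.add(triple)
        if not new:
            return graph
        graph |= new


RULES = [
    # if X is a member of A, and A is a subclass of B, then X is a member of B
    ([("?x", RDF_TYPE, "?a"), ("?a", SUBCLASS, "?b")],
     ("?x", RDF_TYPE, "?b")),
    # subclass relationships are transitive
    ([("?a", SUBCLASS, "?b"), ("?b", SUBCLASS, "?c")],
     ("?a", SUBCLASS, "?c")),
]

base = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}

inferred = forward_chain(base, RULES) - base
for triple in sorted(inferred):
    print(triple)
```

The engine itself knows nothing about RDFS; all of the language
semantics live in the RULES list, which is why the same machinery can
cover those parts of OWL that are expressible as rules.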

Description logic, on the other hand, is a branch of mathematics which
deals with classifying individuals in terms of class descriptions.
These class descriptions may be quite complex boolean combinations of
other classes or property restrictions; a full accounting of description
logic is beyond the scope of this discussion.  The important part is
that description logics are much more expressive than simple rules, and
are the theoretical foundation of OWL (although some features of OWL can
be implemented with rules).  Pellet is far and away the most popular
implementation of a description logic reasoner in the semantic web
community.  Most application developers treat it as a black box where
you feed it an OWL ontology, some magic happens, and you get out a
collection of inferred facts supported by the base statements.  Pellet
is memory-bound, however, so there is a limit to the size of ontologies
that it is able to handle.

One last note, since we're talking about terminology.  Be very careful
with your use of the term "validation".  RDFS, while it calls itself a
schema, does not do validation in the sense that an XML schema validates
a document.  RDF operates under the open-world assumption, meaning that
a description can be incomplete without being invalid.  Applying an RDFS
schema will fill in some of these missing parts based on what can be
inferred from your model; it will not flag your model as invalid.  It is
possible to wind up with an inconsistent model -- for instance, by
inferring an individual to be a member of two classes declared to be
disjoint -- and the term "validation" refers to finding these
inconsistencies.  However, it is much harder to wind up with an
inconsistent model than you might expect, and next to impossible in the
case of RDFS.  With RDFS, you're much more likely to wind up with
unexpected (and possibly confusing) inferred statements than you are to
find inconsistencies.
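
A small Python sketch of the difference, under the same caveats as any
toy example: the helper names and the ex: vocabulary are invented, only
the rdfs:domain rule is implemented, and the disjointness check is a
simplified stand-in for what an OWL-aware tool would do.  A statement
that looks like a schema violation simply produces an extra inferred
type; an inconsistency only appears once a disjointness axiom (which is
OWL, not RDFS) is added.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
DISJOINT = "owl:disjointWith"


def apply_domain(graph):
    """rdfs:domain rule: if (P domain C) and (S P O), infer (S type C)."""
    domains = {(s, o) for s, p, o in graph if p == DOMAIN}
    inferred = {
        (s, RDF_TYPE, cls)
        for s, p, o in graph
        for prop, cls in domains
        if p == prop
    }
    return set(graph) | inferred


def find_inconsistencies(graph):
    """Report individuals typed with two classes declared disjoint."""
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    disjoint = {(s, o) for s, p, o in graph if p == DISJOINT}
    return sorted(
        (individual, a, b)
        for individual, classes in types.items()
        for a, b in disjoint
        if a in classes and b in classes
    )


base = {
    ("ex:teaches", DOMAIN, "ex:Teacher"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:alice", "ex:teaches", "ex:math101"),  # a student who teaches?
}

# RDFS alone: no error, alice is simply inferred to be a Teacher as well.
closure = apply_domain(base)
print(("ex:alice", RDF_TYPE, "ex:Teacher") in closure)
print(find_inconsistencies(closure))

# Only with an explicit disjointness axiom is there something to flag.
with_axiom = apply_domain(base | {("ex:Student", DISJOINT, "ex:Teacher")})
print(find_inconsistencies(with_axiom))
```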

> Now Mulgara's documentation regularly uses the term "graph" in the way I
> would expect; but then it uses the term "model" in various ways, such
> that it's hard to tell the difference.  For example, TQL "create" takes
> a "type" parameter, which is taken to refer to various "types" of
> models, not graphs.  Quoting from http://www.mulgara.org/trac/wiki/Create:
> 
>    "http://mulgara.org/mulgara#Model  This is the default triple store
> graph." 
> 
> So, which is it, a graph or a model?  My inference is that Mulgara uses
> "graph" and "model" as synonyms; true?

As Paul mentions, in the Mulgara documentation "graph" and "model" are
used as synonyms; everything used to be called models but now graph is
the preferred term.  In the above case, the URI
<http://mulgara.org/mulgara#Model> is persisted in the database to
identify a default triple store graph; we couldn't change that URI
without breaking backwards compatibility.  We make every effort to use
"graph" in any new documentation, but old habits die hard.

> As a general principle, I would propose that any RDF project adhere
> strictly to the terms "dataset" and "graph" as used in the SPARQL spec,
> and be very careful about "model".  At least in the documentation;
> obviously you can't change APIs willy-nilly.
> 
> Now I did dig around in the sources and found the documentation on
> inferencing, which seems to be a good start.  In particular,
> /docs/site-src/inferencing/infermulgara.html clears things up a little
> bit.  It sounds something like what is described in the Jena
> documentation for the Ontology and Inferencing APIs, which, uh, makes sense.
> 
> So I guess my question is this: is there a reason (other than lack of
> time) that isn't on the wiki, or more polished?  IOW, I don't want to
> run with it (improve and wikify it) if it's all in a state of flux.
> 
> Maybe a first task for the developers would be to go through the
> existing documentation and indicate what is obsolete and what isn't.
> 
> A suggestion:  put the technical documentation into Docbook.  I for one
> like to be able to print out a print-formatted copy of documentation;
> with Docbook you get both HTML and PDF output for free.  You can find a
> superb (quasi-free) docbook editor at http://www.xmlmind.com/xmleditor/
> (no relation to them, I just like the editor).

I'd say lack of time is the main factor here.  I'm not trying to brush
you off -- we certainly appreciate your interest in the project and your
willingness to help out with the documentation.  In my case, Mulgara is
not the primary project that I work on -- I'm employed by a company that
develops its own commercial software using Mulgara as a backend.

Regarding the use of Docbook, my own instinct is that documentation
stands a much better chance of remaining up-to-date if it can be edited
in-place on the wiki than if it is maintained in some internal
repository from which it is published on a regular basis.  But if others
wish to use Docbook, I certainly won't stand in their way.

Regards,
Alex


