[Mulgara-dev] RE: Blank nodes in queries (was: Recent commits to Mulgara.)

Tue Oct 10 19:42:10 UTC 2006

Paul Gearon wrote:
> HI Andy,
> 
> On 10/10/06, Seaborne, Andy <andy.seaborne at hp.com> wrote:
> <snip/>
> 
>> PREFIX  : <http://example/ns#>
>> PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>
>> SELECT  ?X
>> WHERE
>>   { ?X  rdf:type    :Worker ;
>>         :hasFriend  _:Y .
>>     _:Y  rdf:type    :Employee ;
>>         :hasFriend  _:Z .
>>     _:Z  rdf:type    :Manager .
>>   }
> 
> Since I've spent less time with SPARQL than I care to admit (a
> cardinal sin, I know),

:-)

> I have to comment:
> The "blank nodes" of _:Y and _:Z are just existentials here, and do
> not refer to nodes which are really blank at all (as all the matches
> have URIs). 

Yes - correct.  _:Y and _:Z do not refer to blank nodes in the graph (that's 
told bnodes).

> The only difference I see between these and the ?X is
> that the latter appears in the SELECT.  Is this right, or is there
> something more subtle that I'm missing?

They are non-distinguished.  System like Pellet used the non-mention of a 
variable in SELECT to be a non-distinguished variable (existential) for RDQL.

The Pellet demo shows this and has an example of a query where existentials 
get an answer and but use of distinguished variables don't.

http://www.mindswap.org/2003/pellet/demo.shtml
examples 7 and 8

The catch in SPARQL is that it is layered : entailment applies to basic graph 
pattern (BGP) matching but there is an algebra over that.   A variable could 
be used across BGPs  but not in the SELECT projection.  That makes it not 
non-distinguishhed to the BGP match as the value must be bound for the algebra.

SELECT ?X WHERE
{ ?Y :r ?X
   OPTIONAL { ?Y :s :w }
}

The choice Pellet makes is to use bnodes as the extistential syntax and that 
?VAR is a named variable and is distinguished regardless of the SELECT (aids 
composition of queries) and so must have a binding.

For a pure RDF store, and anything where there is a unique deductive closure 
up to bnode isomorphism, then turning bnodes in variables works.  It's even in 
the SPARQL spec as an implementation hint (section 5.2).

> 
> <snip/>
>> The main thing that this simple example shows is that it is impossible
>> to find a unique completion (a unique deductive closure) over which the
>> query can be evaluated. There are *three* incompatible completions,
>> i.e., none of them is minimal (i.e., none of them is included in all
>> the others).
>> A real reasoning by cases mechanism should be employed to answer this
>> query.
> 
> I have two comments on this.
> 
> First is that Mulgara is a plain RDF store.  The Semantics you are
> putting on it here are way above RDF.  From the perspective of RDF,
> the answer is an empty set.
> 
> Second is that we *do* want to support OWL (or other reasoning systems
> if we can make them work in RDF).  That's why I'm writing a response
> here.  :-)

The answer set is empty for Jena as well - we don't cover OWL-DL, just as far 
as you can go with rules (and no negation).  This query would require case 
analysis and that is outside of the rules subset of OWL-DL.  And we are quiet 
RDF centric as well.

> 
> One approach is to try to infer as much as possible from the system
> statically, and to place these results into the data store.  Then any
> queries on the store don't need to do the reasoning themselves, as it
> has already been done.  That gets you some of the way, but it won't
> give you the result you're looking for in this example (since the
> query looks for a composite property that is not described in the
> ontology - a more complex ontology language would allow such a
> property to be defined, and a static inferencer could find it for
> you).
> 
> The other approach is to try to inference on the fly.  However, some
> types of queries are going to be coNP-complete (I have a gut feeling
> that the above query might be one such case, but I'm terrible at
> calculating complexity), and that makes me a little nervous!  Sure, it
> works for small systems, but it doesn't take that much for it to get
> out of hand.  Is this a concern for you?
> 
> You probably know that I've been working on a static inferencer
> (though I've been drawn into other things this year).  This has the
> advantage of working well on large amount of data, but it is poor for
> changing data (ie. new assertions) and for finding composite
> information of the type you describe here.  For that reason I've been
> thinking we will also need a dynamic reasoning engine.  The most
> complete that I can think of is DLV, since it tests all possible
> cases, but I'd like something with better efficiency for systems with
> no disjunctions.  Do you have any thoughts here please?

Caveat: I'm not the rules person over here (Dave Reynolds is our rules expert).

We don't support answering that query at the moment and no one has plans to 
asfar as I know.  We suggest people use Pellet, as a external DL reasoner. 
(We usually point at Pellet because it is open source.)

Currently we have a forward-chaining engine (RETE) which sounds like your 
static inferencer.  And we have a backwards engine - and indeed the forward 
engine, which runs once in the absence of updates, can generate backwards 
rules.  3Store, if I understand it correctly is mainly forward chaining but a 
step of backward chaining at run time to trade off time/space for RDFS and the 
  fact there is an unbounded number of RDFS rules for the rdf:_1, _2 properties.

This does not get the query result above - that would require more than rules 
(forward or backward) such as determining all the possible cases and seeing 
that the pattern matches in all worlds.  In other words, treating existential 
variables differently to universal variables.

Axel Polleres recently pointed to:
http://con.fusion.at/dlvhex/sparql-query-evaluation.php

	Andy

> 
> Regards,
> Paul Gearon