[Mulgara-dev] RE: Blank nodes in queries (was: Recent commits to Mulgara.)

Tue Oct 10 09:03:20 UTC 2006

Andrae,

I'm clearer now about the handling of blank nodes in Mulgara - they are
"told blank nodes" in the DAWG sense.  There have been a few comments
about these and several systems utilize the fact that in the syntax,
<_:xyz> is syntactically legal and they then produce a told blank node.

SPARQL basic graph pattern mathcing is based on entailment (with some
extra machinary to remove infinite results and results without
additional information content). At its heart is "for what variable
bindings, does G entail the substituted pattern"

G |= S(BGP) 

If G is 
:x :p :y

then and the pattern is

?x :p  _:a .

then G entails :x :p _:a so ?x=:x is a solution.

Theer is also the implementation hint [1] which describes how, for
simple entailment, a non-logical engine might implement this.  The
formal definition has a bug in it at the moment (it can generate junk
answers) - there is a high degree of consensus that the implementation
hint had better be right.

[1] http://www.w3.org/TR/rdf-sparql-query/#BGPsparql

For the co-reference, I couldn't find a good example (there is one - I
just can't find it). One example we have been using for the difference
between pure matching and entailment for OWL/DL is below.  It does not
illustrate co-reference, where the system has not determined whether a
pair of variables might be the same term or not - but it has determined
that both have a (non-distinguished) match.

(example from Enrico Franconi)
-------------------------------------

Ontology:
the class WORKER is declared equivalent to the union of the classes 
EMPLOYEE and MANAGER:
WORKER = EMPLOYEE \or MANAGER

Database:
Paul rdf:type WORKER
Andrea rdf:type WORKER
Simon rdf:type EMPLOYEE
Caroline rdf:type MANAGER
Paul ns:has-friend Andrea
Paul ns:has-friend Simon
Simon ns:has-friend Andrea
Andrea ns:has-friend Caroline

@prefix :        <#> .
@prefix ns:      <http://example/ns#> .
@prefix a:       <http://example/aBox#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

a:Simon rdf:type      ns:EMPLOYEE ;
    ns:has-friend     a:Andrea .

a:Caroline rdf:type   ns:MANAGER .

a:Paul rdf:type       ns:WORKER ;
    ns:has-friend     a:Simon ;
    ns:has-friend     a:Andrea .

a:Andrea rdf:type     ns:WORKER ;
    ns:has-friend     a:Caroline .

PREFIX  : <http://example/ns#>
PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  ?X
WHERE
  { ?X  rdf:type    :Worker ;
        :hasFriend  _:Y .
    _:Y  rdf:type    :Employee ;
        :hasFriend  _:Z .
    _:Z  rdf:type    :Manager .
  }

The query:
Tell me the workers having a friend which is an employee, which in turn 
should have a friend which is a manager.
[I don't know the syntax you prefer; in prolog-like notation this would 
be:
  q(X) :- worker(X), has-friend(X,Y), employee(Y), has-friend(Y,Z), 
manager(Z).]

The answer is the set {Paul}.

To understand why Paul satisfies the query, you can first have a visual 
representation of the database (the "little house") - all the edges are 
of type "has-friend":

          Paul:WORKER
          /          \
         /             \
        V               V
Andrea:WORKER <- - Simon:EMPLOYEE
        |
        |
        V
Caroline:MANAGER

If you look for a path WORKER-EMPLOYEE-MANAGER, you don't find one. 
However, you can say that in any possible world Andrea is either a 
manager, or an employee, or both (since he/she is a worker, given the 
ontology). Therefore, in each possible world you can find that the node 
with Paul can be the root of a path WORKER-EMPLOYEE-MANAGER. So, Paul 
is in the answer set of the query.

The main thing that this simple example shows is that it is impossible 
to find a unique completion (a unique deductive closure) over which the 
query can be evaluated. There are *three* incompatible completions, 
i.e., none of them is minimal (i.e., none of them is included in all 
the others).
A real reasoning by cases mechanism should be employed to answer this 
query.

-------- Original Message --------
> From: Andrae Muys <mailto:andrae at netymon.com>
> Date: 9 October 2006 13:35
> 
> On 09/10/2006, at 8:56 PM, Seaborne, Andy wrote:
> > > From: Andrae Muys <mailto:andrae at netymon.com> On 07/10/2006, at
> > > 3:27 AM, Seaborne, Andy wrote: 
> > > 
> > <snip/>
> > > 
> > > Considering the query you included below:
> > > 
> > > SELECT ?x WHERE { ?x :p ?y }
> > > 
> > > The query pattern is:                ?x :p ?y
> > > The distinguished variables are:     ?x
> > > The non-distinguished variables are: ?y
> > 
> > SPARQL also makes it a little bit more difficult - there's three
> > species of variable.  SPARQL is an algebra over basic graph pattern
> > matching so a variable can be non-distinguished and used in only one
> > basic graph pattern or non-distinguished but used in two or more
basic
> > graph patterns, or a filter.  In the latter case, it will need a
> > value, in the former, it does not need a binding.
> 
> I don't see how the one-graph pattern case differs from the multi-
> graph pattern case.  However I do recognise your difficulty with
> filter.  The variables in a graph pattern are existential placeholders
> subject to unification with the targeted rdf graph - the   
> unification generating a set of bindings that constitute the result.
> FILTER is defined as a predicate *function*, consequently it's
> variables are not existentials, but rather formal parameters.  I feel
> the decision to introduce this distinction was both unnecessary and
> unfortunate, however it has been made, and we therefore must recognise
> the distinction when dealing with SPARQL query semantics.    
> 
> At this stage Mulgara does not support FILTER in this manner.  The
> subset we have implemented to date is also expressed in terms of
> unification, and consequently in all my email I use the term variables
> as existentials.  If I wish to discuss 'filter variables', I will use
> the terms 'parameters', or 'arguments'.    
> 
> > > As far as I am concerned a blank node in a query is to be treated
no
> > > differently from any other node-type.  Given blank-node semantics,
> > > they naturally carry identify only with respect to their
containing
> > > graph. Therefore any blank-node that is going to be included
within
> > > a query must have been obtained from that graph by a prior query.
> > > Any blank-node that was not obtained from the graph being queried,
> > > by definition cannot match a blank-node, and therefore any pattern
> > > containing such a bnode will fail.
> > 
> > A blank node is existentially quanitified just outside the graph -
> > 
> > http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/
> > 0153.html and "What counts as an RDF graph":
> > http://www.ihmc.us/users/phayes/RDFGraphSyntax.html
> > 
> > My underatnding is that it has identity but it's not uniquely
> > quanitified over the domain of discourse.
> 
> Thank-you for the links.  They have helped me clarify my understanding
> of blank-nodes.  Still while I see I should have expressed myself
> differently, the conclusion remains valid.  An anonymous blank-node in
> a graph-pattern is a unique resource that cannot unify with any other
> blank-node in the target graph (or in the query for that matter).  To
> include a blank-node from a graph in a query, this must have been
> assigned a 'name' (a bnodeid), however temporarily.  That name will
> have a scope, in the case of Mulgara that scope will be the current
> Transaction as we have no concept of a stable bnodeid across
> transactions.  However other databases may well choose to define
> alternative 'scopes' for their names.          
> 
> I understand the terminology the DAWG is using for this concept is
> 'told bnodes'.  Certainly that is the term used in the two links you
> provided.  'told bnodes' are the "single exception noted above" I
> mentioned.  Blank nodes within the target graph, having been provided
> with names whose scope includes the query in question, being
referenced
> explicitly from within the query by use of those names.     
> 
> > > > If a blank node is treated as purely and only existential, then
it
> > > > might be that block nodes need different treatment to named
> > > > variables. On the other hand, and certainly for access RDF graph
> > > > with simple entailment, it might be that systems are not
> > > > forbidden in treating blank nodes in queries as implicit
> > > > anonymous variables with the same semantics as other variables.
> 
> However this does not appear to be describing 'told bnodes'.  Rather
> it appears to be conflating blank-nodes with anonymous variables.
> The problem with this is that blank-nodes are already defined to have
a
> global scope, and to be distinct.  It appears you are wanting
something
> along the following lines:  
> 
> Given
> 
> _:1 :p :a
> _:2 :p :b
> 
> and the query pattern:
> 
> { _:5 :p ?x }
> 
> I'm interpreting the above paragraph to intend the result to be:
> 
> { { ?x -> :a }, { ?x -> :b } }
> 
> which would imply that _:1 === _:5 === _:2 or in other words _:1 ===
> _:2 - which is clearly invalid.
> 
> On the other hand I could see the following query pattern:
> 
> { ?_ :p ?x }
> 
> returning the above result.  If what you want is anonymous variables
> then you might as well use the same syntactic conventions as every
> other language that supports pattern-matching and/or unification, and
> simply use underscore.  I have every intent of providing $_ for
Mulgara
> at some stage for precisely this purpose (and since we already support
> [ ... ] patterns, it's probably only about a days work).     
> 
> > Ah - I see - the blank node need not be identified as a blank node
> > but, as an existenial, may be matched to a term in the graph or
indeed
> > (OWL/DL) with the existence of something with no materialised term.
> > 
> > An OWL/DL system could create a term for that thing - it just that
> > they don't.  It can be tricky in the case of co-reference but as the
> > existentials are not distinguihed that information is lost anyway.
> 
> I definately don't understand you here.  My understanding of a blank
> node is as an existential, so how should it be identified '[not] as a
> blank node but as an existential'?  I'd also like to check that by
> "indeed (OWL/DL) ... no materialised term", you are refering to
virtual
> and/or inferred graphs?  Certainly Mulgara makes extensive use of what
> I believe you refer to as 'non-materialised terms', I refer here to
the
> XSD resolver in particular, as well as resolvers in general.  I really
> must get around to finishing the arithmetic resolver, and I think I
> know how to implement a regular-expression resolver as well.        
> 
> Could you elaborate on the co-reference case?  I'm curious what this
is
> referring to. 
> 
> Andrae
> 
> --
> Andrae Muys
> andrae at netymon.com
> Principal Mulgara Consultant
> Netymon Pty Ltd