[Mulgara-dev] Unbound variables in SPARQL joins

Alex Hall alexhall at revelytix.com
Mon Mar 22 18:32:37 UTC 2010


Paul et al,

I believe I've identified an area where Mulgara does not conform to
the SPARQL query specification -- specifically, the handling of joins
where one of the common variables is unbound for a given row.  The
particular query that brought this up involved an optional join with
multiple common variables, and a union within the optional portion.  For
example:

SELECT ?x ?y ?value
WHERE {
   ?x test:related ?y .
   OPTIONAL {
      { ?x test:p1 ?value } UNION { ?y test:p2 ?value }
   }
}

The OPTIONAL portion produces rows where ?x is bound and ?y is unbound,
and vice versa.  According to SPARQL, these should be treated as matches
in the join (and a test with Jena bears this out), but Mulgara is not
matching these rows, and therefore is not binding anything to ?value in
the query solution.

SPARQL joins are defined in terms of solution compatibility.  The
section of the SPARQL definition dealing with graph patterns
(http://www.w3.org/TR/rdf-sparql-query/#sparqlQuery) defines compatible
solution mappings as any two solutions where every variable that is
bound in both solutions is bound to the same value.  I take that to mean
that an unbound variable in one of the solution mappings is to be
disregarded for the purposes of the compatibility test.
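
To make that reading concrete, here is a minimal Python sketch of the
compatibility test as I understand it.  The dict representation (an
unbound variable is simply absent), the function names, and the sample
values are assumptions for illustration only, not anything taken from
Mulgara or Jena.

```python
from typing import Dict, Optional

# A solution mapping: variable name -> bound value.
# Assumption of this sketch: an unbound variable is simply absent.
Solution = Dict[str, str]


def compatible(a: Solution, b: Solution) -> bool:
    """True if every variable bound in BOTH mappings has the same value."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def merge(a: Solution, b: Solution) -> Optional[Solution]:
    """Union of two compatible mappings, or None if they conflict."""
    return {**a, **b} if compatible(a, b) else None


# Hypothetical rows: one from "?x test:related ?y", one from the first
# UNION branch, where ?y is unbound.
left = {"x": "ex:a", "y": "ex:b"}
right = {"x": "ex:a", "value": "1"}
merged = merge(left, right)

# A row that binds ?x to a different value is not compatible.
conflict = merge(left, {"x": "ex:c", "value": "2"})
```

Under this reading the first pair joins, because ?y is bound on only one
side and so plays no part in the test.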

In terms of the Mulgara tuples algebra, a solution mapping corresponds
to a row in a Tuples.  For the purposes of joins, Mulgara uses a
stricter definition of compatibility for its matching criteria: solutions
are considered compatible only if all common variables in the
corresponding Tuples rows are bound in the exact same way (i.e. both
unbound or both bound to the same value).
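
The difference between the two rules can be sketched in a few lines of
Python.  This is only a model of the behaviour described above, not
Mulgara's actual code; the use of None for an unbound column and the
function names are assumptions of the sketch.

```python
from typing import Dict, List, Optional

# A row: column name -> value, with None standing in for "unbound".
Row = Dict[str, Optional[str]]


def strict_compatible(a: Row, b: Row, common: List[str]) -> bool:
    """Strict rule: every common column must be identical in both rows,
    so unbound only ever matches unbound."""
    return all(a[v] == b[v] for v in common)


def sparql_compatible(a: Row, b: Row, common: List[str]) -> bool:
    """SPARQL rule: a common column is compared only when it is bound
    in both rows."""
    return all(
        a[v] == b[v]
        for v in common
        if a[v] is not None and b[v] is not None
    )


common = ["x", "y"]
left = {"x": "ex:a", "y": "ex:b"}                  # ?x test:related ?y
right = {"x": "ex:a", "y": None, "value": "1"}     # first UNION branch

strict_result = strict_compatible(left, right, common)
sparql_result = sparql_compatible(left, right, common)
```

For this pair the strict rule rejects the match because ?y is bound on
one side only, while the SPARQL rule accepts it.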

Unfortunately, it looks like a correct implementation of SPARQL joins is
likely to incur a significant performance hit.  Forcing all common
variables to match exactly allows Mulgara to re-order the columns in the
Tuples operands of a join in order to take advantage of prefixing.  But
if one or more of those prefixed columns can be ignored for a given row
because of an unbound variable then we can no longer count on the
prefixes to be valid.
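
A toy illustration of the problem, in Python rather than against the
real Tuples interface: the right-hand operand is kept sorted on the join
columns and probed by binary search on the (x, y) prefix.  The sentinel
for unbound, the sample rows, and the function names are all assumptions
of the sketch.

```python
from bisect import bisect_left, bisect_right

UNBOUND = ""  # sentinel for an unbound column; sorts before real values

# Right-hand rows (x, y, value), sorted on the join columns (x, y).
right = sorted([
    ("ex:a", UNBOUND, "1"),   # ?x bound, ?y unbound
    (UNBOUND, "ex:b", "2"),   # ?y bound, ?x unbound
])
keys = [(x, y) for x, y, _ in right]


def prefix_lookup(x, y):
    """Exact-prefix probe: finds only rows whose (x, y) equals the key."""
    lo = bisect_left(keys, (x, y))
    hi = bisect_right(keys, (x, y))
    return right[lo:hi]


def sparql_lookup(x, y):
    """Full scan applying the SPARQL rule: an unbound column matches
    any value."""
    return [
        row for row in right
        if row[0] in (x, UNBOUND) and row[1] in (y, UNBOUND)
    ]


by_prefix = prefix_lookup("ex:a", "ex:b")
by_scan = sparql_lookup("ex:a", "ex:b")
```

The prefix probe finds nothing for the key (ex:a, ex:b), while the scan
finds both rows.  The compatible rows sit at other positions in the sort
order, so a single prefix lookup cannot reach them.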

Any comments/suggestions?

Regards,
Alex



