[Mulgara-dev] Unbound variables in SPARQL joins

Paul Gearon gearon at ieee.org
Mon Mar 22 20:42:16 UTC 2010


Hi Alex,

On Mon, Mar 22, 2010 at 2:32 PM, Alex Hall <alexhall at revelytix.com> wrote:
> Paul et al,
>
> I believe I've identified an area where Mulgara does not conform with
> the SPARQL query specification -- specifically, the handling of joins
> where one of the common variables is unbound for a given row.  The
> particular query that brought this up involved an optional join with
> multiple common variables, and a union within the optional portion.  For
> example:

Yes, I'm aware of this problem. I was working on it when I was asked
to concentrate on some performance problems instead. In fact, the
problem you are describing is covered by the SPARQL algebra tests,
which is how I discovered it.

<snip/>

> Unfortunately, it looks like a correct implementation of SPARQL joins is
> likely to incur a significant performance hit.  Forcing all common
> variables to match exactly allows Mulgara to re-order the columns in the
> Tuples operands of a join in order to take advantage of prefixing.  But
> if one or more of those prefixed columns can be ignored for a given row
> because of an unbound variable then we can no longer count on the
> prefixes to be valid.
>
> Any comments/suggestions?

I agree about the performance concern, but I thought it might be
possible to identify queries that create this situation and treat them
differently (so that only those queries have a performance hit).
However, it wasn't as simple as I thought.
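
To make the difference concrete, here is a rough sketch of the two
join behaviours being discussed. It is illustrative Python, not
Mulgara code: rows are plain dicts, an unbound variable is simply
absent from the row, and the names compatible, sparql_join and
strict_join are made up for this example. It assumes the SPARQL rule
that two rows are compatible when every variable bound in both has
the same value.

```python
# Illustrative only: none of these names come from the Mulgara code base.
# A row is a dict of variable -> value; an unbound variable is absent.

def compatible(left, right):
    """SPARQL compatibility: variables bound in BOTH rows must agree."""
    return all(right[var] == val for var, val in left.items() if var in right)

def sparql_join(lhs, rhs):
    """Nested-loop join of compatible rows (no prefix optimisation)."""
    return [{**l, **r} for l in lhs for r in rhs if compatible(l, r)]

def strict_join(lhs, rhs, common):
    """Join that forces every common column to match exactly,
    so an unbound value only matches another unbound value."""
    return [{**l, **r} for l in lhs for r in rhs
            if all(l.get(v) == r.get(v) for v in common)]

lhs = [{"x": "a", "y": "b"}]
rhs = [
    {"x": "a", "z": "c"},            # ?y is unbound in this row
    {"x": "a", "y": "q", "z": "d"},  # ?y is bound to a different value
]

sparql_rows = sparql_join(lhs, rhs)
strict_rows = strict_join(lhs, rhs, ["x", "y"])
```

The first row of rhs joins under the SPARQL rule because ?y is
unbound there, but is rejected when every common column must match.
That row is also the reason a prefix on (?x, ?y) can't be trusted for
the lookup.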

You'll note that the current join code was written by Andrae, and it's
quite hard to follow. I asked his advice about it, and he tells me
that Mulgara's query algebra is supposed to be the same as SPARQL on
this point. He said there's even a unit test for it (though I didn't
have time to identify it before moving on to other things). If it
exists, it should be among the unit tests for TuplesOperations. I
don't know whether that test is not working correctly, or whether it's
working fine and the bug is somewhere else in the Disjunction stack.

So it looks like this part of the query algebra is buggy in Mulgara
right now. I certainly want to fix it, but I will need to finish up a
couple of things before I do. The fact that it's supposed to work the
SPARQL way is encouraging. Ideally, given the complexity of some of
the code, Andrae would look at it too, but I think he's too busy at
the moment.

Paul


