[Mulgara-dev] Mulgara SAIL and SPARQL tests

Thu May 8 00:29:30 UTC 2008

Hi James,

On Wed, May 7, 2008 at 3:54 PM, James Leigh <james-nospam at leighnet.ca> wrote:
>  Once you provide a fix for the triple-pattern-003 the Mulgara-SAIL will
>  pass nearly all our SPARQL tests[*].

Fixed. It ended up being a REALLY tough one to fix, but it seems to
work perfectly now. It certainly passes the triple-pattern-003 test.

I'll try to extract patches for you in the morning (it's Anne's and my
anniversary tonight, so I get the night off).

> The only exception being the "No
>  Distinct" and "REDUCE" tests. For these it is as if Mulgara adds in a
>  DISTINCT to every query and does not return any duplicates. This causes
>  the tests to fail, since they are testing for duplicates.

Heh. I know all about this. I've been talking to a few people about it too.

Andy Seaborne pointed out that the DAWG discussed going with an
"auto-distinct" specification, but a few implementors got their way
and made it an option. This was not popular among the theorists, as it
violates set semantics.

This is relevant to Mulgara, as we aimed for set semantics early on in
the piece. For this reason, duplicates are removed at every step of
the way. Another good reason for this is that it reduces the size of
resolutions of BGPs, making joins much more efficient. Even when we
take a hit by having to remove duplicates on more complex expressions,
we still save a lot in bandwidth because we are moving smaller items
around (recalling that we can resolve queries from multiple sources
around a network).
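
To illustrate the point about intermediate result sizes, here is a
minimal Python sketch. It is not Mulgara code: the bindings, and the
helper names join and dedupe, are made up for the example. It joins two
small resolutions once with duplicates kept (bag semantics) and once
with duplicates removed first (set semantics).

```python
from itertools import product

def join(left, right, key):
    """Naive nested-loop inner join of two lists of bindings on one shared variable."""
    return [{**l, **r} for l, r in product(left, right) if l[key] == r[key]]

def dedupe(rows):
    """Remove duplicate bindings, keeping first-seen order."""
    seen = set()
    out = []
    for row in rows:
        k = tuple(sorted(row.items()))  # hashable form of one binding
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out

# Hypothetical resolutions containing duplicates, e.g. the same
# statement arriving from two different sources.
left = [{"x": "a"}, {"x": "a"}, {"x": "b"}]
right = [{"x": "a", "y": 1}, {"x": "a", "y": 1}, {"x": "b", "y": 2}]

bag_result = join(left, right, "x")                  # duplicates multiply: 2*2 + 1*1 rows
set_result = join(dedupe(left), dedupe(right), "x")  # one row per distinct binding

print(len(bag_result), len(set_result))
```

The two approaches agree once the bag result is itself deduplicated,
but the set-based join carries fewer rows through every step.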

Andrae has proposed (I don't know how serious he is) that we NEVER
support DISTINCT, and opt for 99% compliance with SPARQL. I see his
point, but I don't think that's good enough. However, returning
duplicates would mean modifying all of our external API and SPI
definitions, which is a huge task. So for the time being we will have
to say we don't do it.

>  I believe it is okay to treat REDUCE the same as DISTINCT - therefore
>  this would not be a problem. However, for queries without a
>  DISTINCT[**] modifier it looks like a problem.

Yes. Whether you implement REDUCED as DISTINCT, or not at all, you're
in compliance. It's a compromise option for those situations where a
pure DISTINCT is too expensive, but there are still a few duplicates
that are easy to find and remove. By not doing anything we are in
compliance.  :-)
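
As a rough sketch of why both extremes are compliant: under REDUCED
(the keyword's spelling in the SPARQL spec), each solution may appear
anywhere from once up to the number of times it appears with no
modifier at all. The Python below checks that rule. The function name
is_valid_reduced and the sample data are assumptions for illustration,
and solutions are represented as plain tuples.

```python
from collections import Counter

def is_valid_reduced(full, candidate):
    """Check whether `candidate` is a permissible REDUCED answer for `full`.

    `full` is the solution sequence with no DISTINCT/REDUCED modifier.
    Each solution must appear at least once and no more often than in `full`.
    """
    full_counts = Counter(full)
    cand_counts = Counter(candidate)
    if set(full_counts) != set(cand_counts):
        return False  # a solution was dropped entirely, or invented
    return all(1 <= cand_counts[s] <= full_counts[s] for s in full_counts)

# Hypothetical solutions, each a tuple of bound values.
full = [("a",), ("a",), ("b",)]
as_distinct = [("a",), ("b",)]   # REDUCED treated as DISTINCT
untouched = list(full)           # REDUCED ignored

print(is_valid_reduced(full, as_distinct), is_valid_reduced(full, untouched))
```

Adding extra copies of a solution, or dropping one altogether, would
fail the check.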

>  Is there any way to turn the DISTINCT modifier off and have mulgara
>  return duplicates (one for each statement)?

I think I've explained the issues about this above. It can't be done
in the near term.

>  [*] This is only using Mulgara for storage and inner joins. Had to turn
>  the mulgara-sparql specific code off for sparql compatibility (for
>  now).

I understand. Still, optional joins are working now, as is that
repeated variable pattern. I'll be doing some things with default
graphs tomorrow, and by the end of the week I hope to have language
codes and dateTime timezones.

Regards,
Paul Gearon