[Mulgara-general] Optimizing variable-graph queries

Alex Hall alexhall at revelytix.com
Fri Nov 6 18:30:24 UTC 2009

I'm working on an application that stores many records in Mulgara, all
described using the same schema but organized into separate graphs to
track provenance information.  I need to collect records from several
graphs and apply ordering and a limit in order to, for example, find the
50 most recent records across all graphs along with the graphs in which
they appear.

This can be done rather easily with a single SPARQL query of the form:

SELECT ?graph ?item ?timestamp
FROM NAMED <graph1>
# ... one FROM NAMED clause per graph of interest
WHERE {
  GRAPH ?graph {
    ?item :hasTimestamp ?timestamp .
    # other criteria to identify the records of interest
  }
}
ORDER BY DESC(?timestamp) LIMIT 50

The inner pattern in the GRAPH group can wind up getting kind of complex
as I bind other record properties and filter for items of interest, etc.

Unfortunately, I'm finding that the performance on this query is not
very good.  In about a quarter of the time it takes to run this query, I
can eliminate the GRAPH expression and just run a series of individual
queries, one per graph, and combine the results in-memory.  This isn't
terribly surprising, considering that XA1.1 isn't optimized for
resolving variable-graph constraints, and naturally it's going to be
faster to sort the results in-memory in the client than do the full
materialization required for Mulgara to order the results for me.
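
A minimal sketch of that client-side merge, in Python, might look like the
following.  The function name and row layout are hypothetical and not taken
from the application.  It assumes each per-graph query has already been run
with ORDER BY DESC(?timestamp) LIMIT 50, and that timestamps are values that
compare correctly (for example, ISO 8601 strings in a single time zone).

```python
import heapq
from itertools import islice


def merge_top_n(results_by_graph, n=50):
    """Merge per-graph query results into the n most recent rows overall.

    results_by_graph maps a graph URI to a list of (item, timestamp) rows,
    each list already sorted by timestamp in descending order.
    Returns a list of (graph, item, timestamp) tuples, newest first.
    """

    def tagged(graph, rows):
        # Attach the graph URI to each row so provenance survives the merge.
        for item, timestamp in rows:
            yield (graph, item, timestamp)

    streams = [tagged(graph, rows) for graph, rows in results_by_graph.items()]

    # heapq.merge lazily merges pre-sorted inputs; reverse=True expects each
    # input to be sorted from largest to smallest, as the queries return them.
    merged = heapq.merge(*streams, key=lambda row: row[2], reverse=True)

    # Only the first n rows are ever pulled from the merged stream.
    return list(islice(merged, n))
```

Because every input is already sorted and limited, the merge only has to
examine the leading rows of each list instead of sorting the full union.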

On the other hand, doing everything as individual queries isn't exactly
the ideal solution.  For one, I'm developing with everything running
locally on my laptop but eventually there will be a network sitting
between the client and server.  Also, combining results is something
that the database is supposed to be good at.

So my question is, is there any sort of optimization that can be done,
either in terms of rewriting my SPARQL query or in tweaking the Mulgara
query engine, in order to improve the performance of this query?

