[Mulgara-general] Optimizing variable-graph queries

Paul Gearon gearon at ieee.org
Fri Nov 6 19:54:38 UTC 2009

Hi Alex,

On Fri, Nov 6, 2009 at 1:30 PM, Alex Hall <alexhall at revelytix.com> wrote:
> I'm working on an application that stores many records in Mulgara, all
> described using the same schema but organized into separate graphs to
> track provenance information.  I need to collect records from several
> graphs and apply ordering and a limit in order to, for example, find the
> 50 most recent records across all graphs along with the graphs in which
> they appear.

This should be properly documented, I know, but did you know that if
you're using lots of graphs then you may want to use the XA indexes?
In other words, the config file would use:


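Something along these lines — I'm going from memory here, so treat the element name and class path below as a sketch and check them against your own config file:

```xml
<!-- Sketch only: use the older multi-graph statement store instead of
     the XA11 statement indexes. Element name and package path are from
     memory and may differ in your mulgara config. -->
<PersistentResolverFactory>
  org.mulgara.resolver.store.StatementStoreResolverFactory
</PersistentResolverFactory>
```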
Unfortunately, XA11 was the wrong name for me to give to the new
statement indexes. It simply represents an optimization for the common
pattern of using only a couple of graphs. For the StringPool and
NodePool, on the other hand, XA11 is a real upgrade.

All the same, I need to document the continued use of
StatementStoreResolverFactory, since a lot of people need to work with
multiple graphs. But I think fewer than 50 graphs should still be OK.

> This can be done rather easily with a single SPARQL query of the form:
> SELECT ?graph ?item ?timestamp
> FROM NAMED <graph1>
> ...
> FROM NAMED <graphN>
> WHERE {
>  GRAPH ?graph {
>    ?item :hasTimestamp ?timestamp .
>    # other criteria to identify the records of interest
>  }
> }
> ORDER BY DESC(?timestamp) LIMIT 50
> The inner pattern in the GRAPH group can wind up getting kind of complex
> as I bind other record properties and filter for items of interest, etc.
> Unfortunately, I'm finding that the performance on this query is not
> very good.  In about a quarter of the time it takes to run this query, I
> can eliminate the GRAPH expression and just run a series of individual
> queries, one per graph, and combine the results in-memory.  This isn't
> terribly surprising, considering that XA1.1 isn't optimized for
> resolving variable-graph constraints, and naturally it's going to be
> faster to sort the results in-memory in the client than do the full
> materialization required for Mulgara to order the results for me.

Is there any chance that you can back up your XA11 data and load it
into a set of new XA indexes? I'm wondering if it's really the indexes
causing you problems, or the query plan.

> On the other hand, doing everything as individual queries isn't exactly
> the ideal solution.  For one, I'm developing with everything running
> locally on my laptop but eventually there will be a network sitting
> between the client and server.  Also, combining results is something
> that the database is supposed to be good at.
> So my question is, is there any sort of optimization that can be done,
> either in terms of rewriting my SPARQL query or in tweaking the Mulgara
> query engine, in order to improve the performance of this query?

I'm curious, doesn't the following query work?

SELECT ?graph ?item ?timestamp
WHERE {
  GRAPH ?graph {
    ?item :hasTimestamp ?timestamp .
    # other criteria to identify the records of interest
  }
}
ORDER BY DESC(?timestamp) LIMIT 50

(i.e. no FROM or FROM NAMED)

I know that I originally coded SPARQL so that this would *not* work,
but Andy assured me that it should, so I thought I had gone back and
fixed it. If it doesn't work, then please let me know.

Either way, you're right that it should be faster. There are unions in
there, which could be part of the reason, in which case the coming
optimizations may fix things. You can check on this for me by hitting
Ctrl-\ on the server console (which triggers a JVM thread dump) while
waiting for a query to respond. If you see that you're in a "sort"
method, then the next release should help you. Otherwise, let me know
where you're spending most of the time, and I'll look at what's
happening with the query plan.

Any feedback you can give me on these questions would really help. TIA.

Paul Gearon

More information about the Mulgara-general mailing list