[Mulgara-general] Query performance when using a view

Alex Hall alexhall at revelytix.com
Thu Apr 15 16:43:39 UTC 2010


On 4/15/2010 10:16 AM, Paul Gearon wrote:
> Hi Steve,
>
> On Thu, Apr 15, 2010 at 6:07 AM, Steve Bayliss
> <stephen.bayliss at acuityunlimited.net> wrote:
>   
>> I have a number of named graphs (models) which I want to query over as a
>> whole, so to do that I've defined a (union) view model, called #view
>>
>> I'm noticing a significant difference in performance when querying this view
>> compared with querying a single model containing the triples from each of my
>> individual models.  (For comparing performance, I've selected all triples
>> from my view and inserted them into a single, new, model - #full)
>>
>> For instance:
>>
>> select $s $p $o from <[model]> where $s $p $o limit 20
>>
>> With [model] as #view query times are roughly 2000x that of when [model] is
>> set to #full.  Similarly if I do a count of all triples (the count returned
>> is the same in each case).
>>
>> With a more constrained query, eg
>>
>> select $s $p $o $t $u $v $w from <[model]> where $s <mulgara:is>
>> <[some-uri]> and $s $p $o and $o $t $u and $u $v $w
>>
>> with [model] as #view query times are roughly 4x that of when [model] is
>> #full
>>     

Every time a query is executed against a view, the view resolver has to
build up the view definition from the definition graph.  For the simple
queries, this overhead may well dominate the query execution time.  For
a more constrained query, building the view definition takes
proportionally less of the total time, hence the smaller difference in
overall execution time.
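
To make the reasoning concrete, here is a rough sketch of that model in
Java. It assumes the view adds a fixed cost on top of the underlying
query; the class name and the numbers are made up for illustration and
are not measurements from Mulgara.

```java
/** Hypothetical model: view query time = fixed overhead + underlying query time. */
final class ViewOverheadModel {

    /** Slowdown factor of the view relative to querying the plain graph. */
    static double slowdown(double overheadMs, double queryMs) {
        return (overheadMs + queryMs) / queryMs;
    }

    public static void main(String[] args) {
        double overheadMs = 200.0; // assumed cost of building the view definition
        // Cheap query: the fixed overhead dominates, so the factor is large.
        System.out.println(slowdown(overheadMs, 0.1));
        // Expensive query: the same overhead is a much smaller share.
        System.out.println(slowdown(overheadMs, 66.0));
    }
}
```

Under this model the same fixed cost gives a very large factor for a
cheap query and a small one for an expensive query, which matches the
shape of the numbers Steve reported.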

Unfortunately, I don't think we have the framework in place to be able
to cache those view definitions over the lifetime of the server and
still know when to throw out the cached definition in response to a
change in the view definition graph.  It might be possible for the view
resolver factory to tie into the transactional framework to at least
cache the definition during a transaction, but I don't know how much
work that would be.
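
As a sketch of what I mean by caching during a transaction, something
along these lines might work. This is not existing Mulgara code:
TransactionViewCache and its loader are hypothetical names, and the hard
part (hooking clear() into the transaction lifecycle) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-transaction cache of view definitions; not Mulgara API. */
final class TransactionViewCache<D> {
    private final Map<String, D> definitions = new HashMap<>();
    private final Function<String, D> loader;

    /** The loader stands in for reading a definition from the definition graph. */
    TransactionViewCache(Function<String, D> loader) {
        this.loader = loader;
    }

    /** Returns the cached definition, loading it on first use. */
    D get(String viewUri) {
        return definitions.computeIfAbsent(viewUri, loader);
    }

    /** Assumed to be called when the transaction ends or the definition graph is written. */
    void clear() {
        definitions.clear();
    }
}
```

The cache would live only as long as one transaction, so a stale
definition could never outlive a change to the view definition graph
made in another transaction.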

> No, it's not to be expected. I'm familiar with views, and have some
> idea of how they are implemented, but I also haven't spent any time
> working with them, so I don't really know where the bottleneck might
> be.
>
> Other options include selecting from a union of all the graphs, or
> selecting from a variable graph and then constraining the required
> graphs. The latter is a little less convenient in TQL, though still
> possible. (it looks like you're just using TQL, right? Or are you
> sometimes using SPARQL?)
>
> To select from multiple graphs in TQL, use:
>
> select $s $p $o
> from <#m1> or <#m2> or <#m3> or <#m4> or <#m5>
> where $s $p $o limit 20
>
> In SPARQL it's:
>
> select $s $p $o
> from <#m1>
> from <#m2>
> from <#m3>
> from <#m4>
> from <#m5>
> where { $s $p $o } limit 20
>   

If the list of graphs in the union is always the same (and you're
calling this from Java code) then you could also use the
SparqlInterpreter.setDefaultGraphs(...) method and omit the "from"
clauses in the query.
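
If you would rather keep the graph list in the query text, the "from"
clauses can be generated from a list instead. A minimal sketch follows;
UnionQueryBuilder is a made-up helper, not part of Mulgara, and it only
does string building.

```java
import java.util.List;

/** Hypothetical helper that builds the SPARQL union query shown above. */
final class UnionQueryBuilder {

    /** Emits one "from" clause per graph URI, then the match-all pattern. */
    static String selectAll(List<String> graphUris, int limit) {
        StringBuilder query = new StringBuilder("select $s $p $o\n");
        for (String uri : graphUris) {
            query.append("from <").append(uri).append(">\n");
        }
        query.append("where { $s $p $o } limit ").append(limit);
        return query.toString();
    }
}
```

The resulting string can be passed to whatever you already use to
execute SPARQL queries.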

Any difference you see in execution times between querying the union and
querying the view can probably be attributed to the overhead of fetching
the view definition.

Regards,
Alex



