[Mulgara-dev] 100k models?

Sat Apr 7 03:19:04 UTC 2007

On Apr 6, 2007, at 9:37 PM, Andrae Muys wrote:

>
> On 06/04/2007, at 1:55 AM, Laurian Gridinoc wrote:
>
>> On 05/04/07, Paul Gearon <gearon at ieee.org> wrote:
>>> This may be problematic, as we have been systematically removing
>>> Jena support.  We have a number of reasons for this (which I'm
>>> happy to go into), but it really comes down to the horrible
>>> performance created by Jena.
>>
>> I'm curious :) the performance issues are about Jena+Mulgara or  
>> Jena alone?
>
> The problem was the interaction between Mulgara and the scale
> assumptions of Jena, which assumed that the graphs, the query
> evaluation, and any intermediate results fit in memory.  Andrew
> Newman was the primary author of our Jena support, but from what I
> remember two things crippled performance.  Jena assumed uniform-cost
> random access to triples in a graph, which caused crippling
> IO-complexity; and it used function composition to build query
> evaluators, which, while semantically equivalent to Mulgara's
> coroutine composition, doesn't have the same space properties and
> therefore routinely exhausted memory.  So ultimately, while Jena's
> implementation has good time-complexity, and so works fine for the
> small models it was designed to operate with, its IO and space
> complexities made it a poor fit for Mulgara.  Consequently, Jena
> support was deprecated, and it will not be supported in the next
> release.
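
To illustrate the space property Andrae describes, here is a minimal
sketch. It is not Jena's or Mulgara's actual evaluator: the function
names are hypothetical, and Python generators stand in for coroutine
composition, on the assumption that the composed-function style builds
each intermediate result in full before the next stage runs.

```python
def eager_join(left, right):
    """Each stage returns a complete list, so every intermediate result
    is held in memory at once (the whole cross product here)."""
    pairs = [(l, r) for l in left for r in right]
    return [p for p in pairs if p[0] == p[1]]


def streaming_join(left, right):
    """Generator stages pass one row at a time to the next stage, so no
    intermediate list is ever built. `right` must be re-iterable."""
    pairs = ((l, r) for l in left for r in right)
    return (p for p in pairs if p[0] == p[1])


if __name__ == "__main__":
    rows = range(200)
    # Same answers, but eager_join held 40,000 pairs in memory first.
    print(len(eager_join(rows, rows)))
    print(sum(1 for _ in streaming_join(rows, rows)))
```

Both versions return the same rows; only the memory held between
stages differs, which is the distinction that matters once the data no
longer fits in RAM.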

I've discussed this with Laurian offline, but it bears repeating...

Ultimately, the problem arose because Jena's API was never properly
documented.  The guys in Bristol came up with a set of interfaces for
people to use (and for RDF database developers to implement), but
never properly told anyone which interfaces they were.  Consequently,
people started using classes which were always supposed to be
completely internal to Jena.  No, I'm not speculating here - I had a
conversation (in person) with one of the lead developers of Jena
about this (I won't embarrass Brian by mentioning his name).

I'll leave unsaid any comments on packages, visibility, etc, that may  
be appropriate here.  :-)

With people using these internal classes from Jena, it meant that
anyone (like us) providing a Jena interface was really forced into
lifting these classes out of the Jena code, or reimplementing a LOT
of complex code.  This code did things like resolve constraints by
having a lazy filter over the entire database, iterating over each
statement one-at-a-time until it found what it was supposed to
return.  This increased the cost of resolving a single constraint
from O(log n) to O(n) in the number of statements.  Then you
multiply this by the number of constraints.  Yeach.  Andrae mentioned
another problem, which was loading data into memory.  The list of
problems went on and on.
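
A minimal sketch of the two ways of resolving a constraint, for
illustration only. This is not the Jena or Mulgara code: the toy
store, the function names, and the sorted list standing in for an
on-disk index are all assumptions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy store of (subject, predicate, object) statements.
triples = [(f"s{i % 100}", f"p{i % 7}", f"o{i}") for i in range(10_000)]


def scan_match(store, subject):
    """Lazy filter over the whole store: every statement is examined
    one at a time, so each constraint costs O(n)."""
    return (t for t in store if t[0] == subject)


# A sorted copy stands in for an index ordered by subject.
spo_index = sorted(triples)
subject_keys = [t[0] for t in spo_index]


def index_match(index, keys, subject):
    """Binary search for the block of matching statements: O(log n) to
    locate it, plus the size of the result."""
    lo = bisect_left(keys, subject)
    hi = bisect_right(keys, subject)
    return index[lo:hi]


if __name__ == "__main__":
    by_scan = sorted(scan_match(triples, "s5"))
    by_index = index_match(spo_index, subject_keys, "s5")
    print(by_scan == by_index, len(by_index))
```

Both functions return the same statements. The scan touches all 10,000
of them per constraint, while the index lookup needs only a handful of
comparisons before reading the matching block.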

It worked, but it didn't scale at all.  It made us look really slow,  
and we looked as if we couldn't load very much data.

Paul