[Mulgara-dev] 100k models?
Paul Gearon
gearon at ieee.org
Sat Apr 7 03:19:04 UTC 2007
On Apr 6, 2007, at 9:37 PM, Andrae Muys wrote:
>
> On 06/04/2007, at 1:55 AM, Laurian Gridinoc wrote:
>
>> On 05/04/07, Paul Gearon <gearon at ieee.org> wrote:
>>> This may be problematic, as we have been systematically removing
>>> Jena support. We have a number of reasons for this (which I'm
>>> happy to go into), but it really comes down to the horrible
>>> performance created by Jena.
>>
>> I'm curious :) the performance issues are about Jena+Mulgara or
>> Jena alone?
>
> The problem was the interaction with Jena's scale assumptions: it
> assumed that the graphs, the query evaluation, and any intermediate
> results all fit in memory. Andrew Newman was the primary author of
> our Jena support, but from what I remember two things crippled
> performance. First, Jena assumed uniform-cost random access to the
> triples in a graph, which caused crippling IO complexity. Second, it
> used function composition to build query evaluators, which, while
> semantically equivalent to Mulgara's coroutine composition, doesn't
> have the same space properties and therefore routinely exhausted
> memory. So while Jena's implementation has good time complexity,
> and works fine for the small models it was designed to operate
> with, its IO and space complexities made it a poor fit for Mulgara.
> Consequently, Jena support was deprecated, and it will not be
> supported in the next release.
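To make the space argument concrete, here is a toy sketch in Java (chosen only because both Jena and Mulgara are Java projects; this is not code from either). Integers stand in for triples and two filters stand in for query operators. The eager version returns a complete list from every stage, so every intermediate result must fit in memory at once; the hypothetical `LazyFilter` pulls one value at a time from the stage below, which is roughly the space property attributed above to Mulgara's coroutine composition. Java has no coroutines, so pull-based iterators are an assumed stand-in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

public class CompositionSketch {
    // Eager composition: each stage returns a fully built list, so the
    // whole intermediate result is held in memory before the next stage runs.
    static List<Long> eagerFilter(List<Long> input, LongPredicate keep) {
        List<Long> out = new ArrayList<>();
        for (long x : input) {
            if (keep.test(x)) out.add(x);
        }
        return out;
    }

    // Lazy (pull-based) composition: each stage asks the stage below for
    // one value at a time and holds at most one pending value itself.
    static final class LazyFilter implements Iterator<Long> {
        private final Iterator<Long> input;
        private final LongPredicate keep;
        private Long pending;

        LazyFilter(Iterator<Long> input, LongPredicate keep) {
            this.input = input;
            this.keep = keep;
            this.pending = advance();
        }

        // Pull from the input until a value passes the predicate.
        private Long advance() {
            while (input.hasNext()) {
                Long x = input.next();
                if (keep.test(x)) return x;
            }
            return null;
        }

        @Override public boolean hasNext() { return pending != null; }

        @Override public Long next() {
            if (pending == null) throw new NoSuchElementException();
            Long result = pending;
            pending = advance();
            return result;
        }
    }

    static long countEager(long n) {
        List<Long> all = new ArrayList<>();
        for (long i = 0; i < n; i++) all.add(i);              // whole input in memory
        List<Long> even = eagerFilter(all, x -> x % 2 == 0);  // full intermediate list
        return eagerFilter(even, x -> x % 3 == 0).size();
    }

    static long countLazy(long n) {
        Iterator<Long> source = LongStream.range(0, n).boxed().iterator();
        Iterator<Long> pipeline =
                new LazyFilter(new LazyFilter(source, x -> x % 2 == 0), x -> x % 3 == 0);
        long count = 0;
        while (pipeline.hasNext()) {
            pipeline.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEager(1_000) + " " + countLazy(1_000));
    }
}
```

Both functions compute the same answer; the difference is that `countEager` keeps the input and every intermediate list alive together, while `countLazy` never holds more than one pending value per stage.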
I've discussed this with Laurian offline, but it bears repeating...
Ultimately, the problem was because Jena's API was never properly
documented. The guys in Bristol came up with a set of interfaces for
people to use (and for RDF database developers to implement), but
never properly told anyone which interfaces they were. Consequently,
people started using classes which were always supposed to be
completely internal to Jena. No, I'm not speculating here: I had a
conversation (in person) with one of the lead developers of Jena
about this (I won't embarrass Brian by mentioning his name).
I'll leave unsaid any comments on packages, visibility, etc, that may
be appropriate here. :-)
Because people were using these internal Jena classes, anyone (like
us) providing a Jena interface was forced either to lift these
classes out of the Jena code or to reimplement a LOT of complex
code. This code did things like resolve constraints by
having a lazy filter over the entire database, iterating over each
statement one-at-a-time until it found what it was supposed to
return. This increased the cost of each lookup from O(log n) to
O(n), and then you multiply that by the number of constraints.
Yeach. Andrae mentioned another problem, which was loading data into
memory. The list of problems went on and on.
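A rough Java sketch of the O(log n) versus O(n) point, under assumptions not in the original: statements are reduced to three numeric node IDs in a hypothetical `Triple` record (Java 16+), and an in-memory `TreeSet` ordered subject-predicate-object stands in for an on-disk index. The `scan` method mimics the lazy filter described above by visiting every statement, while `seek` jumps straight to the matching key range.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class LookupSketch {
    // A statement reduced to three node IDs (hypothetical encoding).
    record Triple(long s, long p, long o) {}

    // Sort order for a subject-predicate-object index.
    static final Comparator<Triple> SPO = Comparator.comparingLong(Triple::s)
            .thenComparingLong(Triple::p)
            .thenComparingLong(Triple::o);

    // Filter style: examine every statement, O(n) per constraint.
    static List<Triple> scan(Iterable<Triple> all, long s, long p) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : all) {
            if (t.s() == s && t.p() == p) out.add(t);
        }
        return out;
    }

    // Index style: seek to the matching key range, O(log n + k),
    // where k is the number of matches.
    static List<Triple> seek(NavigableSet<Triple> index, long s, long p) {
        Triple from = new Triple(s, p, Long.MIN_VALUE);
        Triple to = new Triple(s, p, Long.MAX_VALUE);
        return new ArrayList<>(index.subSet(from, true, to, true));
    }

    public static void main(String[] args) {
        NavigableSet<Triple> index = new TreeSet<>(SPO);
        for (long s = 0; s < 1_000; s++) {
            for (long p = 0; p < 10; p++) {
                index.add(new Triple(s, p, s * p));
            }
        }
        // Same answer either way; only the amount of work differs.
        System.out.println(scan(index, 42, 7).equals(seek(index, 42, 7)));
    }
}
```

With several constraints, the full scan is paid once per constraint, which is the multiplication complained about above; the indexed seek keeps each constraint close to logarithmic.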
It worked, but it didn't scale at all. It made us look really slow,
and we looked as if we couldn't load very much data.
Paul