[Mulgara-general] "Sweet spot" for model size?

Andrae Muys andrae at netymon.com
Mon Feb 25 04:15:12 UTC 2008


On 25/02/2008, at 9:27 AM, Eric Freese wrote:

>> Date: Sat, 23 Feb 2008 18:21:55 -0800
>> From: "Life is hard, and then you die" <ronald at innovation.ch>
>> Subject: Re: [Mulgara-general] "Sweet spot" for model size?
>
>> Looks like you never got a reply - sorry about that.
>
> No problem
>
>> On Thu, Feb 14, 2008 at 10:30:46AM -0500, Eric Freese wrote:
>>>
>>> I'm new to Mulgara and very impressed thus far.  My main question at
>>> this point centers around any tips the community might have as to the
>>> best way to organize models.
>>>
>>> I'm trying to load the dbpedia (dbpedia.org) RDF datasets into
>>> Mulgara.  The initial loads went pretty fast, but they seem to be
>>> getting progressively slower as more and more triples are added.  What
>>> I'm wondering is whether there is a suggested number of triples to
>>> have within a model, or a suggested strategy for how models should be
>>> organized.
>>
>> It doesn't really matter how many triples per model or how many
>> models you have, because Mulgara stores everything as four-tuples,
>> with the model being the fourth element (i.e. "(s, p, o) in m" is
>> stored as "(s, p, o, m)"), and because the indexes are fully
>> symmetric and treat all elements of the four-tuple equally.
>>
>> How many triples are you loading, and where do you start to see a
>> noticeable slow-down?  Are you using "insert" or "load" to load the
>> triples, and are you doing this in auto-commit mode or in separate
>> transactions?
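
As an aside on the storage point above: because the model is just the
fourth position in the tuple, a model-scoped query is an ordinary
constraint on that position.  In iTQL that looks roughly like this (the
server and model names below are placeholders):

  select $s $p $o
  from <rmi://localhost/server1#dbpedia>
  where $s $p $o;
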
>
> Right now I have in the neighborhood of 30 million statements loaded.
> I believe the entire main dataset is around 100 million statements.
> There are additional components that contain 2 billion statements, but
> I don't think I'm going to try to replicate those just yet.  Things
> have gotten progressively slower as more statements are loaded.  I'm
> loading individual files of varying sizes.  Some contain a few tens
> of thousands of statements; others have 2 million or more.  I'm using
> "load" to add each file into the model.  I'm not doing anything
> special, so I'm assuming I'm using auto-commit.
>
> In reading some of the other messages, I'm guessing that the delay is
> caused by the increased time to update the indexes as they get larger
> and larger.  Does that sound correct?

Yes, that sounds correct.  I am interested in what sort of machine you
are running: specifically, how many HDDs and in what configuration, and
how much RAM?  Also of interest is which JVM you are using, and whether
you are on the 32-bit or 64-bit runtime.
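
If you want to try batching the loads into explicit transactions rather
than relying on auto-commit, something along these lines should work
from the iTQL shell (the file and model URIs are placeholders):

  set autocommit off;
  load <file:/data/dbpedia/infobox_1.nt> into <rmi://localhost/server1#dbpedia>;
  load <file:/data/dbpedia/infobox_2.nt> into <rmi://localhost/server1#dbpedia>;
  commit;
  set autocommit on;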

> Something else I've started running into is files not loading and
> getting a "javax.transaction.RollbackException: null" message on
> larger files (1 million or more statements).  When I split them into
> smaller files, they load just fine.  Any suggestions?  Should I
> increase the max memory for my JVM?
>
> I'm wondering if I should load each file into its own model and then
> use views to combine them.  Are there any performance issues (similar
> to db joins) in using views?  Are there other pros/cons to this
> strategy?  If I read the docs correctly, a model can participate in
> more than one view, correct?
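
On the RollbackException: larger files mean larger transactions, so the
first thing I would try is giving the server JVM more heap.  Assuming
you start the server from the standalone jar, that is just the usual
-Xmx switch (the jar name and heap size here are only examples):

  java -Xmx2048m -jar mulgara-1.1.1.jar

If the larger files still fail with more heap, splitting them as you
have been doing is a reasonable workaround.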

I must admit there are currently some performance issues with
performing complex queries against views.  Unfortunately, the unions
aren't as transparent to some of the join optimisations as we would
like.  On the other hand, there is absolutely no reason why they
couldn't be.  Our optimisations and enhancements are mostly
user-driven, so if Mulgara users start reporting this as a problem
affecting them, it will get fixed.
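
In the meantime, for simple cases you can often get the effect of a
union without a view by naming several models directly in the FROM
clause.  Roughly (the model names are placeholders):

  select $s $p $o
  from <rmi://localhost/server1#part1> or <rmi://localhost/server1#part2>
  where $s $p $o;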

Andrae

-- 
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd




