[Mulgara-general] "Sweet spot" for model size?

Eric Freese freese.eric at gmail.com
Sun Feb 24 23:27:16 UTC 2008


> Date: Sat, 23 Feb 2008 18:21:55 -0800
> From: "Life is hard, and then you die" <ronald at innovation.ch>
> Subject: Re: [Mulgara-general] "Sweet spot" for model size?

> Looks like you never got a reply - sorry about that.

No problem

> On Thu, Feb 14, 2008 at 10:30:46AM -0500, Eric Freese wrote:
> >
> > I'm new to Mulgara and very impressed thus far.  My main question at
> > this point centers around any tips the community might have as to the
> > best way to organize models.
> >
> > I'm trying to load the dbpedia (dbpedia.org) RDF datasets into
> > mulgara.  The initial loads went pretty fast but they seem to be
> > getting progressively slower as more and more triples are added.  What
> > I'm wondering is if there is a suggested number of triples to have
> > within a model or a suggested strategy for how models should be
> > organized.
>
> It doesn't really matter how many triples per model or how many models
> you have, because mulgara stores everything as four-tuples with the
> model being the fourth element (i.e. "(s, p, o) in m" is stored as
> "(s, p, o, m)"), and because the indexes are fully symmetric and treat
> all elements of the four-tuple equally.
>
> How many triples are you loading, and where do you start to see a
> noticeable slow-down? Are you using "insert" or "load" to load the
> triples, and are you doing this in auto-commit mode or in separate
> transactions?

Right now I have in the neighborhood of 30 million statements loaded.
I believe the entire main dataset is around 100 million statements.
There are additional components that contain 2 billion statements, but
I don't think I'm going to try to replicate those just yet.  Things
have gotten progressively slower as more statements are loaded.  I'm
loading individual files of varying sizes.  Some contain a few tens of
thousands of statements; others have 2 million or more.  I'm using
"load" to add each file into the model.  I'm not doing anything
special, so I'm assuming I'm running in auto-commit mode.
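For reference, here's roughly what I understand batching several loads
into one explicit transaction would look like in iTQL (the file paths
and model URI below are just placeholders for my setup, and the exact
syntax may differ by version):

```
set autocommit off;
load <file:/data/dbpedia/articles.nt> into <rmi://localhost/server1#dbpedia>;
load <file:/data/dbpedia/labels.nt> into <rmi://localhost/server1#dbpedia>;
commit;
set autocommit on;
```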

In reading some of the other messages, I'm guessing that the delay is
caused by the increased time to update the indexes as they get larger
and larger.  Does that sound correct?

Something else I've started running into is files failing to load with
a "javax.transaction.RollbackException: null" message on larger files
(1 million or more statements).  When I split them into smaller files,
they load just fine.  Any suggestions?  Should I increase the maximum
memory for my JVM?
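If it matters, I assume that would just mean raising the heap limit
when starting the server, something like this (the jar name and heap
size are guesses for my install, not a recommendation):

```shell
# Start the Mulgara server with a larger JVM heap (-Xmx).
# Jar name is an assumption -- use whatever your distribution ships.
java -Xmx2048m -jar mulgara-2.0.0.jar
```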

I'm wondering if I should load each file into its own model and then
use views to combine them.  Are there any performance issues (similar
to db joins) in using views?  Are there other pros/cons to this
strategy?  If I read the docs correctly, a model can participate in
more than one view, correct?
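From what I can tell from the docs, iTQL also lets you query a union
of models directly with a model expression in the from clause, which
might sidestep views for simple cases.  Something like this, if I'm
reading it right (the model URIs are placeholders):

```
select $s $p $o
  from <rmi://localhost/server1#dbpedia1> or <rmi://localhost/server1#dbpedia2>
  where $s $p $o;
```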

> Generally, loading as much as you can in a single transaction will be
> fastest because there's some noticeable overhead on the commit.

Thanks!
Eric
