[Mulgara-dev] Number dataset loaded on EC2

Chris Wilper cwilper at fedora-commons.org
Thu Nov 20 06:16:38 UTC 2008


Hi Paul,

On Wed, Nov 19, 2008 at 12:28 PM, Paul Gearon <gearon at ieee.org> wrote:
>> It took a little over a day to load about a quarter billion triples.
>
> Really? That's the same speed as my laptop. I don't know what kind of
> bandwidth to disk that you have in the cloud, but I would have thought
> it would be better. :-(

I was hoping for slightly better myself, but I think it's important to clarify
that this test wasn't aiming for the best possible EC2 performance, just an
easy-to-set-up baseline.  I used instance storage in this particular test,
which is likely not something I'd do in production.  I'd definitely expect to
get better-than-your-laptop numbers out of a configuration using striped EBS
volumes.
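For a rough sense of scale, the quoted figures work out to an average load rate
of a few thousand triples per second.  The sketch below assumes "a quarter
billion" means 250 million and rounds "a little over a day" down to 24 hours,
so the result is an upper bound on the true average rate.

```python
# Back-of-the-envelope load throughput from the numbers quoted above.
# Assumptions: 250 million triples, and exactly 24 h of load time
# (the real run was "a little over a day", so this overstates the rate).
TRIPLES = 250_000_000
SECONDS = 24 * 60 * 60

def triples_per_second(triples: int, seconds: int) -> float:
    """Average load throughput in triples per second."""
    return triples / seconds

rate = triples_per_second(TRIPLES, SECONDS)
print(f"{rate:,.0f} triples/s")  # prints "2,894 triples/s"
```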

> BTW, I'd forgotten this, but it's also possible to load up these files
> using the .rdf.gz form. This is particularly useful for huge data
> files like this.

Ahh, good to know.
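The benefit of loading the .rdf.gz form directly is that the data can be
decompressed as a stream, so the 51 GB uncompressed copy never has to be staged
on disk.  The Python sketch below only illustrates that streaming idea with the
standard gzip module; it is not Mulgara's loader, and the function names are
hypothetical.

```python
import gzip
import io
from typing import BinaryIO, Iterator

CHUNK = 1 << 20  # hypothetical chunk size: 1 MiB of decompressed data per read

def stream_gzip_chunks(fileobj: BinaryIO, chunk_size: int = CHUNK) -> Iterator[bytes]:
    """Yield decompressed chunks of a gzip stream without holding it all in memory."""
    # GzipFile does not close a caller-supplied fileobj when it is closed.
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            yield chunk

def uncompressed_size(fileobj: BinaryIO) -> int:
    """Count decompressed bytes by streaming, never materialising the whole file."""
    return sum(len(chunk) for chunk in stream_gzip_chunks(fileobj))

if __name__ == "__main__":
    payload = b"<rdf:Description/>\n" * 1000
    buf = io.BytesIO(gzip.compress(payload))
    print(uncompressed_size(buf) == len(payload))  # prints "True"
```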

>> It's about 4.6 gigs compressed, 51 gigs uncompressed.
>
> Yes, the URIs contain a lot of redundant information in them, which I
> haven't attempted to remove. XA2 will be storing strings and URIs much
> more efficiently.

From a user perspective, I'm happy to trade space efficiency for
performance.  But if more efficient storage actually leads to less
paging, all the better.
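The size figures above imply roughly 200 bytes of RDF text per triple and an
11x gzip ratio, which is consistent with Paul's point about redundant URIs
(verbose RDF/XML markup likely contributes too).  The sketch assumes "gigs"
means decimal gigabytes (10^9 bytes) and exactly 250 million triples.

```python
# Per-triple storage cost implied by the sizes quoted in this thread.
# Assumptions: decimal GB (1e9 bytes) and exactly 250 million triples.
TRIPLES = 250_000_000
COMPRESSED_BYTES = 4.6e9
UNCOMPRESSED_BYTES = 51e9

def bytes_per_triple(total_bytes: float, triples: int) -> float:
    """Average number of bytes each triple occupies in the file."""
    return total_bytes / triples

ratio = UNCOMPRESSED_BYTES / COMPRESSED_BYTES
raw = bytes_per_triple(UNCOMPRESSED_BYTES, TRIPLES)
packed = bytes_per_triple(COMPRESSED_BYTES, TRIPLES)
print(f"compression ratio ~{ratio:.1f}x")  # prints "compression ratio ~11.1x"
print(f"~{raw:.0f} bytes/triple uncompressed, ~{packed:.1f} compressed")
```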

- Chris
