[Mulgara-general] Memory error during rdf load
Paul Gearon
gearon at ieee.org
Thu Aug 14 21:32:29 UTC 2008
On Thu, Aug 14, 2008 at 1:35 PM, Bill OConnor <wtoconnor at gmail.com> wrote:
<snip/>
> Maybe I'm showing my ignorance, but as far as 64 bit is concerned, it
> isn't clear to me how that would help. It would expand the addressable
> memory space and increase the data bandwidth, but I still only have
> swap and real memory available. These will not change and are
> addressable by 32 bits. Longs in Java are 64 bit regardless of JVM.
>
> How does 64 bit help?
The address space in a 64 bit JVM is 16 exabytes, which is effectively
unlimited. No single entity can occupy more than 2GB, since for
compatibility between JVMs Java uses a signed 32 bit int for array
indices and buffer positions, but that doesn't mean you can't have
lots of things set to that size. On a 32 bit JVM the sum total of all
your objects cannot exceed 2GB.
What this means for us is that we can memory map an entire file as an
array of 2GB mappings. The OS then schedules reading and writing of
data to/from disk, which frees up a lot of data structures, avoids a
lot of Java-to-native transitions, and lets us skip a lot of code. The
end result is a system that uses a lot less heap, and runs faster.
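
As a rough illustration of the idea (a minimal sketch, not Mulgara's
actual code): a single FileChannel.map() call is capped at
Integer.MAX_VALUE bytes, so a file of arbitrary size gets mapped as an
array of fixed-size segments. The sketch uses 1GB power-of-two
segments to keep the offset arithmetic simple.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Minimal sketch, not Mulgara's code: map a whole file as an array of
// fixed-size read-only segments.
public class SegmentedFileMap {
  private static final int SEGMENT_BITS = 30;           // 1GB segments
  private static final long SEGMENT_SIZE = 1L << SEGMENT_BITS;

  private final MappedByteBuffer[] segments;

  public SegmentedFileMap(String path) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path, "r");
         FileChannel channel = file.getChannel()) {
      long length = channel.size();
      int count = (int) ((length + SEGMENT_SIZE - 1) / SEGMENT_SIZE);
      segments = new MappedByteBuffer[count];
      for (int i = 0; i < count; i++) {
        long offset = (long) i << SEGMENT_BITS;
        long size = Math.min(SEGMENT_SIZE, length - offset);
        // The OS pages the data in and out; no heap is used for it.
        segments[i] = channel.map(FileChannel.MapMode.READ_ONLY,
                                  offset, size);
      }
    }
  }

  // Read one byte at an absolute file offset.
  public byte get(long position) {
    MappedByteBuffer seg = segments[(int) (position >>> SEGMENT_BITS)];
    return seg.get((int) (position & (SEGMENT_SIZE - 1)));
  }
}

A real store would also handle multi-byte reads that straddle a
segment boundary, which this sketch ignores.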
> It seems more likely that there is a data structure being built in
> Mulgara that is proportional to the size of the file being loaded. (?)
Yes, though this isn't really a Mulgara issue. The thing about RDF is
that anything can link to anything else, so an element at the end of
an RDF file can refer back to something at the beginning (for example,
an rdf:nodeID reference in the last statement pointing at a blank node
declared in the first). The parser therefore has to keep some state
about everything it's already seen. We try to make that happen on
disk, but there are a few things that build up in memory anyway.
N3 is usually a little cheaper, but if there are a lot of explicit
blank node references (such as _:1234) then the parser has to
accumulate those labels too.
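
To see why those labels cost memory, here's a hypothetical sketch (the
class and names are mine, not the parser's): every distinct label has
to stay mapped to its allocated node for the whole parse, because any
later statement may reference it again, so the table grows with the
number of distinct labels in the file.

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Mulgara's parser: a label table for
// explicit blank nodes. Each distinct label (e.g. _:1234) stays pinned
// until the parse finishes, since any later statement may reuse it.
public class BlankNodeMap {
  private final Map<String, Long> labelToNode = new HashMap<>();
  private long nextNode = 1;

  // Return the node id for a label, allocating one on first sight.
  public long resolve(String label) {
    return labelToNode.computeIfAbsent(label, l -> nextNode++);
  }
}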
Regards,
Paul Gearon