[Mulgara-general] Spanning multiple disks

Paul Gearon gearon at ieee.org
Mon Jun 30 15:44:38 UTC 2008


On Mon, Jun 30, 2008 at 10:28 AM, Chuck Borromeo <cborromeo3 at yahoo.com> wrote:
> As far as I know this is not a problem right now.  I am still evaluating Mulgara so I have not really hit any performance issues yet.  (I was using Virtuoso to store triple data and now I am looking at Mulgara.)  Could it be possible to add the ability to span multiple disks into the list of features for the new file system?  I have read some of the documents concerning the XA2 file system.  If XA2 is meant to scale to really large data sets (on the order of TB), I think I/O will quickly become an issue.

Yup, though the real issue comes with doing lots of joins. In that
case it would be much better if you could duplicate the data across
multiple disks and perform the joins between these. It would increase
the storage requirements, but then again, the total would still be
less than what Mulgara uses now, and it would give a big speed
improvement.

For the time being, we want to get the first round of improvements
implemented. This sort of plan CAN be done later on, so we should
really address that once we see it as our biggest issue.

> I have a second question (and this is probably a simple one), how do I start Mulgara to use more java virtual memory?  I try running Mulgara from the command line with these parameters:
>
> java -Xmx1024m -jar mulgara-2.0-alpha.jar -p 4567
>
> I get java.IO.FileException: Cannot Allocate memory error at MappedIntFile.mapFile.  I have been trying to load a fairly large set of data into Mulgara and I get java heap memory errors.  I was hoping to extend the heap memory of Mulgara to avoid the errors while loading.   Any suggestions?

The issue here is that you are using the 32 bit JVM, and the system is
hitting the 2GB limit with memory mapping.

There are 2 solutions, depending on your system.
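
If you are not sure which JVM you are running, a small check along
these lines may help. This is only a sketch: the class name JvmBits is
made up for illustration, and the sun.arch.data.model property is
specific to Sun/HotSpot-style JVMs, so it can be missing on other
implementations (in which case the sketch reports "unknown").

```java
public class JvmBits {
    /**
     * Returns "32" or "64" when the JVM exposes its data model,
     * or "unknown" when the (vendor-specific) property is absent.
     */
    public static String dataModel() {
        // Assumption: this property is set by Sun/HotSpot-derived JVMs only.
        String model = System.getProperty("sun.arch.data.model");
        return (model == null) ? "unknown" : model;
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        // os.arch is a standard property; it describes the architecture the JVM was built for.
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```

A result of "32" means the first solution below applies, even if the
operating system itself is 64 bit.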

If you have a 32 bit system, then you'll need to add the following to
your command line:
  java -Xmx1024m -Dmulgara.xa.forceIOType=explicit -jar mulgara-2.0-alpha.jar -p 4567

This will tell the system to NOT use memory mapping for the index
files, meaning that you can have more than 2GB in these files.
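
To illustrate the difference between the two I/O styles, here is a
rough sketch in plain java.nio. It is NOT Mulgara's code: the class
IoModes and its methods are hypothetical, and it uses the
java.nio.file API, which needs Java 7 or later. The mapped version
places the file in the process's virtual address space, which is what
runs out on a 32 bit JVM. The explicit version reads through a small
heap buffer at a given file offset, so the file size is not bound by
the address space.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IoModes {
    /** Reads the int at the given index by memory mapping the whole file. */
    public static int readMapped(Path file, int index) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes, and
            // every mapping consumes virtual address space in the process.
            IntBuffer ints = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
            return ints.get(index);
        }
    }

    /** Reads the int at the given index with an explicit positional read. */
    public static int readExplicit(Path file, long index) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
            long pos = index * Integer.BYTES;
            // Positional reads may return fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos + buf.position());
                if (n < 0) {
                    throw new EOFException("index " + index + " is past the end of the file");
                }
            }
            buf.flip();
            return buf.getInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write three ints (big-endian, the ByteBuffer default) to a temp file.
        Path file = Files.createTempFile("ints", ".bin");
        file.toFile().deleteOnExit();
        ByteBuffer out = ByteBuffer.allocate(3 * Integer.BYTES);
        out.putInt(10).putInt(20).putInt(30);
        Files.write(file, out.array());

        System.out.println(readMapped(file, 1));   // prints 20
        System.out.println(readExplicit(file, 2)); // prints 30
    }
}
```

The explicit style costs a system call per read, which is why it is
generally slower than mapping, but it keeps working once the files
outgrow what a 32 bit process can map.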

If you have a 64 bit system, then just turn on the 64 bit JVM with the -d64 option:
  java -d64 -Xmx1024m -jar mulgara-2.0-alpha.jar -p 4567

You can then memory map files totaling up to about 300GB (the exact
number is OS dependent). Don't worry about that limit though, as it
only applies to index files. The data files can grow much larger
still.

Regards,
Paul


