[Mulgara-dev] CRITICAL: Bug fix to Backup operation (off-topic)

Thu Mar 27 22:21:42 UTC 2008

On Thu, Mar 27, 2008 at 3:44 PM, Alex Hall <alexhall at revelytix.com> wrote:

> As far as I can recall, the main difficulty in maintaining a reference
> to a blank node across transactions lies in the fact that the underlying
> gNode may have been released and reallocated in the meantime.
> Therefore, even though I believe the Connection API allows for a blank
> node to be used in a query constraint (and possibly other places), doing
> so is not guaranteed to give the results you may have expected.
> Certainly the TQL query grammar does not support a blank node in a query
> constraint.
>
> The news that gNodes won't be reused in the new string pool
> implementation seems to suggest that the identity of a blank node
> reference will remain intact across the lifetime of the database.  If
> this is indeed the case, then would we be able to explicitly support the
> use of blank nodes in queries?  That would be a most welcome enhancement.
>

Yes. It will probably give the RDF folks apoplexy, but this will be fine.

For those who are interested, this has a bit of history for us....

Once upon a time Kowari used 32 bits for all of its internal identifiers.
Even with only a few million triples, you can quickly use up all of this
space if you load and drop data frequently. So we worked very hard to reuse
identifiers in a transaction-safe way. This resulted in the FreeList class,
which is a performance hog for us and also chews up a lot of disk space.

While millions of triples were fine in our first few years, it eventually
became apparent that we needed to go beyond the 2 billion identifiers (and
file offsets) available in a 32-bit design. So we moved to a 64-bit
structure everywhere. This worked well, and gave us a big performance
improvement on 64-bit architectures as well (since memory mapping can now
cover all of the files, regardless of size). However, we didn't have the
resources to revisit the fundamental architecture that Kowari had been built
on.

Now that we are embarking on XA2, we have been able to look at these
fundamental choices again. It doesn't take much to realize that you don't
need to reuse identifiers if you are in a 64-bit space. (Suppose you allocate
1 new identifier every millisecond. You can keep allocating new identifiers
for over 580 million years before you run out.) So we can allocate new
identifiers by incrementing a counter. We still need to be careful about
allocating file space, but a lot of the work we were doing with FreeLists can
now go away. That means less disk space, less complexity, and fewer disk
seeks. We can also be confident that internal IDs for blank nodes will never
be reused.
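
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. It is only an illustration and not XA2 code: the names
years_until_exhausted and CounterAllocator are made up for this example, and
a year is assumed to be 365.25 days.

```python
# Illustrative only: these names are hypothetical and are not Mulgara/XA2 APIs.

# Milliseconds in a year, assuming a 365.25-day year.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def years_until_exhausted(bits=64, ids_per_ms=1):
    """Years until an id space of the given width runs out at a steady rate."""
    return (2 ** bits) / ids_per_ms / MS_PER_YEAR


class CounterAllocator:
    """Hands out identifiers by incrementing a counter and never reuses one."""

    def __init__(self, first_id=1):
        self._next = first_id

    def allocate(self):
        node = self._next
        self._next += 1
        return node

    def release(self, node):
        # Deliberately a no-op: nothing is returned to a free list, so a
        # released identifier can never be handed out again.
        pass


if __name__ == "__main__":
    print(f"64-bit space: {years_until_exhausted(64):,.0f} years")
    print(f"32-bit space: {years_until_exhausted(32) * 365.25:.1f} days")
```

At the same rate of one identifier per millisecond, a full 32-bit space is
gone in under two months, which is why reuse used to matter so much. With the
counter approach, releasing a blank node's identifier leaves nothing to
recycle, so a later allocate() cannot return it.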

The differences for the new architecture go WAY beyond the dropping of
FreeLists, but even this small change is going to give us big benefits.

Regards,
Paul