[Mulgara-dev] CRITICAL: Bug fix to Backup operation

Andrae Muys andrae at netymon.com
Mon Mar 31 05:18:25 UTC 2008


On 29/03/2008, at 5:40 AM, Ben Hysell wrote:
> We have however ended up with the '## "_node######"' in our backup
> files...I am back to square one on how these actually get into the
> backup file.  In reference to these "_node###" that do eventually show
> up...we are not inserting them, they used to reference real data until
> somewhere something 'happens' and the backups start spitting out
> "_node####" in the backup file.

If I had to guess I would suggest that you are in fact inserting them  
- let me explain.

Assume for the moment that there is another bug somewhere that is
causing nodes in the statement-store (i.e. the TRIPLES section) to
lose their corresponding entry in the globalization-index (i.e. the
RDF NODES section) while retaining their entry in the
localization-index (which is not backed up, as it is _supposed_ to be
a duplicate of the globalization-index, only in the opposite
direction).

Here is how you could find yourself inserting "_node####".

1. Assume <foo:A> localizes to gn42, but the corresponding  
globalization entry has been lost.

2. Perform a query that returns a list of nodes you wish to insert  
statements about:

select $uris from ... where $uris <pred> "fred" -> { gn42, gn55, gn72 }
   note that the localization-index contains:
     <foo:A> -> gn42
     <foo:B> -> gn55
     <foo:C> -> gn72
   but due to the bug in question the globalization-index only contains:
     gn55 -> <foo:B>
     gn72 -> <foo:C>
   so when the above query is globalized and returned you get:
     { BlankNode(42), URI(foo:B), URI(foo:C) }

3. Use toString() to build an insert query from the global objects  
returned by the query.
     BlankNode(42).toString() -> "_node42".
     URI(foo:B).toString() -> "foo:B"
     URI(foo:C).toString() -> "foo:C"

4. Some code that is written to assume only URIs or Literals does
the relevant escaping:
    insert <foo:D> <pred> '_node42'
           <foo:D> <pred> <foo:B>
           <foo:D> <pred> <foo:C> into ...

5. We have managed to convert an errant query result into the  
insertion of a "_node####" string literal.
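
To make steps 1-5 concrete, here is a minimal sketch in Java. It uses
a plain map as a stand-in for the globalization-index; the class and
method names are hypothetical and are not Mulgara's API, and the
"contains a colon" test is only an assumed example of client code
that expects URIs or literals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LostGlobalizationDemo {
    // Toy globalization-index lookup: local node id -> URI string.
    // A node with no entry comes back looking like a blank node.
    static String globalize(Map<Long, String> globalizationIndex, long gn) {
        String uri = globalizationIndex.get(gn);
        return uri != null ? uri : "_node" + gn;
    }

    // Naive client code (an assumption for illustration): anything that
    // looks like a URI is wrapped in <>, everything else is quoted as a
    // string literal.
    static String toInsertTerm(String value) {
        return value.contains(":") ? "<" + value + ">" : "'" + value + "'";
    }

    // Globalize each query result and turn it into an insert term.
    static List<String> buildInsertTerms(Map<Long, String> globalizationIndex,
                                         long[] queryResults) {
        List<String> terms = new ArrayList<>();
        for (long gn : queryResults) {
            terms.add(toInsertTerm(globalize(globalizationIndex, gn)));
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Long, String> globalizationIndex = new HashMap<>();
        globalizationIndex.put(55L, "foo:B");
        globalizationIndex.put(72L, "foo:C");
        // The entry gn42 -> foo:A is deliberately missing (the suspected bug).

        long[] queryResults = {42L, 55L, 72L};
        for (String term : buildInsertTerms(globalizationIndex, queryResults)) {
            System.out.println("insert <foo:D> <pred> " + term);
        }
    }
}
```

With the gn42 entry missing, the first generated statement carries
the string literal '_node42', which is the kind of value that would
later show up in a backup.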

>   I'm not sure if this is a time issue, a transaction issue, or
> both.  The size of our server1 directory starts at 5.5 GB, we often
> start having problems when it grows and always perform a restore
> before we hit 20 GB.  Our longest stretch is two weeks before we have
> either hit the 20 GB mark or we are starting to see blank nodes in
> the backup file, shorter if we are doing a lot of transactions.

I must admit that growth is very high.  Are you running a version
with the fix to ticket #46 (I believe that is revision 223)?

I suspect it isn't a transaction issue - my fear is that we are  
talking about a race-condition.  To start with it is probably worth  
checking to make sure we definitely are seeing the localize/globalize  
index fall out of sync.  That would entail going through all the  
'new' blank-nodes, and making sure they still have a legitimate entry  
in the localization-index.  To do this I am going to have to  
temporarily add a new operation to your Session so we can do the check.
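
The check described above could look roughly like the following
sketch. It is only an illustration over plain Java collections: the
class, the method, and the idea of passing in the localization-index
as a map are all assumptions, not the operation that would actually
be added to the Session.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class IndexSyncCheck {
    // For every suspect blank-node id, look for a value in the
    // localization-index (value -> local node id) that still maps to it.
    // Any hit is a node whose globalization entry appears to be lost.
    static Map<Long, String> findOrphans(Map<String, Long> localizationIndex,
                                         Set<Long> suspectBlankNodes) {
        Map<Long, String> orphans = new TreeMap<>();
        for (Map.Entry<String, Long> entry : localizationIndex.entrySet()) {
            if (suspectBlankNodes.contains(entry.getValue())) {
                orphans.put(entry.getValue(), entry.getKey());
            }
        }
        return orphans;
    }
}
```

If the result is non-empty, the two indices have fallen out of sync,
and the returned values are candidates for repairing the
globalization-index.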

> The issues we are still tracking:
> 1. how do "_node###" ever get into the backup file

As discussed above, I suspect the bug causing issue 3 is tricking you
into inserting them.

> 2. Duplicate string pool entries

Still not sure about this one.

> 3. why are we getting blank nodes in our backup file to begin with.
> We can still query the running Mulgara to find the data, but we
> can't back it up.

This does suggest that the indices are falling out of sync.  Does
this mean you know the global data-values that are supposed to be
there?  If so, this a) makes confirming the problem much simpler; and
b) means it is feasible for us to consider repairing the
globalization-index and/or repairing the backup.

> 4. In my attached backup file I actually can't load it in the present
> form, when I do I get the following Java error on win2k3 server dual
> cpu, quad cores with 8 Gb ram...its an interesting error that appears
> to be because of the very large node numbers listed in the backup,
> thoughts? (I can get it to load if I make the node numbers small):

This is a 32-bit jvm you're trying to restore to, right?

Andrae

-- 
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd




