[Mulgara-dev] CRITICAL: Bug fix to Backup operation

Ben Hysell BenH at viewpointusa.com
Mon Mar 31 15:33:29 UTC 2008


Andrae,

Easy ones first:

>> 4. In my attached backup file I actually can't load it in the present
>> form; when I do, I get the following Java error on a win2k3 server
>> (dual CPU, quad core, 8 GB RAM)...it's an interesting error that
>> appears to be because of the very large node numbers listed in the
>> backup, thoughts? (I can get it to load if I make the node numbers
>> small):

>This is a 32-bit jvm you're trying to restore to, right?

We are working with a 64-bit jvm.

>I must admit that growth is very high - are you running a version with
>the fix to ticket #46 (I believe that is revision 223)?

We are currently on revision 590.

>Here is how you could find yourself inserting "_node####".

I agree that could happen in a system, but would this also happen
with data that is considered stale?  i.e., we often lose numbers and
dates (non-datatyped in Mulgara).  So for a file we'll have:

Created date: xx-xx-xx
Modified date: xx-xx-xx
File length: ##

Etc.

Once these elements are created they are not manipulated again via
select/inserts.  However, more data may come into the system with the
same dates/file lengths; when those are inserted at a later date,
could they be corrupted then?

When debugging this problem we are able to use the WebUI to query the
database on the 'damaged' Mulgara without running into the scenario
you described.  It is only after we do a restore that we start
getting the '_node####' in the WebUI.

While running on the damaged Mulgara and debugging, we see the
literals are being pulled from the gn2spoCache.  I believe you are
correct about the localization/globalization indices falling out of
sync; below is a long thread looking at this issue.  Thoughts?


-----Original Message-----
From: Ben Hysell 
Sent: Friday, January 25, 2008 10:14 AM
To: Mulgara Developers
Subject: RE: [Mulgara-dev] literal in gn2spoCache but cannot be found
in backup file

Andrae,

I've taken the following steps:

1. Shut down Mulgara
2. Started it back up
3. Opened a new WebUI
4. Ran the query $s $p 'literal' -> I would have expected:

<uri1> <uri2> literal
<uri1> <uri3> literal 
<uri1> <uri4> literal

But I only received:

<uri1> <uri4> literal

This is the triple Paul asked me to insert yesterday during a debug
session in Eclipse.

5. Ran the query <uri1> $p $o -> all three instances of the literal
are returned


Sorry if I was unclear in my earlier emails; I've been working off
the original store the entire time.  The Mulgara where I restored the
backup has been sitting off to the side and I have not touched it.

-ben 
 
-----Original Message-----
From: mulgara-dev-bounces at mulgara.org
[mailto:mulgara-dev-bounces at mulgara.org] On Behalf Of Andrae Muys
Sent: Friday, January 25, 2008 1:13 AM
To: Mulgara Developers
Subject: Re: [Mulgara-dev] literal in gn2spoCache but cannot be found
in backup file


On 24/01/2008, at 5:47 AM, Ben Hysell wrote:
> However, if I query $subject $predicate literal, then on line 555 of
> StringPoolSession.java, in the function localizeSPObject, the call
>
> Long localNode = persistentStringPool.findGNode(relativeSPObject);
>
> sets localNode to 0.  As the function progresses and checks
> temporaryStringPool, it also cannot find the node there.  The
> function finishes by creating a node in the temporary string pool.
> When we arrive at ConstraintImpl on line 95 of ConstraintImpl.java,
> the ConstraintElement e2 has a value of -1.
>
> So to circle back, and please correct me if I am wrong in my  
> conclusions:
>
> 1.       I can back up the good Mulgara server, but when I restore  
> it the new Mulgara server has lots of blank nodes.
>
> 2.       It appears during the backup operation on the good Mulgara  
> server the call: Tuples t = stringPool.findGNodes(null,  
> null); ::line 179 of BackupOperation.java truly does not have my  
> literal in the string pool.
>
> 3.       My literal is still in the system if I query for it and  
> pull the node from the gn2spoCache
>
Ben,
   Against the original store the backup was created from, could you  
run a query that uses the literal directly?  Preferably from a fresh  
session (or even a clean restart of the server) to ensure the cache  
is flushed?   Something along the lines of:

select $s $p from <> where $s $p 'the literal';

This will force the literal to be localized, which is the operation  
that uses the AVL tree.  Any prior query that returns the literal may  
end up populating the cache and avoiding the AVL-tree lookup.  I'm  
hoping this works fine, because if it doesn't then it isn't a problem  
with backup, but rather the string-pool localization and  
globalization indices have fallen out of sync.

The reason I ask is that if the localization you discuss above was  
done on the original store (which is unclear to me) then this is  
what I suspect has happened.  Returning a -ve node from a  
localization means that the global resource (uri/literal) was not  
found in the string-pool localization index and was allocated a  
temporary node-id from the temporary string-pool.  The -1 simply  
means it is the first node to be allocated from the temporary pool.

When doing globalizations we simply use the sign-bit to determine  
which string-pool to look it up in.  We also use the sign-bit to  
guarantee that every resource inserted into the statement-store has  
first been persisted in the string-pool.
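
As a rough illustration of that dispatch (a sketch only -- the names
mirror the discussion above but are assumptions, not the actual
StringPoolSession code):

    // Sketch: temporary-pool node ids are negative, so the sign of a
    // local node tells us which string-pool to consult.
    SPObject globalize(long localNode) throws StringPoolException {
      if (localNode < 0) {
        // sign-bit set: allocated from the temporary string-pool
        return temporaryStringPool.findSPObject(localNode);
      }
      // positive: must already be persisted in the persistent pool --
      // the invariant the statement-store relies on
      return persistentStringPool.findSPObject(localNode);
    }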

Andrae

-----Original Message-----
From: mulgara-dev-bounces at mulgara.org
[mailto:mulgara-dev-bounces at mulgara.org] On Behalf Of Andrae Muys
Sent: Monday, March 31, 2008 1:18 AM
To: Mulgara Developers
Subject: Re: [Mulgara-dev] CRITICAL: Bug fix to Backup operation


On 29/03/2008, at 5:40 AM, Ben Hysell wrote:
> We have, however, ended up with the '## "_node######"' in our backup
> files, and I am back to square one on how these actually get into the
> backup file.  In reference to these "_node###" that do eventually
> show up: we are not inserting them; they used to reference real data
> until somewhere something 'happens' and the backups start spitting
> out "_node####" in the backup file.

If I had to guess I would suggest that you are in fact inserting them  
- let me explain.

Assume for the moment that there is another bug somewhere that is
causing nodes in the statement-store (i.e. the TRIPLES section) to
lose their corresponding entry in the globalization-index (i.e. the
RDF NODES section), but to retain their entry in the localization-
index (which is not backed up, as it is _supposed_ to be a duplicate
of the globalization-index, only in the opposite direction).

Here is how you could find yourself inserting "_node####".

1. Assume <foo:A> localizes to gn42, but the corresponding  
globalization entry has been lost.

2. Perform a query that returns a list of nodes you wish to insert  
statements about:

select $uris from ... where $uris <pred> "fred" -> { gn42, gn55, gn72 }
   note that the localization-index contains:
     <foo:A> -> gn42
     <foo:B> -> gn55
     <foo:C> -> gn72
   but due to the bug in question the globalization-index only contains:
     gn55 -> <foo:B>
     gn72 -> <foo:C>
   so when the above query is globalized and returned you get:
     { BlankNode(42), URI(foo:B), URI(foo:C) }

3. Use toString() to build an insert query from the global objects  
returned by the query.
     BlankNode(42).toString() -> "_node42".
     URI(foo:B).toString() -> "foo:B"
     URI(foo:C).toString() -> "foo:C"

4. Some code that is written to assume only URIs or Literals does  
the relevant escaping:
    insert <foo:D> <pred> '_node42'
           <foo:D> <pred> <foo:B>
           <foo:D> <pred> <foo:C> into ...

5. We have managed to convert an errant query result into the
insertion of a "_node####" string literal.  (A defensive check
against this is sketched below.)
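
If that analysis is right, client code can defend itself by refusing
to serialize blank nodes back into an insert.  A minimal sketch,
assuming the JRDF node types that Mulgara answers are built from (the
helper itself is hypothetical, not an existing API):

    import org.jrdf.graph.BlankNode;
    import org.jrdf.graph.ObjectNode;
    import org.jrdf.graph.URIReference;

    // Hypothetical helper: convert a query-result node back into iTQL
    // insert syntax, failing fast on blank nodes rather than quietly
    // emitting their toString() as a "_node####" literal.
    static String toInsertTerm(ObjectNode n) {
      if (n instanceof BlankNode) {
        // either genuinely anonymous, or (as in step 2 above) a node
        // whose globalization entry was lost -- never insert it
        throw new IllegalArgumentException("refusing to insert " + n);
      }
      if (n instanceof URIReference) {
        return "<" + ((URIReference) n).getURI() + ">";
      }
      return "'" + n + "'";  // a genuine literal
    }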

>   I'm not sure if this is a time issue, a transaction issue, or
> both.  The size of our server1 directory starts at 5.5 GB; we often
> start having problems as it grows, and we always perform a restore
> before we hit 20 GB.  Our longest stretch is two weeks before we
> have either hit the 20 GB mark or started to see blank nodes in the
> backup file; shorter if we are doing a lot of transactions.

I must admit that growth is very high - are you running a version
with the fix to ticket #46 (I believe that is revision 223)?

I suspect it isn't a transaction issue - my fear is that we are  
talking about a race-condition.  To start with it is probably worth  
checking to make sure we definitely are seeing the localize/globalize  
index fall out of sync.  That would entail going through all the  
'new' blank-nodes, and making sure they still have a legitimate entry  
in the localization-index.  To do this I am going to have to  
temporarily add a new operation to your Session so we can do the check.
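
For concreteness, the check would look something like this (a sketch
only; lookupLocalizationEntry() is an assumed name for the reverse
scan that new Session operation would expose):

    // For each suspect blank node from the backup, see whether the
    // localization-index still maps some SPObject to it.
    for (long gNode : suspectBlankNodes) {
      SPObject obj = lookupLocalizationEntry(gNode);  // assumed helper
      if (obj == null) {
        // missing in both directions: the node is truly dangling
        System.out.println(gNode + ": no localization entry either");
      } else {
        // localization entry survives while globalization is gone --
        // direct evidence the two indices have fallen out of sync
        System.out.println(gNode + ": still localizes from " + obj);
      }
    }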

> The issues we are still tracking:
> 1. how do "_node###" ever get into the backup file

As discussed above, I suspect the bug causing issue 3 is tricking you
into inserting them.

> 2. Duplicate string pool entries

Still not sure about this one.

> 3. why are we getting blank nodes in our backup file to begin  
> with.  We
> can still query the running Mulgara to find the data, but we can't  
> back
> it up.

This does suggest that the indices are falling out of sync.  Does
this mean you know the global data-values that are supposed to be
there?  If so, this a) makes confirming the problem much simpler;
and b) means it is feasible for us to consider repairing the
globalization-index and/or the backup.

> 4. In my attached backup file I actually can't load it in the present
> form; when I do, I get the following Java error on a win2k3 server
> (dual CPU, quad core, 8 GB RAM)...it's an interesting error that
> appears to be because of the very large node numbers listed in the
> backup, thoughts? (I can get it to load if I make the node numbers
> small):

This is a 32-bit jvm you're trying to restore to, right?

Andrae

-- 
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd

