[Mulgara-dev] CRITICAL: Bug fix to Backup operation
Ben Hysell
BenH at viewpointusa.com
Mon Mar 31 15:33:29 UTC 2008
Andrae,
Easy ones first:
>> 4. In my attached backup file I actually can't load it in the present
>> form; when I do I get the following Java error on a win2k3 server (dual
>> cpu, quad cores, 8 GB RAM)...it's an interesting error that appears to
>> be because of the very large node numbers listed in the backup,
>> thoughts? (I can get it to load if I make the node numbers small):
>This is a 32-bit JVM you're trying to restore to, right?
We are working with a 64-bit JVM.
>I must admit that growth is very high - are you running a version
>with the fix to ticket #46? (I believe that is revision 223).
We are currently on revision 590.
>Here is how you could find yourself inserting "_node####".
I agree that could happen with a system, but would this also happen with
data that is considered stale? I.e. we often lose numbers and dates
(non-datatyped in Mulgara). So for a file we'll have:
Created date: xx-xx-xx
Modified date: xx-xx-xx
File length: ##
Etc.
Once these elements are created they are not manipulated again by doing
select/inserts. However, more data may come into the system that could
have the same dates/file lengths; when they get inserted at a later date,
could they be corrupted then?
When debugging this problem we are able to use the WebUI to query the
database on the 'damaged' Mulgara and not run into the scenario you
described. It is only after we do a restore that we start getting the
'_node####' in the WebUI.
While running on the damaged Mulgara and debugging, we see the literals
are being pulled from the gn2spoCache. I believe you are correct about
the localization/globalization indices falling out of sync; below is a
long thread looking at this issue. Thoughts?
-----Original Message-----
From: Ben Hysell
Sent: Friday, January 25, 2008 10:14 AM
To: Mulgara Developers
Subject: RE: [Mulgara-dev] literal in gn2spoCache but cannot be found
in backup file
Andrae,
I've taken the following steps:
1. Shut down Mulgara
2. Started it back up
3. Opened a new WebUI
4. Ran the query $s $p 'literal' -> I would have expected:
<uri1> <uri2> literal
<uri1> <uri3> literal
<uri1> <uri4> literal
But I only received:
<uri1> <uri4> literal
This is the triple Paul asked me to insert yesterday during a debug
session in Eclipse.
5. Ran the query <uri1> $p $o -> all three instances of the literal are
returned.
Sorry if I was unclear in my earlier emails, I've been working off of
the original store the entire time. The Mulgara where I restored the
backup has been sitting off to the side and I have not touched it.
-ben
-----Original Message-----
From: mulgara-dev-bounces at mulgara.org
[mailto:mulgara-dev-bounces at mulgara.org] On Behalf Of Andrae Muys
Sent: Friday, January 25, 2008 1:13 AM
To: Mulgara Developers
Subject: Re: [Mulgara-dev] literal in gn2spoCache but cannot be found
in backup file
On 24/01/2008, at 5:47 AM, Ben Hysell wrote:
> However, if I query $subject $predicate literal, then on line 555 of
> StringPoolSession.java, in function localizeSPObject, the call
>
> Long localNode = persistentStringPool.findGNode(relativeSPObject);
>
> sets localNode = 0. As the function progresses and checks the
> temporaryStringPool, it also cannot find the node in there. The
> function finishes by creating a node in the temporary string pool.
> When we arrive at ConstraintImpl, on line 95 of
> ConstraintImpl.java, the ConstraintElement e2 has a value of -1.
>
> So to circle back, and please correct me if I am wrong in my
> conclusions:
>
> 1. I can back up the good Mulgara server, but when I restore
> it, the new Mulgara server has lots of blank nodes.
>
> 2. It appears that during the backup operation on the good Mulgara
> server, the call Tuples t = stringPool.findGNodes(null,
> null); (line 179 of BackupOperation.java) truly does not have my
> literal in the string pool.
>
> 3. My literal is still in the system if I query for it and
> pull the node from the gn2spoCache
>
Ben,
Against the original store the backup was created from, could you
run a query that uses the literal directly? Preferably from a fresh
session (or even a clean restart of the server) to ensure the cache
is flushed? Something along the lines of:
select $s $p from <> where $s $p 'the literal';
This will force the literal to be localized, which is the operation
that uses the AVL tree. Any prior query that returns the literal may
end up populating the cache and avoiding the AVL-tree lookup. I'm
hoping this works fine, because if it doesn't then it isn't a problem
with backup, but rather the string-pool localization and
globalization indices have fallen out of sync.
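To make that concrete, here is a rough sketch of the lookup order -
simplified, with partly invented names (sessionCache and
temporaryStringPool.put() are stand-ins; findGNode is the call you
stepped through, which returned 0 when the value was missing from the
index):

  // Rough sketch only - not the actual Mulgara source.
  long localize(SPObject value) {
    Long cached = sessionCache.get(value);              // filled by earlier results
    if (cached != null) {
      return cached;                                    // AVL tree never consulted
    }
    Long node = persistentStringPool.findGNode(value);  // the AVL-tree lookup
    if (node != null && node != 0) {
      return node;                                      // found in the persistent index
    }
    return temporaryStringPool.put(value);              // negative, session-local id
  }

A fresh session (or a restart) empties that cache, so the query above
is forced through findGNode and genuinely exercises the index.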
The reason I ask is that if the localization you discuss above was
done on the original store (which is unclear to me), then this is
what I suspect has happened. Returning a negative node from a
localization means that the global resource (uri/literal) was not
found in the string-pool localization index and was allocated a
temporary node-id from the temporary string-pool. The -1 simply
means it is the first node to be allocated from the temporary pool.
When doing globalizations we simply use the sign-bit to determine
which string-pool to look it up in. We also use the sign-bit to
guarantee that every resource inserted into the statement-store has
first been persisted in the string-pool.
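In sketch form (method names invented, but the logic is just the sign
test):

  // Sketch: the sign-bit routes a globalization to the right pool.
  SPObject globalize(long node) {
    return (node < 0)
        ? temporaryStringPool.getSPObject(node)    // allocated this session
        : persistentStringPool.getSPObject(node);  // persisted resource
  }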
Andrae
-----Original Message-----
From: mulgara-dev-bounces at mulgara.org
[mailto:mulgara-dev-bounces at mulgara.org] On Behalf Of Andrae Muys
Sent: Monday, March 31, 2008 1:18 AM
To: Mulgara Developers
Subject: Re: [Mulgara-dev] CRITICAL: Bug fix to Backup operation
On 29/03/2008, at 5:40 AM, Ben Hysell wrote:
> We have however ended up with the '## "_node######"' in our backup
> files...I am back to square one on how these actually get into the
> backup file. In reference to these "_node###" that do eventually show
> up...we are not inserting them, they used to reference real data until
> somewhere something 'happens' and the backups start spitting out
> "_node####" in the backup file.
If I had to guess, I would suggest that you are in fact inserting them
- let me explain.
Assume for the moment that there is another bug somewhere that is
causing nodes in the statement-store (ie. the TRIPLES section) to
lose their corresponding entry in the globalization-index (ie. the
RDF NODES section), but to retain their entry in the
localization-index (which is not backed up, as it is _supposed_ to be
a duplicate of the globalization-index, only in the opposite
direction).
Here is how you could find yourself inserting "_node####".
1. Assume <foo:A> localizes to gn42, but the corresponding
globalization entry has been lost.
2. Perform a query that returns a list of nodes you wish to insert
statements about:
select $uris from ... where $uris <pred> "fred" -> { gn42, gn55, gn72 }
note that the localization-index contains:
<foo:A> -> gn42
<foo:B> -> gn55
<foo:C> -> gn72
but due to the bug in question the globalization-index only contains:
gn55 -> <foo:B>
gn72 -> <foo:C>
so when the above query is globalized and returned you get:
{ BlankNode(42), URI(foo:B), URI(foo:C) }
3. Use toString() to build an insert query from the global objects
returned by the query.
BlankNode(42).toString() -> "_node42".
URI(foo:B).toString() -> "foo:B"
URI(foo:C).toString() -> "foo:C"
4. Some code that is written to assume only URIs or Literals does
the relevant escaping:
insert <foo:D> <pred> '_node42'
<foo:D> <pred> <foo:B>
<foo:D> <pred> <foo:C> into ...
5. We have managed to convert an errant query result into the
insertion of a "_node####" string literal.
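In other words, a toy version of the client code - entirely
hypothetical, not your code, but it shows the shape of the failure:

  // Hypothetical sketch of steps 2-5: a BlankNode in the result set is
  // stringified and then quoted as if it were a literal.
  StringBuilder insert = new StringBuilder("insert ");
  for (Object value : queryResult) {   // { BlankNode(42), URI(foo:B), URI(foo:C) }
    String s = value.toString();       // BlankNode(42).toString() -> "_node42"
    if (value instanceof URIReference) {
      insert.append("<foo:D> <pred> <").append(s).append("> ");
    } else {
      // anything else is assumed to be a literal and escaped accordingly:
      insert.append("<foo:D> <pred> '").append(s).append("' ");
    }
  }
  insert.append("into ...;");
  // -> insert <foo:D> <pred> '_node42' <foo:D> <pred> <foo:B> ... into ...;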
> I'm not sure if this is a time issue, a transaction issue, or both.
> The size of our server1 directory starts at 5.5 GB; we often start
> having problems when it grows, and we always perform a restore before
> we hit 20 GB. Our longest stretch is two weeks before we have either
> hit the 20 GB mark or started to see blank nodes in the backup file,
> shorter if we are doing a lot of transactions.
I must admit that growth is very high - are you running a version
with the fix to ticket #46? (I believe that is revision 223).
I suspect it isn't a transaction issue - my fear is that we are
talking about a race-condition. To start with, it is probably worth
checking to make sure we definitely are seeing the localize/globalize
index fall out of sync. That would entail going through all the
'new' blank-nodes, and making sure they still have a legitimate entry
in the localization-index. To do this I am going to have to
temporarily add a new operation to your Session so we can do the check.
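The check itself would look something like this (pure sketch - the
Session operation doesn't exist yet, and knownValues stands in for
whatever record you have of the data each node used to hold):

  // Invented names throughout; findGNode is the real localization lookup.
  for (long gnode : suspectBlankNodes) {
    SPObject expected = knownValues.get(gnode);     // value we believe it held
    if (expected == null) continue;                 // nothing to check against
    Long found = persistentStringPool.findGNode(expected);
    if (found != null && found == gnode) {
      System.out.println("gn" + gnode + ": localization intact; globalization lost");
    } else {
      System.out.println("gn" + gnode + ": localization entry missing too");
    }
  }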
> The issues we are still tracking:
> 1. how do "_node###" ever get into the backup file
As discussed above, I suspect the bug causing issue 3 is tricking you into
inserting them.
> 2. Duplicate string pool entries
Still not sure about this one.
> 3. why are we getting blank nodes in our backup file to begin with.
> We can still query the running Mulgara to find the data, but we
> can't back it up.
This does suggest that the indices are falling out of sync. Does
this mean you know the global data-values that are supposed to be
there? If so, this a) makes confirming it much simpler; and b) means
it is feasible for us to consider repairing the globalization-index
and/or repairing the backup.
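And if the localization entries are intact, a repair would be roughly
(purely illustrative - no such tool exists today):

  // Hypothetical sketch: for each known value that still localizes but no
  // longer globalizes, restore the missing globalization-index entry.
  for (SPObject value : knownValues.values()) {
    Long gnode = persistentStringPool.findGNode(value);   // localization intact?
    if (gnode != null && gnode > 0 && globalize(gnode) == null) {
      globalizationIndex.put(gnode, value);               // restore gn -> value
    }
  }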
> 4. In my attached backup file I actually can't load it in the present
> form; when I do I get the following Java error on a win2k3 server (dual
> cpu, quad cores, 8 GB RAM)...it's an interesting error that appears to
> be because of the very large node numbers listed in the backup,
> thoughts? (I can get it to load if I make the node numbers small):
This is a 32-bit JVM you're trying to restore to, right?
Andrae
--
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd
_______________________________________________
Mulgara-dev mailing list
Mulgara-dev at mulgara.org
http://mulgara.org/mailman/listinfo/mulgara-dev