[Mulgara-dev] CRITICAL: Bug fix to Backup operation
Andrae Muys
andrae at netymon.com
Wed Mar 26 05:46:52 UTC 2008
On 26/03/2008, at 7:43 AM, Ben Hysell wrote:
> We check for inconsistencies using the following methods with the
> decompressed backup file:
>
> 1. Read in the string pool and ensure there are no duplicate entries in
> the string pool, i.e. node numbers listed twice.
>
> 2. Read through the string pool looking for any entries that start with
> _node followed by a number.
As discussed below, this check should *never* find anything, even if you
*are* using blank-nodes.
> 3. Look up each node number in the TRIPLES section to ensure there is a
> corresponding string pool entry for the node number in question.
>
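For anyone reproducing these checks, here is a minimal Python sketch of the three steps. It assumes the decompressed backup has already been parsed into (node_id, value) pairs for the string pool (the RDFNODES section) and into tuples of node IDs for the TRIPLES section. That in-memory layout and the name `check_backup` are hypothetical; the real Mulgara backup file format is not parsed here.

```python
from collections import Counter

# ASSUMPTION (hypothetical layout): rdf_nodes is a list of (node_id, value)
# pairs read from the RDFNODES section, and triples is a list of tuples of
# node IDs read from the TRIPLES section. No real file parsing happens here.
def check_backup(rdf_nodes, triples):
    """Run the three consistency checks and return a report dict."""
    # Check 1: node IDs that appear more than once in the string pool.
    id_counts = Counter(node_id for node_id, _ in rdf_nodes)
    duplicate_ids = sorted(i for i, c in id_counts.items() if c > 1)

    # Check 2: string pool values that are "_node" followed by digits.
    node_like = sorted(
        node_id for node_id, value in rdf_nodes
        if value.startswith("_node") and value[5:].isdigit()
    )

    # Check 3: node IDs used in TRIPLES that have no string pool entry.
    known = set(id_counts)
    dangling_triples = [t for t in triples if any(n not in known for n in t)]
    dangling_ids = sorted({n for t in dangling_triples for n in t if n not in known})

    return {
        "duplicate_ids": duplicate_ids,
        "node_like_entries": node_like,
        "dangling_triples": dangling_triples,
        "dangling_ids": dangling_ids,
    }


# Toy data, made up for illustration only.
pool = [(1, "http://example.org/a"), (2, "http://example.org/p"),
        (2, "http://example.org/p"), (3, "_node42")]
report = check_backup(pool, [(1, 2, 3), (1, 2, 99)])
print(report["dangling_ids"])  # [99]
```

Check 3 is the one that actually detects missing string pool entries; a non-empty `dangling_ids` is what a restore would later surface as _node entries.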
> At one point someone posted to the list a query for searching your
> restored, running Mulgara instance for any _node entries; however, I've
> lost the email, and every time we ran the query it crashed Mulgara.
That would have involved using the NodeTypeResolver - one of Paul's
babies, he can probably reproduce the query.
> As for our testing:
>
> I took the server1 folder from production that was causing problems,
> copied over the new jar files, ran a backup and examined it using the
> steps from above.
>
> 1. I still have duplicate entries in the string pool; however, this time
> they are grouped together, i.e. the one instance in this backup is node
> 6290, which is listed twice:
Ok, this is cute - in every case the entry truly is a duplicate,
right? Same ID *and* same URI/Literal? We aren't talking about URI/
Literals being mapped to multiple IDs or vice versa, right?
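The question above separates three different anomalies. A minimal sketch of telling them apart, under the same hypothetical assumption that the string pool has already been parsed into (node_id, value) pairs (`classify_duplicates` is an illustrative name, not part of Mulgara):

```python
from collections import defaultdict

# ASSUMPTION (hypothetical layout): rdf_nodes is a list of (node_id, value)
# pairs taken from the string pool section of a decompressed backup.
def classify_duplicates(rdf_nodes):
    """Return (exact_dups, ids_with_many_values, values_with_many_ids)."""
    values_by_id = defaultdict(list)
    ids_by_value = defaultdict(set)
    for node_id, value in rdf_nodes:
        values_by_id[node_id].append(value)
        ids_by_value[value].add(node_id)

    # Same ID *and* same URI/Literal listed more than once.
    exact = sorted(i for i, vs in values_by_id.items()
                   if len(vs) > 1 and len(set(vs)) == 1)
    # One ID mapped to several different URIs/Literals.
    id_to_many = sorted(i for i, vs in values_by_id.items() if len(set(vs)) > 1)
    # One URI/Literal mapped to several different IDs.
    value_to_many = sorted(v for v, ids in ids_by_value.items() if len(ids) > 1)
    return exact, id_to_many, value_to_many


# Toy data, made up for illustration only.
exact, id_to_many, value_to_many = classify_duplicates(
    [(10, "x"), (10, "x"), (7, "y"), (7, "z"), (8, "w"), (9, "w")])
print(exact, id_to_many, value_to_many)  # [10] [7] ['w']
```

Only the first category is a harmless repeat in the output; the other two would indicate a genuinely inconsistent string pool.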
> 2. There are no listings of _node in the backup file; this tells me the
> string pool is 'clean', so if I query the database I'll never get back
> an entry that has _node followed by a large number.
Actually I suspect it doesn't. The backup file should never contain
any _node entries; these are identified as nodes in the TRIPLES
section that don't have corresponding entries in the RDFNODES
section. So in fact it is your item 3 that would tell you the string
pool is 'clean', which apparently it is not.
> 3. There are roughly 17k triples in the TRIPLES section containing node
> numbers that do not have corresponding string pool entries. If I
> restored this backup I would introduce 17k triples in which one of the
> nodes is represented as _node.
This is a concern - can you verify that the server1 directory does
not contain any statements that contain blank-nodes? Specifically
can you query for the 17k triples you have identified to check to see
if they exist as blank-nodes in the store or just in the backup?
I need this so I know if the problem is with the backup code, or
somewhere else.
> 4. I did a backup of the same server1 with rev 570, with the following
> results:
>
> - there are roughly 17k node numbers in the TRIPLES section that do not
>   have corresponding string pool entries
> - there are no _node string pool entries
> - there is a duplicate in the string pool, but as I pointed out in my
>   original email, the duplicate string pool entry is near the bottom of
>   the backup file, included with the node IDs of 6427720.
>
>
> I'm still concerned that we have a duplicate in the string pool entries
> and that not all of the strings in the string pool are making it out to
> the backup file.
So am I, believe me.
Andrae.
--
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd