[Mulgara-dev] Exception stack trace while running a backup

Mon May 14 15:00:33 UTC 2007

On 15/05/2007, at 12:14 AM, David Moll wrote:
> Great news, we have moved production off of TKS2.1 and on to  
> Mulgara!  The change was made on Tuesday, and so far it has been  
> working much better than TKS.
That's great news.
> I’m watching the logging very closely to see if any problems pop  
> up, and I had a question about one occurance.  (As an aside, I am  
> no longer on the project that is using Mulgara on production.  I am  
> working on a separate project and still utilizing Mulgara.  Ben  
> Hysell, who has made several appearances on this list, is the lead  
> for production.)
>  We run automated backups every 4 hours, and Wednesday one of them  
> failed.  We are running Mulgara with the following command line  
> options:
Is this an isolated occurrence?  By code-inspection I am reasonably  
confident this could only be the result of a race-condition in the  
cleanup code.  Two threads would have had to call release() on the  
same AVLNode simultaneously.  This shouldn't be possible, but if it  
is then it would correspond to two query results being closed  
simultaneously.  Closing simultaneously isn't a problem, that code  
path should have synchronized before reaching the release().

Bad news: This is likely to be a hard to reproduce concurrency bug in  
the lowest levels of Mulgara.
Good news: It's likely to be a hard to reproduce concurrency bug in  
the cleanup of a read-only code-path.

That's all you get for 15 minutes - more than that will require more  
time :).

For anyone who's interested the trimmed stacktrace is:

Caused by: java.lang.AssertionError
	at org.mulgara.store.xa.Block.dispose(Block.java:524)
	at org.mulgara.store.xa.Block.release(Block.java:512)
	at org.mulgara.store.xa.AVLNode.release(AVLNode.java:1030)
	at org.mulgara.store.stringpool.xa.XAStringPoolImpl$Phase 
$GNodeTuplesImpl.close(XAStringPoolImpl.java:2728)
	at org.mulgara.resolver.BackupOperation.backupDatabase 
(BackupOperation.java:196)

everything else is just context/noise.
> What’s especially interesting to me about this error is that the  
> stack trace was printed twice, once at 08:29:09, and again at  
> 08:29:24.  The only difference between the two traces is the  
> timestamp.  I am at a loss as to why this is happening.  The only  
> thing that I can think of is that an insert was attempted while the  
> backup was being processed, and the insert was finalized, causing  
> instance of the tree being backed up to be lost.  That doesn’t seem  
> likely, as I would assume that a backup process would get its own  
> phased copy of the node tree to work on while the rest of the  
> system moved forward.  Any comments or suggestions are welcome.
>
> Exception stack trace while running a backup
Can't explain the double printing, unless it was a unified log file  
once for the server end and once for the client?

Andrae

-- 
Andrae Muys
andrae at netymon.com
Principal Mulgara Consultant
Netymon Pty Ltd