[Mulgara-dev] Backups failing Mulgara 1.2 Windows

Paul Gearon gearon at ieee.org
Wed Apr 9 21:02:34 UTC 2008


On Apr 9, 2008, at 2:10 PM, Alex Hall wrote:
> Ben Hysell wrote:
>> Before the patch 98% pass, after 82.98, 80, 83.31 (I'm getting  
>> different Success rates on subsequent runs)
>>
>> Thoughts?
>
> Sure...  My thoughts are, we're doing weird stuff with the garbage
> collector and filesystem that developers like us probably weren't
> meant to do.  Paul and Andrae can probably list a whole host of
> reasons why applying that patch is a bad idea.  I only suggested that
> it might fix the backup/restore problem, not that it wouldn't break
> other things ;-)

My thoughts are that it reminds me why I don't run Windows. Mind you,  
that doesn't help the majority of people out there who want to.

The patch isn't a bad idea. We've already had to hack solutions for  
Windows and file mapping.

The issue with file mapping is that we occasionally need to change the
size of a file, which means re-mapping it. However, Java's
MappedByteBuffer doesn't do any validity checking when you call get()
or put(), as this would slow down operations that were specifically
designed for fast access. So the only way Sun could make this work was
to guarantee that a MappedByteBuffer is always valid, which means the
mapping can't be released while the buffer is still reachable.
Otherwise, without a validity check, they would get GPFs or segfaults
(depending on your OS), which Java explicitly disallows.
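
To make the re-mapping step concrete, here is a minimal sketch using
only the standard java.nio API. It is not Mulgara's code; the class
and method names (RemapSketch, growAndMap, demo) are made up for
illustration. It grows a file and maps it again, and the earlier
buffer is left for the GC to deal with.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not taken from the Mulgara source.
class RemapSketch {
    // Grow the file to newSize, then map all of it. The returned buffer is a
    // new mapping; any buffer mapped earlier keeps its old, smaller extent.
    static MappedByteBuffer growAndMap(RandomAccessFile file, long newSize) throws IOException {
        file.setLength(newSize);
        return file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, newSize);
    }

    // Writes one byte through a 16-byte mapping, re-maps at 32 bytes, and
    // returns { byte read via the new mapping, capacity of the new mapping }.
    static int[] demo() {
        try {
            Path path = Files.createTempFile("remap-sketch", ".bin");
            path.toFile().deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
                MappedByteBuffer small = growAndMap(file, 16);
                small.put(0, (byte) 42);
                small.force(); // flush the write to the file before re-mapping
                MappedByteBuffer large = growAndMap(file, 32);
                return new int[] { large.get(0), large.capacity() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that nothing here releases the first mapping: there is no call
for it, so the old buffer stays mapped until it is garbage collected.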

A MappedByteBuffer can both lock a file and tie up a large section of
address space (on a 32-bit system). And since it has no close method,
we had to use a trick to force it to close: set all references to it
to null and call the GC. This works perfectly on Linux, Solaris and
OS X, but on Windows it occasionally failed. So if it failed on
Windows, we called the GC a second time. But every few weeks we'd get
a bug report saying that it still failed. So David Makepeace wrote a
loop which called the garbage collector up to 10 times (deciding that
if it still hadn't unmapped by that point, then it wasn't going to).
This appeared to fix the problem on Windows... until now.
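
The retry loop can be sketched roughly as follows. This is a
reconstruction from the description above, not David's actual code,
and the names (GcRetrySketch, collectWithRetries) are hypothetical.
It assumes the caller has already nulled its own references and kept
only a WeakReference to the buffer. A cleared weak reference is used
as a proxy here: it shows the buffer object is gone, while the actual
unmapping is done by the JVM some time afterwards.

```java
import java.lang.ref.WeakReference;

// Hypothetical reconstruction, not taken from the Mulgara source.
class GcRetrySketch {
    // "Up to 10 times", per the description in this thread.
    static final int MAX_GC_ATTEMPTS = 10;

    // Requests a GC until the referent has been collected or the attempts
    // run out. Returns true if the referent is gone, false otherwise.
    static boolean collectWithRetries(WeakReference<?> ref, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (ref.get() == null) {
                return true;
            }
            System.gc(); // only a request; the JVM is free to ignore it
        }
        return ref.get() == null;
    }
}
```

Because System.gc() is only a hint, a loop like this can make success
more likely but can't guarantee it, which fits the varying results
between runs.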

Alex's workaround is FAR more elegant than trying to coax the GC into  
doing it for us. So to reiterate my point, I don't think it's a bad  
idea at all.  :-)

> All kidding aside, the fact that the success rates vary for different
> runs suggests that it's a timing issue.  This isn't very surprising
> given that we're mucking around with the internal workings of the
> garbage collector.  When the tests don't pass, is it due to failed
> assertions or to exceptions being thrown?  Are there any stack traces
> you can share?

If the GC is throwing an exception then I'd expect that test to bomb  
out. By default, Ant usually stops processing when it encounters a  
failed process, so I would have thought the tests would simply stop  
with an error, rather than continuing through to the final report.

Paul


