[Mulgara-dev] Backups failing Mulgara 1.2 Windows
Paul Gearon
gearon at ieee.org
Wed Apr 9 21:02:34 UTC 2008
On Apr 9, 2008, at 2:10 PM, Alex Hall wrote:
> Ben Hysell wrote:
>> Before the patch 98% pass, after 82.98, 80, 83.31 (I'm getting
>> different Success rates on subsequent runs)
>>
>> Thoughts?
>
> Sure... My thoughts are, we're doing weird stuff with the garbage
> collector and filesystem that developers like us probably weren't
> meant to do. Paul and Andrae can probably list a whole host of reasons
> why applying that patch is a bad idea. I only suggested that it might
> fix the backup/restore problem, not that it wouldn't break other
> things ;-)
My thoughts are that it reminds me why I don't run Windows. Mind you,
that doesn't help the majority of people out there who want to.
The patch isn't a bad idea. We've already had to hack solutions for
Windows and file mapping.
The issue in using file mapping is that occasionally we need to change
the size of a file, which means re-mapping it. However, Java's
MappedByteBuffer doesn't do any kind of checking when you call get()
or put(), as this would slow down operations that were specifically
designed for fast access. So the only way that Sun could make this
work was to make sure that a MappedByteBuffer is always valid.
Otherwise, without a validity check, they would be getting GPFs or
segfaults (depending on your OS), which are explicitly disallowed in
Java.
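To make the re-mapping step concrete, here is a minimal sketch using the
standard java.nio.MappedByteBuffer and FileChannel.map() calls. The class
and method names (RemapSketch, mapWhole, growAndReadBack) and the sizes
are made up for illustration; this is not Mulgara's actual code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class RemapSketch {

    // Hypothetical helper: set the file length, then map the whole file read/write.
    static MappedByteBuffer mapWhole(RandomAccessFile raf, long size) throws IOException {
        raf.setLength(size);
        return raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
    }

    // Writes a value through a small mapping, grows the file by mapping it
    // again, and reads the value back through the new mapping.
    static int growAndReadBack(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            MappedByteBuffer small = mapWhole(raf, 4096);
            small.putInt(0, 42);   // plain memory access, no "is this still mapped?" check
            small.force();         // flush the change to the file

            // The file has to grow, so a second, larger mapping is created.
            // The first buffer has no close(); it stays mapped until the GC
            // reclaims it, which is the root of the problem discussed here.
            MappedByteBuffer large = mapWhole(raf, 8192);
            return large.getInt(0);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("remap-sketch", ".dat");
        file.deleteOnExit();
        System.out.println(growAndReadBack(file));
    }
}
```

Note that after the second map() call there are two live mappings of the
same file, and nothing in the public API lets you release the first one.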
MappedByteBuffer can both lock a file and tie up a large section of
address space (on a 32-bit system). And since it doesn't have a close
method, we had to use a trick to force it to close: set all references
to it to null, and call the GC. This works perfectly on Linux, Solaris
and OS X, but on Windows it occasionally failed. So if it failed on
Windows we called the GC a second time. But every few weeks we'd get a
bug report saying that it still failed. So then David Makepeace wrote a
loop which called the garbage collector up to 10 times (deciding that
if it still hadn't unmapped by that point, then it wasn't going to).
This appeared to fix the problem on Windows... until now.
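The retry loop described above might look roughly like the sketch below.
This is an illustration, not David's code: the names (GcUnmapSketch,
coaxCollection, MAX_GC_ATTEMPTS) are invented, and using a WeakReference
to detect whether the buffer has been collected is an assumption on my
part, since the real code may have tested for success some other way.
System.gc() is only a request, so nothing here is guaranteed to unmap.

```java
import java.lang.ref.WeakReference;

public class GcUnmapSketch {

    // Give up after this many attempts (10, as in the loop described above).
    static final int MAX_GC_ATTEMPTS = 10;

    // Asks the GC to run until the weakly referenced object has been
    // collected or the attempts run out. The caller must already have set
    // every strong reference to the object (e.g. the mapped buffer) to null.
    // Returns true if the object was collected.
    static boolean coaxCollection(WeakReference<?> ref) {
        for (int attempt = 0; attempt < MAX_GC_ATTEMPTS; attempt++) {
            if (ref.get() == null) {
                return true;   // collected, so the mapping can be released
            }
            System.gc();       // a hint only; the JVM is free to ignore it
        }
        return ref.get() == null;
    }
}
```

A caller would wrap the buffer in a WeakReference, null out its own
field, and then call coaxCollection(ref) before trying to resize or
delete the file.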
Alex's workaround is FAR more elegant than trying to coax the GC into
doing it for us. So to reiterate my point, I don't think it's a bad
idea at all. :-)
> All kidding aside, the fact that the success rates vary for different
> runs suggests that it's a timing issue. This isn't very surprising
> given that we're mucking around with the internal workings of the
> garbage collector. When the tests don't pass, is it due to failed
> assertions or to exceptions being thrown? Are there any stack traces
> you can share?
If the GC is throwing an exception then I'd expect that test to bomb
out. By default, Ant usually stops processing when it encounters a
failed process, so I would have thought the tests would simply stop
with an error, rather than continuing through to the final report.
Paul