[Mulgara-dev] (no subject)
Andrae Muys
andrae at netymon.com
Sat Jan 27 06:45:47 UTC 2007
On 27/01/2007, at 2:10 PM, Life is hard, and then you die wrote:
> we have a question about disk usage. We have a database with around
> 1.4 million triples currently, and the disk usage looks as follows:
>
> 4.0K lucene
> 1.7M xaNodePool
> 6.0G xaStatementStore
> 169M xaStringPool
>
> (A detailed listing of file sizes is at the end). While the string
> pool looks fine, the statement store looks a bit large.
>
> Now, we have two calculations for the statement store, one from Paul
> and one from Andrae. Given N statements, Paul said basically the disk
> usage should be around N / 192 * 8292 * 6; Andrae said something like
> 12 * 32 * N. This comes to about 363MB and 538MB, respectively, i.e.
> the same ballpark (I'm not interested in exact numbers). But both are
> an order of magnitude less than what we're seeing.
>
> Paul and Andrae mentioned that space is not reclaimed on deletes, but
> instead goes back into a pool. We don't have the exact numbers, but in
> our case about 1/5 of the triples got inserted (~240000), then about
> 6500 were removed, and then the rest of the triples were inserted.
> There have been some small sets of deletes since then, but nothing
> beyond a few thousand triples. So in total the deletes are < 1% of the
> inserts. Plus anything in the free pool from the deletes should've
> pretty much been used up by the following inserts. But even if not,
> this doesn't look like it could account for the discrepancy.
>
> So, I'm a bit curious: anybody have any idea why the large disk usage?
> Has anybody else seen this (over 4K per statement)?
I just had a conversation with David Makepeace about this, and while
he was surprised, he did offer a possible explanation. The sizes
below have a few interesting properties.
1. Most of the space is being consumed by the block-files (_tb), not
the AVLTrees.
2. There is a huge discrepancy between the 012 blockfile indices and
the 120 blockfile indices.
The theoretical 'ideal' blockfile should be 44.8e6 bytes per index
for 1.4e6 quads (32 bytes per quad).
In practice blocks operate at a lower bound of 50% utilisation, so
89.6e6 bytes per index.
There are 6 indices, so the total ideal blockfile space required for
1.4e6 quads is ~538MB.
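For the record, that arithmetic can be sketched as follows; the constants are taken straight from the figures in this mail (32-byte quads, 50% worst-case block utilisation, 6 index orderings):

```java
// Sanity check of the figures above: 1.4e6 quads at 32 bytes each, blocks
// at a worst-case 50% utilisation, across all 6 orderings of the indices.
public class BlockfileEstimate {
    static final long QUADS = 1_400_000L;
    static final long BYTES_PER_QUAD = 32;     // 4 longs per quad
    static final int INDICES = 6;

    static long idealBytesPerIndex() {
        return QUADS * BYTES_PER_QUAD;         // 44.8e6 bytes
    }

    static long worstCaseBytesPerIndex() {
        return idealBytesPerIndex() * 2;       // 50% utilisation => 89.6e6 bytes
    }

    static long totalBytes() {
        return worstCaseBytesPerIndex() * INDICES;  // 537.6e6 bytes, ~538MB
    }

    public static void main(String[] args) {
        System.out.println(totalBytes());      // 537600000
    }
}
```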
This is quite a bit less than the 5.6GB being used below - so the
space is not a result of 'normal' operations. This suggests that the
space is either on the freelist, or supporting multiple phases. If
you were doing a lot of deletes then that might suggest the freelist -
but the worst-case overhead for 6000 deletes (each hitting a
different 4k block) is 24MB. So the space must be the result of
phase-support.
Remember that Mulgara does do copy-on-write to support multiversion
semantics, but it retains the old copy only if there remains a
reference to the old phase (version). If there is no such reference,
then the old block is placed on the freelist and used to support
subsequent inserts. But there is one quirk in the freelist behaviour
(from FreeList.java):
* A fifo of integer items. A list of "phases" is maintained where each
* phase represents a snapshot of the state of the free list at a point
* in time. Items added to the free list will not be returned by
* {@link #allocate} until all current phases have been closed (and at
* least one new phase created).
Note the "until all current phases have been closed". What this
means is that one unclosed outstanding Answer may (if it retains a
reference to a current phase) prevent deallocated blocks from being
reallocated!
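To make the contract concrete, here is a toy model of that rule - NOT
Mulgara's actual FreeList, just a sketch of the behaviour the javadoc
describes, with illustrative names throughout:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeSet;

// Toy model of the contract quoted above: an item freed while phase p is
// current only becomes reusable once every phase up to and including p has
// been closed. This is NOT the real FreeList implementation.
public class PhasedFreeList {
    private long nextPhase = 1;                               // newest phase id
    private final TreeSet<Long> openPhases = new TreeSet<>();
    private final Deque<long[]> parked = new ArrayDeque<>();  // {item, freedInPhase}
    private final Deque<Long> reusable = new ArrayDeque<>();
    private long nextFreshBlock = 0;                          // "grow the file"

    public PhasedFreeList() { openPhases.add(nextPhase); }

    /** A new phase, e.g. a new write transaction. */
    public long newPhase() {
        openPhases.add(++nextPhase);
        return nextPhase;
    }

    /** Close a phase, e.g. an Answer finally being closed. */
    public void closePhase(long phase) { openPhases.remove(phase); }

    /** Free an item; it is parked until its phase epoch drains. */
    public void free(long item) { parked.addLast(new long[] { item, nextPhase }); }

    /** Reuse a drained item if possible, otherwise grow the file. */
    public long allocate() {
        // Promote parked items whose freeing phase, and all earlier phases,
        // have been closed.
        while (!parked.isEmpty()
               && (openPhases.isEmpty() || openPhases.first() > parked.peekFirst()[1])) {
            reusable.addLast(parked.pollFirst()[0]);
        }
        Long reused = reusable.pollFirst();
        return reused != null ? reused : nextFreshBlock++;
    }
}
```

Under this model a single leaked phase reference - an unclosed Answer -
keeps every subsequently freed block parked, so the store only ever grows
until that phase is closed.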
So what I think is happening is this - somewhere you are holding an
Answer open over the course of numerous small inserts. I suspect
from the file-size distributions each insert is a set of properties
associated with only a few subjects. The result is that on the 012
sorted indices these inserts only hit one or two blocks (leading to
the 5.4x inefficiency we are currently seeing). In the case of 120
indices each property hits a separate block, forcing a duplicate,
which due to the outstanding phase is never reaped (leading to the
~39x inefficiency we are currently seeing). As an aside, in practice
the goal is between 1.5x and 2x inefficiency, so the 5.4x is really
3x, and the 39x really 25x. With the 201 indices a degree of aliasing
between 'objects' provides some defence, but not enough to avoid an
18x (10x) penalty.
It would be a good idea to confirm this theory, but if it holds then
there are a few options to proceed from here. I have listed them in
order of preference, which is incidentally also the order of both
effort and Mulgara internals experience required.
1. Remember to call close() on your Answers - especially if you are
using the new transaction code which will allow you to keep an Answer
open concurrently with initiating a new transaction from its parent
Session.
2. Periodically close() your Sessions - this will force-close any
Answer objects that may have leaked. This could be done periodically
in any connection-pool.
3. Once you are confident you have no very-long-lived Answers, you
could possibly ignore the problem. Once the phases are released, the
space will be available for reallocation and the store's size will
stabilise.
4. Wrap a new Tuples or Answer object around the server-side result
that ages the Answer and eventually materialises it, closing the
inner result and releasing the phase.
5. Modify the FreeList behaviour to start reallocating blocks that
are not referenced by any previous phase (rather than blocks that
were referenced by a phase newer than the oldest active phase). This
requires substantial new bookkeeping and is a very non-trivial change.
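For option 1, the discipline is just the usual try/finally shape. The
interfaces below are stand-ins for illustration only - the real Mulgara
Session and Answer APIs differ - but the shape is the point:

```java
// Stand-in interfaces for illustration only; the real Mulgara Session and
// Answer types differ. The point is the try/finally shape around close().
interface Answer { boolean next(); void close(); }
interface Session { Answer query(String itql); }

public class ClosedAnswerExample {
    /** Consume an Answer and ALWAYS close it, releasing its phase reference. */
    static int countRows(Session session, String itql) {
        Answer answer = session.query(itql);
        int rows = 0;
        try {
            while (answer.next()) rows++;   // do the real per-row work here
        } finally {
            answer.close();                 // never leaks a phase, even on error
        }
        return rows;
    }

    // Tiny in-memory fake so the sketch is runnable on its own.
    static Session fakeSession(int rows) {
        return itql -> new Answer() {
            int left = rows;
            public boolean next() { return left-- > 0; }
            public void close() { /* phase reference released here */ }
        };
    }
}
```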
Can you examine your test case and see if this is indeed what is
happening? If so, is there any particular reason you need to hold an
old Answer open over so many inserts?
Andrae
Note:
When estimating the space requirement above I had forgotten the AVL
tree. This is one node per 4k block, and each node is 93 bytes[0]
(Paul, can you confirm this?). This means that the ideal AVL tree
index is in practice ~2MB.
[0] AVLNode:
Left-Node - long - 8
Right-Node - long - 8
Balance - byte - 1
Low-Quad - 4xlong - 32
High-Quad - 4xlong - 32
Nr-Quads - int - 4
BlockId - long - 8
= 93
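As a sanity check on the footnote and the ~2MB figure (assuming a 4k
block ideally holds 4096/32 = 128 quads, so 64 at 50% utilisation):

```java
// Field sizes from footnote [0], plus the tree-size estimate: one AVL node
// per 4k block, with blocks half full (64 of a possible 128 quads each).
public class AvlNodeSize {
    static int nodeBytes() {
        return 8    // left-node  (long)
             + 8    // right-node (long)
             + 1    // balance    (byte)
             + 32   // low-quad   (4 longs)
             + 32   // high-quad  (4 longs)
             + 4    // nr-quads   (int)
             + 8;   // block-id   (long)
    }

    static long treeBytes(long quads, long quadsPerBlock) {
        return (quads / quadsPerBlock) * nodeBytes();
    }

    public static void main(String[] args) {
        System.out.println(nodeBytes());              // 93
        System.out.println(treeBytes(1_400_000, 64)); // ~2MB per index
    }
}
```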
> P.S. here's a detailed listing of the files by size:
>
> Files in xaStatementStore, sorted by size:
>
> 1744830464 xa.g_3120_tb
> 1744830464 xa.g_1203_tb
> 847249408 xa.g_3201_tb
> 838860800 xa.g_2013_tb
> 251658240 xa.g_0123_tb
> 243269632 xa.g_3012_tb
> 192757760 xa.g_3120
> 192757760 xa.g_1203
> 100573184 xa.g_3201
> 100573184 xa.g_2013
> 33529856 xa.g_3012
> 33529856 xa.g_0123
> 27623424 xa.g_1203_fl
> 27557888 xa.g_3120_fl
> 14680064 xa.g_2013_fl
> 13959168 xa.g_3201_fl
> 8388608 xa.g_3201_tb_fl_ph
> 8388608 xa.g_3201_fl_ph
> 8388608 xa.g_3120_tb_fl_ph
> 8388608 xa.g_3120_fl_ph
> 8388608 xa.g_3012_tb_fl_ph
> 8388608 xa.g_3012_fl_ph
> 8388608 xa.g_2013_tb_fl_ph
> 8388608 xa.g_2013_fl_ph
> 8388608 xa.g_1203_tb_fl_ph
> 8388608 xa.g_1203_fl_ph
> 8388608 xa.g_0123_tb_fl_ph
> 8388608 xa.g_0123_fl_ph
> 4194304 xa.g_0123_fl
> 4128768 xa.g_3012_fl
> 2949120 xa.g_1203_tb_fl
> 2916352 xa.g_3120_tb_fl
> 1409024 xa.g_3201_tb_fl
> 1409024 xa.g_2013_tb_fl
> 360448 xa.g_3012_tb_fl
> 360448 xa.g_0123_tb_fl
> 1088 xa.g
>
> Files in xaStringPool, sorted by size:
>
> 125714432 xa.sp_avl
> 33554432 xa.sp_nd
> 11927552 xa.sp_avl_fl
> 8388608 xa.sp_avl_fl_ph
> 8388608 xa.sp_08_fl_ph
> 8388608 xa.sp_08
> 8388608 xa.sp_07_fl_ph
> 8388608 xa.sp_07
> 8388608 xa.sp_06_fl_ph
> 8388608 xa.sp_06
> 8388608 xa.sp_05_fl_ph
> 8388608 xa.sp_05
> 8388608 xa.sp_04_fl_ph
> 8388608 xa.sp_04
> 8388608 xa.sp_03_fl_ph
> 8388608 xa.sp_03
> 8388608 xa.sp_02_fl_ph
> 8388608 xa.sp_02
> 8388608 xa.sp_01_fl_ph
> 8388608 xa.sp_01
> 8388608 xa.sp_00_fl_ph
> 8388608 xa.sp_00
> 65536 xa.sp_19_fl
> 65536 xa.sp_18_fl
> 65536 xa.sp_17_fl
> 65536 xa.sp_16_fl
> 65536 xa.sp_15_fl
> 65536 xa.sp_14_fl
> 65536 xa.sp_13_fl
> 65536 xa.sp_12_fl
> 65536 xa.sp_11_fl
> 65536 xa.sp_10_fl
> 65536 xa.sp_09_fl
> 65536 xa.sp_08_fl
> 65536 xa.sp_07_fl
> 65536 xa.sp_06_fl
> 65536 xa.sp_05_fl
> 65536 xa.sp_04_fl
> 65536 xa.sp_03_fl
> 65536 xa.sp_02_fl
> 65536 xa.sp_01_fl
> 65536 xa.sp_00_fl
> 1408 xa.sp
> 0 xa.sp.lock
> 0 xa.sp_19_fl_ph
> 0 xa.sp_18_fl_ph
> 0 xa.sp_17_fl_ph
> 0 xa.sp_16_fl_ph
> 0 xa.sp_15_fl_ph
> 0 xa.sp_14_fl_ph
> 0 xa.sp_13_fl_ph
> 0 xa.sp_12_fl_ph
> 0 xa.sp_11_fl_ph
> 0 xa.sp_10_fl_ph
> 0 xa.sp_09_fl_ph
> 0 xa.sp_19
> 0 xa.sp_18
> 0 xa.sp_17
> 0 xa.sp_16
> 0 xa.sp_15
> 0 xa.sp_14
> 0 xa.sp_13
> 0 xa.sp_12
> 0 xa.sp_11
> 0 xa.sp_10
> 0 xa.sp_09
>
> _______________________________________________
> Mulgara-dev mailing list
> Mulgara-dev at mulgara.org
> http://mulgara.org/mailman/listinfo/mulgara-dev
--
Andrae Muys
andrae at netymon.com
Principal Mulgara Consultant
Netymon Pty Ltd