[Mulgara-general] Loading from files and transactions
Andrae Muys
andrae at netymon.com
Fri Mar 14 00:38:17 UTC 2008
On 14/03/2008, at 6:19 AM, Alex Hall wrote:
> I'm working on an application that will use Mulgara for two types of
> operations on RDF graphs. First, we will have relatively small
> updates
> (inserts or deletes involving a handful of statements on existing
> graphs) that need to execute as an atomic operation. Second, we will
> have loads of potentially very large RDF files into new graphs. There
> will be multiple web-based clients which can initiate either type of
> operation, although the first will be much more common than the
> second.
>
> Given these factors, the single write transaction constraint
> imposed by
> Mulgara is going to be a big challenge for us. I don't mind being
> limited to a single small update at a time since these generally
> execute
> in a matter of seconds. However, loading RDF from a file into a graph
> can take minutes or hours, during which time all other write
> operations
> will be locked out. It would be nice if we could stage our loads so
> that updates could continue to execute at the same time. However, the
> workarounds I've come up with to support such a scenario seem so
> kludgy
> that I'm almost embarrassed to have thought of them.
>
> Obviously, what I'd really like is support for multiple write
> transactions at once. In the meantime, I'm wondering if anybody else
> has encountered a similar situation, or can suggest an alternative
> approach to this problem.
Unfortunately, we don't currently have the resources to implement
concurrent writers this year. In the interim, the only thing I can
suggest is that you fake it. What I suspect you need to do is
journal your small writes whenever they are blocked by a bulk load.
That probably means using two Mulgara instances: one for the main
store, and one to track disjoint insert/delete sets for each model.
Run everything through a resolver that overrides createModel to
create ?ins and ?del models on the journal instance, and that
rewrites the result of every call to resolve with the required
append/minus operations. Then run a background thread that
periodically grabs the write-lock *on both servers* and does some
incremental syncing.
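In case it helps to see the shape of the ins/del bookkeeping, here
is a minimal, self-contained Java sketch. The class and method names
(JournalledModel, resolve, sync) are hypothetical, and in-memory
string sets stand in for Mulgara models and the resolver machinery;
it only illustrates the (base + ins) - del view and the fold-back
step:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the journaling workaround: while a bulk load holds the
// main store's write-lock, small writes land in per-model insert
// (?ins) and delete (?del) sets; reads see (base + ins) - del, and
// a periodic sync folds the journal back into the base store.
public class JournalledModel {
    private final Set<String> ins = ConcurrentHashMap.newKeySet();
    private final Set<String> del = ConcurrentHashMap.newKeySet();

    // A small write that would otherwise block on the bulk load.
    public void insert(String triple) {
        del.remove(triple);   // an insert cancels a pending delete
        ins.add(triple);
    }

    public void delete(String triple) {
        ins.remove(triple);   // a delete cancels a pending insert
        del.add(triple);
    }

    // What the overriding resolver would do to each resolve()
    // result: append the pending inserts, subtract the deletes.
    public Set<String> resolve(Set<String> base) {
        Set<String> view = new HashSet<>(base);
        view.addAll(ins);
        view.removeAll(del);
        return view;
    }

    // What the background thread does once it holds the write-lock
    // on both servers: apply the journal to the base, then clear it.
    public void sync(Set<String> base) {
        base.addAll(ins);
        base.removeAll(del);
        ins.clear();
        del.clear();
    }
}
```

In the real thing, resolve and sync would of course go through
Mulgara's resolver and transaction machinery, and sync must hold
the write-lock on both servers for the duration of the fold-back.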
This will give you limited, fake concurrent writes - hopefully it
won't be needed for long.

Topaz has generously offered to fund the full-time development of a
new string-pool[1] that will both scale several orders of magnitude
better than the current one and support concurrent writes. After
that we need to build a replacement statement-store with the same
scalability (and, of course, concurrent writes). At that point the
last thing needed for concurrent writes is a transaction scheduler
to manage them.
Andrae
--
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd