[Mulgara-general] Loading from files and transactions
Andrae Muys
andrae at netymon.com
Fri Mar 14 00:38:17 UTC 2008
On 14/03/2008, at 6:19 AM, Alex Hall wrote:
> I'm working on an application that will use Mulgara for two types of
> operations on RDF graphs. First, we will have relatively small
> updates
> (inserts or deletes involving a handful of statements on existing
> graphs) that need to execute as an atomic operation. Second, we will
> have loads of potentially very large RDF files into new graphs. There
> will be multiple web-based clients which can initiate either type of
> operation, although the first will be much more common than the
> second.
>
> Given these factors, the single write transaction constraint
> imposed by
> Mulgara is going to be a big challenge for us. I don't mind being
> limited to a single small update at a time since these generally
> execute
> in a matter of seconds. However, loading RDF from a file into a graph
> can take minutes or hours, during which time all other write
> operations
> will be locked out. It would be nice if we could stage our loads so
> that updates could continue to execute at the same time. However, the
> workarounds I've come up with to support such a scenario seem so
> kludgy
> that I'm almost embarrassed to have thought of them.
>
> Obviously, what I'd really like is support for multiple write
> transactions at once. In the meantime, I'm wondering if anybody else
> has encountered a similar situation, or can suggest an alternative
> approach to this problem.
Unfortunately, we don't currently have the resources to implement
concurrent writers this year. In the interim, the only thing I can
suggest is that you fake it. What I suspect you need to do is
journal your small writes whenever they are blocked by a bulk load.
That probably means using two Mulgara instances: one for the main
store, and one to track disjoint insert/delete sets for each model.
Run everything through a resolver that overrides createModel to
create ?ins and ?del models on the journal instance, and that
rewrites the result of every call to resolve with the required
append/minus operations. Then run a background thread that
periodically grabs the write-lock *on both servers* and does some
incremental syncing.
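In case it helps to see the shape of the ins/del bookkeeping, here
is a minimal, self-contained Java sketch. The class and method names
(JournalledModel, resolve, sync) are hypothetical, and in-memory
string sets stand in for Mulgara models and the resolver machinery;
it only illustrates the (base + ins) - del view and the fold-back
step:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the journaling workaround: while a bulk load holds the
// main store's write-lock, small writes land in per-model insert
// (?ins) and delete (?del) sets; reads see (base + ins) - del, and
// a periodic sync folds the journal back into the base store.
public class JournalledModel {
    private final Set<String> ins = ConcurrentHashMap.newKeySet();
    private final Set<String> del = ConcurrentHashMap.newKeySet();

    // A small write that would otherwise block on the bulk load.
    public void insert(String triple) {
        del.remove(triple);   // an insert cancels a pending delete
        ins.add(triple);
    }

    public void delete(String triple) {
        ins.remove(triple);   // a delete cancels a pending insert
        del.add(triple);
    }

    // What the overriding resolver would do to each resolve()
    // result: append the pending inserts, subtract the deletes.
    public Set<String> resolve(Set<String> base) {
        Set<String> view = new HashSet<>(base);
        view.addAll(ins);
        view.removeAll(del);
        return view;
    }

    // What the background thread does once it holds the write-lock
    // on both servers: apply the journal to the base, then clear it.
    public void sync(Set<String> base) {
        base.addAll(ins);
        base.removeAll(del);
        ins.clear();
        del.clear();
    }
}
```

In the real thing, resolve and sync would of course go through
Mulgara's resolver and transaction machinery, and sync must hold
the write-lock on both servers for the duration of the fold-back.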
This will give you limited, fake concurrent writes - hopefully it
won't be needed for long.

Topaz has generously offered to fund the full-time development of a
new string-pool[1] that will both scale several orders of magnitude
better than the current one and support concurrent writes. After
that we need to build a replacement statement-store with the same
scalability (and, of course, concurrent writes). At that point the
last thing needed for concurrent writes is a transaction scheduler
to manage them.
Andrae
--
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd