[Mulgara-general] My job, and SPARQL development

Paul Gearon gearon at ieee.org
Thu Feb 7 02:50:29 UTC 2008


On Feb 5, 2008, at 10:37 PM, Life is hard, and then you die wrote:

> On Tue, Feb 05, 2008 at 08:14:44PM -0600, Paul Gearon wrote:
>>
>> On Feb 5, 2008, at 6:46 PM, Life is hard, and then you die wrote:
<snip/>
>> This is because you aren't supposed to run more than one database in a
>> single system.  It is guaranteed to be slower.  This is one of the
>> reasons we allowed for multiple graphs.  If you really need it, then
>> you can run multiple servers on the one system.
>>
>> The only advantage I can think of for having multiple databases in one
>> JVM is to permit multiple writers while avoiding IPC.  In reality, for
>> every reason you may have to put multiple databases in a single JVM,
>> there are better reasons to not do it.
>
> While I agree that a single server will perform better, in practice
> you run into things like two libraries using the same third library and
> you end up with conflicts. I'm just running into this now where
> we're using JOTM and so is Mulgara (and I'm running Mulgara embedded,
> i.e. in the same JVM and classloader), and bam, things are totally
> messed up because somewhere they use a static field to hold the
> "current" transaction for each thread. This is enormously frustrating
> as a developer. And I've run into this sort of thing many times
> before (both as a user and writer of libs). As soon as you allow
> mulgara to be used embedded, i.e. in the same JVM and classloader, it
> _must_ be able to run multiple instances IMNSHO.
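
To make the clash described above concrete, here is a minimal Java sketch. It assumes a transaction manager that keeps the "current" transaction per thread in a static field; SharedTxManager and StaticTxClash are hypothetical names for illustration and are not JOTM's or Mulgara's actual code.

```java
// Hypothetical stand-in for a transaction manager that tracks the "current"
// transaction for each thread in a static field (an assumption, not JOTM's code).
final class SharedTxManager {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void begin(String owner) {
        String existing = CURRENT.get();
        if (existing != null) {
            throw new IllegalStateException(
                "thread already has a transaction owned by " + existing);
        }
        CURRENT.set(owner);
    }

    static String current() {
        return CURRENT.get();
    }

    static void end() {
        CURRENT.remove();
    }
}

// Hypothetical scenario: the application and an embedded store share one
// classloader, so both see the same static state on the same thread.
final class StaticTxClash {
    /** Returns the message produced when the embedded store tries to begin. */
    static String run() {
        SharedTxManager.begin("application");
        try {
            // The embedded store starts its own transaction on the caller's thread.
            SharedTxManager.begin("embedded-store");
            return "no clash";
        } catch (IllegalStateException e) {
            return e.getMessage();
        } finally {
            SharedTxManager.end();
        }
    }
}
```

In a separate server process each side would have its own copy of the static field, which is why the problem only shows up when everything lives in one JVM and classloader.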

OK, let me put it another way then.  Mulgara was written to be a  
server process, like MySQL.  It was never intended to be embedded.   
Then some years later someone (and I think I recall who it was)  
created an "Embedded" version.  I suppose it was nice that he did it,  
but it was never designed to be used that way, and I know that whoever  
did it didn't think through the consequences of using it this way.

> Also think of testing. E.g. if you want to test a distributed mulgara
> then it's perfectly reasonable to fire up multiple embedded instances
> - much easier than having to start external server processes (and btw,
> I believe the current mulgara tests would work better if they fired up
> an embedded instance rather than a separate server).

Again, it was always a separate server.  The embedded thing came much  
later.  As for a distributed Mulgara, the whole point of this is to  
have transport between the servers.  Doing things in the same JVM is a  
good way to miss a number of bugs that only crop up when
serializing/deserializing.
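
As a rough illustration of that class of bug, the sketch below uses plain Java serialization. QueryAnswer and WireCheck are hypothetical names, and nothing here reflects Mulgara's real wire format. It only shows that an object handed over by reference inside one JVM keeps state that is lost once it has to cross a serialization boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical result object: the cursor is local state that cannot travel.
final class QueryAnswer implements Serializable {
    private static final long serialVersionUID = 1L;

    final String variable;
    transient Object localCursor; // skipped by Java serialization

    QueryAnswer(String variable, Object localCursor) {
        this.variable = variable;
        this.localCursor = localCursor;
    }
}

final class WireCheck {
    /** Serializes and deserializes a value, as a transport between servers would. */
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

A test that passes the QueryAnswer directly between two embedded instances would still see a live localCursor; only the round-tripped copy exposes that it comes back as null.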

>> In terms of cleanliness of design, then you may be right.  However,
>> things like the NodePool will definitely perform better if they are
>> singletons.
>
> Forget performance here. If you're running multiple instances it's
> obvious that you're wasting memory etc by running multiple instances,
> so I wouldn't worry about it. When you need performance then you'll
> obviously make sure you're only running one instance per machine; I
> see the use cases for embedded instances in testing, in small (quite
> possibly completely hidden) databases inside libs and apps, and in
> quick-install scenarios for apps.

I can see the desire for what you're talking about.  People ask about  
this configuration all the time.  The problem is that Mulgara was  
never built for it.  I believe it would be fine to distribute this  
way, but it would take an engineering effort to go through and clean  
it up for this purpose.  No one ever did that - and whoever created  
this option SHOULD have done it.  If you have the time (I know I  
don't), then I'd love for you to go through and clean it up for this  
purpose!  :-)

>>> We've also tried creating in-memory embedded mulgara instances, but a
>>> bunch of things are not implemented there so we've found this not to
>>> be usable (e.g. searching for typed-literals is not implemented in the
>>> memory-based string-pool, leading to all sorts of things bailing out).
>>
>> Fair enough.  That should be trivial to implement though.  Do you want
>> it?
>
> :-) In my copious spare time? I think I have a couple of other things
> with higher priority right now. I'll create a ticket for it however.

I'd rather you spend your copious spare time cleaning up Mulgara for  
Embedded usage.  :-)

Actually, things should get a little better when the whole monolithic  
server thing gets refactored.  At that point we *should* have a single  
class that represents a database, with all the configuration, etc,  
taken care of.  This would be a good starting point to make sure that  
it is possible to create multiple instances of this class (providing  
different resources are supplied to it, like directory paths).
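
A minimal sketch of what that could look like, assuming nothing about the eventual refactoring: the Database and NodePool classes below are hypothetical and only borrow the names used in this thread. The point is that each instance owns its own pool and its own directory, with no static state shared between instances.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-instance node pool (not Mulgara's NodePool implementation).
final class NodePool {
    private final Map<String, Long> ids = new HashMap<>();
    private long next = 1;

    /** Returns the id for a value, allocating a new one on first use. */
    long idFor(String value) {
        Long id = ids.get(value);
        if (id == null) {
            id = next++;
            ids.put(value, id);
        }
        return id;
    }

    int size() {
        return ids.size();
    }
}

// Hypothetical class representing one database; all resources are passed in
// or owned by the instance, so several can coexist in one JVM.
final class Database {
    private final Path directory;
    private final NodePool nodePool = new NodePool();

    private Database(Path directory) {
        this.directory = directory;
    }

    /** Opens a database rooted at the given directory path. */
    static Database open(String directory) {
        return new Database(Paths.get(directory));
    }

    Path directory() {
        return directory;
    }

    NodePool nodePool() {
        return nodePool;
    }
}
```

With this shape, Database.open("data/a") and Database.open("data/b") give two independent instances; allocating a node in one leaves the other's pool untouched. The trade-off is the one mentioned earlier in the thread: a per-instance pool gives up whatever a singleton would gain in performance.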

Paul


