[Mulgara-general] My job, and SPARQL development

Wed Feb 6 02:14:44 UTC 2008

I guess there isn't a real need to respond to this, but I'll have a go  
anyway...

On Feb 5, 2008, at 6:46 PM, Life is hard, and then you die wrote:

> On Tue, Feb 05, 2008 at 12:58:39PM -0500, Brian Sletten wrote:
> [snip]
>> One other development that Paul and I have discussed off-list is
>> creating a more embeddable instance that will be easier to embed. We
>> are hoping to get rid of a lot of the RMI cruft and streamline how
>> the Mulgara engine can be used, accessed, etc.
>
> We've been running mulgara as an embedded instance for a couple months
> now, so I thought I'd mention how well/badly it works for us.
>
> The RMI is not a problem, since we just create a local
> DatabaseSession. What is ugly is the lack of model-name uri/url
> disambiguation (ticket #58), though it's not a big problem for us
> _yet_ as we aren't yet running the same code on embedded as well as
> on remote instances,

You'd have seen how the URI was used to identify the server, so this
used to be necessary.  Now that I've cleaned up the TqlInterpreter
code so that the parser no longer tries to create a connection, the
graph URI is no longer needed to locate the server.  However,
artifacts of that earlier requirement are still in the system.  As
you know, we want to change that soon.

Also, the original design never allowed for a local DatabaseSession.   
Sure, a lot of people want it, but the interface to allow it was  
hacked in.  This is going to create problems until someone takes the  
time to get it right.

Actually, the entire monolithic system that gets created by  
EmbeddedMulgaraServer has to be fixed up.  Everything should be pulled  
out into sets of independent modules.  Then services like RMI or web  
interfaces, etc., can be built over the top of these modules, and
provided as modules themselves in a larger framework, such as Apache.   
Indeed, I'd like to create an "equivalent" to the current  
EmbeddedMulgaraServer that just reads a configuration file, and loads  
the appropriate modules. These modules will include a Jetty server,  
along with those servlets to be loaded up into it.  That's all this  
program would do.  Of course, one of the modules will be a database,  
which the other modules could then share (if desired).  The result  
would be a small program that really just knows how to read a config  
file, and load up modules... nothing else.  Each of the "services"
can itself be built reasonably trivially.  With a modular design
like this, it would be easy to use Mulgara in an embedded way, or to  
trivially wrap it in some service code to create an RMI service, an  
HTTP service, a SOAP service, a SPARQL protocol service, etc.
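
To make the idea concrete, here is a minimal sketch of such a launcher
in Java. It is only an illustration under stated assumptions, not
Mulgara code: the ServiceModule interface, the ModuleLauncher class,
the "modules" config key, and the two stand-in modules are all
hypothetical names invented for this example. A real version would
start Jetty, RMI and so on inside the modules.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical module contract; not an existing Mulgara interface. */
interface ServiceModule {
    /** Start the module; 'shared' lets modules publish and find services. */
    void start(Map<String, Object> shared);
}

/** Stand-in for the database module: publishes itself for others to share. */
class DatabaseModule implements ServiceModule {
    public void start(Map<String, Object> shared) {
        shared.put("database", "in-memory stand-in");
    }
}

/** Stand-in for an HTTP front end that needs the shared database. */
class HttpModule implements ServiceModule {
    public void start(Map<String, Object> shared) {
        Object db = shared.get("database");
        if (db == null) {
            throw new IllegalStateException("http module needs a database module first");
        }
        shared.put("http", "serving " + db);
    }
}

/** The launcher only knows how to read a config and start the named modules. */
class ModuleLauncher {
    private final Map<String, Supplier<ServiceModule>> registry = new LinkedHashMap<>();
    private final Map<String, Object> shared = new LinkedHashMap<>();

    void register(String name, Supplier<ServiceModule> factory) {
        registry.put(name, factory);
    }

    Map<String, Object> shared() {
        return shared;
    }

    /** Reads "modules = a, b, c" from the config text and starts each in order. */
    List<String> launch(String configText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> started = new ArrayList<>();
        for (String raw : config.getProperty("modules", "").split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            Supplier<ServiceModule> factory = registry.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown module: " + name);
            }
            factory.get().start(shared);
            started.add(name);
        }
        return started;
    }

    public static void main(String[] args) {
        ModuleLauncher launcher = new ModuleLauncher();
        launcher.register("database", DatabaseModule::new);
        launcher.register("http", HttpModule::new);
        System.out.println(launcher.launch("modules = database, http"));
        System.out.println(launcher.shared().get("http"));
    }
}
```

An embedded user would register and list only the database module,
while a server deployment would list the database plus whichever
service modules it wants, without the launcher changing at all.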

We should discuss this more soon, but for the moment there are some  
other priorities.  OTOH, I want to get to this before we implement the  
SPARQL protocol (as opposed to SPARQL querying, which is already  
underway).

> However, there are decidedly ugly things in the whole SessionFactory
> stuff, including SessionFactoryFinder and LocalSessionFactory (in fact
> I think that part needs some serious rework), so that I've ended up
> just copying out parts of code from those to create the
> DatabaseSession instance. So this part could be significantly
> improved.

At one point a few different interfaces were allowed, with more being  
planned.  But then BEEP got dropped, and other things were put in, and  
the thing incrementally grew.  I agree that it needs to be cleaned up.

> Another problem is the assumption in various places of only a single
> mulgara instance. If you think of libraries embedding mulgara for
> internal use then it would be really nice if multiple instances could
> be running in the same JVM and classloader (separate classloaders can
> already be made to work). I've found two main places that need fixing
> here: the above mentioned session-factory stuff, but I've also found
> the use of static fields (singletons) in things like the
> XANodePoolFactory which end up preventing one from being able to run
> multiple instances.

This is because you aren't supposed to run more than one database in a  
single system.  It is guaranteed to be slower.  This is one of the  
reasons we allowed for multiple graphs.  If you really need it, then  
you can run multiple servers on the one system.

The only advantage I can think of for having multiple databases in one  
JVM is to permit multiple writers while avoiding IPC.  In reality, for  
every reason you may have to put multiple databases in a single JVM,  
there are better reasons to not do it.

In terms of cleanliness of design, you may be right.  However,
things like the NodePool will definitely perform better if they are
singletons.

All that said, this will possibly fall out naturally as we modularize  
the components in Mulgara.
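
As a rough illustration of the two styles being discussed, the sketch
below contrasts a static singleton with a factory that keeps one pool
per database. Everything in it is hypothetical (NodePool here is a toy
counter, and SingletonPoolFactory and KeyedPoolFactory are invented
names); it does not reflect how XANodePoolFactory is actually written
and says nothing about relative performance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Toy stand-in for a node pool: hands out increasing node IDs. */
class NodePool {
    private final AtomicLong next = new AtomicLong(1);

    long newNode() {
        return next.getAndIncrement();
    }
}

/** Singleton style: a static field, so one pool per classloader. */
class SingletonPoolFactory {
    private static final NodePool INSTANCE = new NodePool();

    static NodePool getPool() {
        return INSTANCE;
    }
}

/** Per-instance style: each database key gets its own pool. */
class KeyedPoolFactory {
    private final Map<String, NodePool> pools = new ConcurrentHashMap<>();

    NodePool getPool(String databaseKey) {
        return pools.computeIfAbsent(databaseKey, k -> new NodePool());
    }
}

class PoolDemo {
    public static void main(String[] args) {
        // Two "databases" asking the singleton factory share one pool.
        System.out.println(SingletonPoolFactory.getPool() == SingletonPoolFactory.getPool());

        // With the keyed factory each database has independent state.
        KeyedPoolFactory factory = new KeyedPoolFactory();
        NodePool a = factory.getPool("databaseA");
        NodePool b = factory.getPool("databaseB");
        System.out.println(a != b);
    }
}
```

The static field is what ties every database in the classloader to the
same pool; moving the state into an object that each database owns is
the kind of change that could fall out of the modularization.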

> We've also tried creating in-memory embedded mulgara instances, but a
> bunch of things are not implemented there so we've found this not to
> be usable (e.g. searching for typed-literals is not implemented in the
> memory-based string-pool, leading to all sorts of things bailing out).

Fair enough.  That should be trivial to implement though.  Do you want  
it?

> Anyway, I just thought I'd note some things that we noticed along the
> way.

That's OK.  I didn't write a lot of it.  Sometimes I find myself
debugging or maintaining code that I really want to refactor, but it
can be hard to justify the effort, especially with so many other
things to get done.

Paul