[Mulgara-dev] Database and SessionFactory issues

Paul Gearon gearon at ieee.org
Tue May 20 18:18:18 UTC 2008


On Tue, May 20, 2008 at 7:35 AM, Alex Hall <alexhall at revelytix.com> wrote:
> In recent posts to this list, Ronald has expressed his unhappiness with
> the SessionFactory architecture, and I share this sentiment.

I haven't had to work with this for a while, so my memory is a little
vague on the operation of this class (though Alex has helped remind me
of a lot of it). However, on those occasions that I've had to work
with this class, I clearly recall that pain was involved.

>  In an
> offline conversation, Paul asked me for a list of specific issues that I
> have.  In the interest of giving him something more to work with than
> "It's broken and needs refactoring," here is my list of specific issues
> related to SessionFactory that bother me.  It is by no means meant to
> be exhaustive, it is just a collection taken from my personal
> experience.  Please comment on or add to it as you see fit.
>
> 1. Canonicalization of server URI's.  There are too many places
> throughout the code that are trying to do hostname aliasing, with the
> end result being that in any one place there's no good way of knowing
> what the expected form of the server URI will be.  This results in
> issues such as the one cited by Ronald, where a server URI obtained from
> one place cannot be used with SessionFactoryFinder to connect to the
> database.  This issue is closely related to the conflation of URI and
> URL; once the database URI is decoupled from its location, there is no
> need to convert it to canonical form.

This is starting to happen now, so it shouldn't be a problem for much
longer. URLs will be used to find the host/server, but once it's there
we will be stripping off the whole rmi: section of the URL, and
creating a completely different type of URI.

> 2. SessionFactory classes take URI's instead of URL's.  Once again,
> resolving this depends on decoupling the logical database URI from its
> physical location.  I think that the argument to a SessionFactory should
> be a URL that describes the location of a server, not the URI that
> describes the identity of the server.  This in turn might dictate the
> need for a mechanism of converting from database URI to server URL to
> support certain applications.

At the moment the location of the server *is* the identity of the
server. Canonicalization, and the lack thereof (mentioned above),
confuses this issue, since different URLs are supposed to refer to the
same server, and it's often not detected.

All the same, I agree that sessions need a URL, since they are about
finding a server and connecting to it.
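To illustrate why undetected aliasing bites, here is a minimal sketch in
plain Java (standard library only, not Mulgara code). It shows that
java.net.URI treats two spellings of the same host as different
identities unless something canonicalizes them first. The server name
"server1" and the helper sameServer are assumptions made for the
example.

```java
import java.net.URI;

public class UriAliasDemo {
    // True only when java.net.URI.equals considers the two URIs equal.
    // No DNS lookup or hostname aliasing is performed.
    static boolean sameServer(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Different host spellings for the same machine: not equal.
        System.out.println(sameServer("rmi://localhost/server1",
                                      "rmi://127.0.0.1/server1")); // false
        // Host comparison ignores case, so this pair is equal.
        System.out.println(sameServer("rmi://localhost/server1",
                                      "rmi://LOCALHOST/server1")); // true
    }
}
```

Any code that uses the URL as the server's identity inherits this
behaviour, which is one reason to keep the locating URL separate from
the identifying URI.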

> 3. SessionFactoryFinder falls back on LocalSessionFactory if an RMI
> connection on localhost fails.  This has the unpleasant side effect of
> attempting to start up an entire new Database.  The way I see it, if I
> want to connect to a server using a URI with a scheme of "rmi://" that
> means I know there is a remote server running at that location.  It
> *doesn't* mean that I want an embedded database to be instantiated in
> the local JVM if no server is running at that location and the location
> happens to be on my local machine.  I expect to get an error if this is
> the case.

This actually raises a couple of issues.

First, if you don't have a local database to connect to, then I agree
that when you try to connect to one it should fail. If you want a
database, then you should create a database.

Second, if you have asked for the rmi scheme but it's for a database
in the local JVM, should you be given a LocalSession instead of an
RmiSession? The whole structure of Session and RemoteSession is about
hiding the differences between the two types, and it can be argued
that we should try to make this interception for the sake of
efficiency. OTOH, there's also a legitimate argument that if you
specify the rmi protocol then that's what you should get (even if it
doesn't make a lot of sense when you're in the same JVM). I'm in two
minds on this issue.
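The "fail instead of falling back" behaviour from the first point could
be sketched roughly as follows. This is not the SessionFactoryFinder
API: Factory, Connector, NoServerException and StrictFinderSketch are
all hypothetical names invented for the illustration. The only idea
shown is that the finder uses the connector registered for the requested
scheme and lets its failure propagate.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class StrictFinderSketch {
    /** Hypothetical stand-in for a session factory. */
    interface Factory {
        String describe();
    }

    /** Thrown when no connector matches or no server answers. */
    static class NoServerException extends Exception {
        NoServerException(String message) { super(message); }
    }

    /** Hypothetical connector, one per URI scheme. */
    interface Connector {
        Factory connect(URI server) throws NoServerException;
    }

    private final Map<String, Connector> connectors = new HashMap<>();

    void register(String scheme, Connector connector) {
        connectors.put(scheme, connector);
    }

    /** Uses only the connector for the requested scheme. */
    Factory find(URI server) throws NoServerException {
        Connector connector = connectors.get(server.getScheme());
        if (connector == null) {
            throw new NoServerException("No connector for scheme: " + server.getScheme());
        }
        // A failure here propagates to the caller; there is no local fallback.
        return connector.connect(server);
    }

    public static void main(String[] args) {
        StrictFinderSketch finder = new StrictFinderSketch();
        // Simulate an RMI connector that finds nothing listening.
        finder.register("rmi", uri -> {
            throw new NoServerException("No server at " + uri);
        });
        try {
            finder.find(URI.create("rmi://localhost/server1"));
        } catch (NoServerException e) {
            System.out.println("Failed as expected: " + e.getMessage());
        }
    }
}
```

Whether an rmi URI that points into the current JVM should be quietly
served by a local session is a separate decision, and the sketch
deliberately takes no position on it.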

> 4. LocalSessionFactory *always* attempts to instantiate a new Database.
>  It has a static SessionFactory field that it tries to initialize with
> a new triple-store implementation the first time that newSession is
> called.  It does not have knowledge of any other Database instances
> which might be present in the JVM (either through EmbeddedMulgaraServer
> or direct use of the Database class) so it always tries to create a new
> one, which in turn raises errors if one already exists.  I think the
> semantics of the newSession operation should be to connect to an
> existing local Database with the given URI.  If the client wishes to use
> a local Database, they should configure it directly instead of relying
> on its creation as a side effect of a call to the newSession method.

Agreed.

At the moment we can't have multiple Database instances (too many
static fields in use), though ultimately we'd like to allow this.
That's still some way off, since I think we have to make sure that
each database gets its own URI, and the URIs we're considering only
mention server names.

Even so, the fact that we can't have more than one at the moment means
that trying to create a new one is crazy. So that's not simply poor
design... it's a bug.
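The semantics Alex describes (newSession connects to an existing
database, and creation is a separate, explicit step) might look
something like the sketch below. Everything in it is hypothetical:
DatabaseRegistrySketch and Db are invented names, and keying by URI
assumes the per-database URIs that don't exist yet. It is an instance
rather than a static field so that it doesn't add to the static-field
problem.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DatabaseRegistrySketch {
    /** Hypothetical stand-in for the Database class. */
    static class Db {
        final URI uri;
        Db(URI uri) { this.uri = uri; }
    }

    private final Map<URI, Db> databases = new ConcurrentHashMap<>();

    /** Explicit creation: the only place a database comes into existence. */
    Db create(URI uri) {
        Db db = new Db(uri);
        if (databases.putIfAbsent(uri, db) != null) {
            throw new IllegalStateException("Database already exists: " + uri);
        }
        return db;
    }

    /** Lookup for session creation: never creates a database as a side effect. */
    Db lookup(URI uri) {
        Db db = databases.get(uri);
        if (db == null) {
            throw new IllegalStateException("No local database registered for: " + uri);
        }
        return db;
    }

    public static void main(String[] args) {
        DatabaseRegistrySketch registry = new DatabaseRegistrySketch();
        URI uri = URI.create("rmi://localhost/server1");
        Db created = registry.create(uri);
        System.out.println(registry.lookup(uri) == created); // true
    }
}
```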

> 5. The triple store implementation class is configurable.  In theory,
> this might be a nice feature to have, but in reality are we really going
> to have an alternate implementation class to Database?  Making the class
> name configurable and using reflection to invoke it when it can only
> have one value only makes for obtuse code, and possibly interferes with
> our ability to effectively configure the Database.

I agree. We have different stores, but it all happens through
"Database". In fact, the whole application is really an instance of
"Database". EmbeddedMulgaraServer makes it appear that "Database" is
not the core of the application, but that's because
EmbeddedMulgaraServer is a bad design.

> 6. EmbeddedMulgaraServer needs to go away.  It is unreadable and it
> makes my head hurt trying to decipher anything that is going on in
> there.  There are at least 2 or 3 objects called "server" and it's hard
> to tell what any of them is doing.  There is no way that class should be
> starting up a Jetty server, but that's what it does.  As Paul has said,
> it needs to be replaced altogether with a set of modularized services
> and components that the developer can configure as he sees fit.  It is
> relatively isolated from the other issues here, but related enough that
> I think it should be done in concert with any serious refactoring of the
> SessionFactory framework.

I cannot say this often enough, nor loudly enough. This class is a
dog's breakfast. Alex already didn't like it, but I think I was able to
show him it was even more broken than he believed.

EmbeddedMulgaraServer refers to a couple of different server types
(Server, ServerMBean, etc) but it gets hard to find out where those
servers are created, or what the implementing types are. Well it turns
out that EmbeddedMulgaraServer itself implements these interfaces, and
it keeps referring to itself using all of these different interfaces.
It is almost impossible to read.

It *is* possible to manually read the code and work out how it starts
Jetty (I've done it), but Alex is right that it's hard to do.
Personally, I think Jetty has no place inside a Mulgara server.

I'd like to make Database much easier to create. Just say, "here is my
config file, now give me a database". It can be done, but it's very
convoluted. You have to reverse-engineer how it is being built in
EmbeddedMulgaraServer if you want to do it for yourself.

I'd also like to re-create the existing EmbeddedMulgaraServer
functionality, using a more modular approach. So the resulting
application would load up an XML file that describes the modules to
load, and these can include an RMI service, an HTTP server (such as
Jetty), along with services that load into other modules. So the Jetty
module can install sub-modules that can include documentation pages, a
servlet for the web-page interface, a servlet for SOAP, and another
servlet for SPARQL.

Even if I don't gain a lot with the new Server application, replacing
EmbeddedMulgaraServer will give the system a lot more transparency,
and help decouple classes like Database.
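As a rough picture of the modular layout described above, here is a
minimal sketch of modules that can carry sub-modules. It is not a
design proposal: ModuleSketch, Module and the module names are
hypothetical, and the XML descriptor is left out entirely, so the tree
is built by hand where a loader would normally build it from the file.

```java
import java.util.ArrayList;
import java.util.List;

public class ModuleSketch {
    /** Hypothetical unit of server functionality with optional sub-modules. */
    static class Module {
        final String name;
        final List<Module> children = new ArrayList<>();

        Module(String name) { this.name = name; }

        /** Installs a sub-module and returns this module for chaining. */
        Module add(Module child) {
            children.add(child);
            return this;
        }

        /** Starts this module first, then each sub-module in order. */
        void start(List<String> started) {
            started.add(name);
            for (Module child : children) {
                child.start(started);
            }
        }
    }

    /** Starts every top-level module and returns the names in start order. */
    static List<String> startAll(List<Module> modules) {
        List<String> started = new ArrayList<>();
        for (Module module : modules) {
            module.start(started);
        }
        return started;
    }

    public static void main(String[] args) {
        Module http = new Module("http")
            .add(new Module("docs"))
            .add(new Module("webui"))
            .add(new Module("soap"))
            .add(new Module("sparql"));
        List<Module> modules = new ArrayList<>();
        modules.add(new Module("rmi"));
        modules.add(http);
        System.out.println(startAll(modules));
    }
}
```

The point of the shape is that the HTTP server is just one module among
several, and nothing in the database itself needs to know it exists.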

Paul


