[Mulgara-dev] Architecture questions: connections, sessions, servers

Paul Gearon gearon at ieee.org
Fri Dec 10 03:50:11 UTC 2010


On Thu, Dec 9, 2010 at 4:58 PM, Gregg Reynolds <dev at mobileink.com> wrote:
> Hi,
> I've gone through a bunch of the source code related to
> EmbeddedMulgaraServer but I'm still a little unclear on some issues.  The
> basic process seems to be:  EmbeddedMulgaraServer launches a mulgara server,
> then configures Jetty with some servlets and launches it (actually two of
> it).  When a request for e.g. the sparql servlet arrives, the query is
> extracted, compiled, and executed against the server.
> Where I'm not clear is the relation between the "server" and the sparql
> processing servlet that queries the server.  I gather the servlet must talk
> to the server via an RMI connection, correct?

That's a possible configuration, but only if you're running in an
application framework like Tomcat. That's not the case when you're
using EmbeddedMulgaraServer.

EmbeddedMulgaraServer starts an RMI service, but that's only so other
processes can connect to the database through RMI. It is not relevant
to the HTTP services at all.

The short story is that the EmbeddedMulgaraServer constructor creates
a database (with "createServer") and then starts up the HTTP server,
passing the database along as a parameter. Unfortunately, this isn't
obvious, because config files and reflection are in heavy use.

If you look at the call to createServer, you'll see that the
second-to-last parameter is tripleStoreClassName. This comes from the
mulgara-config.xml file (the default file is
conf/core/mulgara-x-config.xml, which is built with an Ant task). This
parameter gets set to org.mulgara.resolver.Database. This object will
get created via reflection, and is the SessionFactory for all
connections to the database. This object is provided to both the RMI
and HTTP servers.

The HTTP server is started via the class HttpServices. The
HttpServices class is given a reference to the EmbeddedMulgaraServer
since it is through that object that the HttpServices will get hold of
the Database. HttpServices can't be given the database directly:
because of the order in which certain objects have to be created,
the database doesn't exist yet when HttpServices is set up.
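
To illustrate that ordering problem, here is a minimal sketch of handing
over the server rather than the database. Every class name in it
(ToyServer, ToyHttpServices, ToyDatabase) is a hypothetical stand-in, and
the exact creation order is an assumption; the real EmbeddedMulgaraServer
and HttpServices are considerably more involved.

```java
// Hypothetical stand-ins for illustration; not the real Mulgara classes.
class LazyLookupDemo {
    static class ToyDatabase { }

    static class ToyServer {
        private ToyDatabase database;          // stays null until startDatabase()
        private final ToyHttpServices http;

        ToyServer() {
            // The HTTP side is wired up before any database exists,
            // so it receives the server itself rather than the database.
            this.http = new ToyHttpServices(this);
        }

        void startDatabase() { database = new ToyDatabase(); }
        ToyDatabase getDatabase() { return database; }
        ToyHttpServices http() { return http; }
    }

    static class ToyHttpServices {
        private final ToyServer server;

        ToyHttpServices(ToyServer server) { this.server = server; }

        // Looked up on demand, by which time the database should exist.
        ToyDatabase database() {
            ToyDatabase db = server.getDatabase();
            if (db == null) {
                throw new IllegalStateException("database not started yet");
            }
            return db;
        }
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        server.startDatabase();
        System.out.println(server.http().database() == server.getDatabase());
    }
}
```

The point of the design is only that the lookup is deferred: the HTTP
services keep a reference to something that will eventually be able to
produce the database.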

> But it looks to me like the
> sparql servlet is instantiating a server, via ServerMulgaraServlet.

That's actually only when the system is loaded via a WAR file.

If the system is running in standalone mode (i.e. launched from
EmbeddedMulgaraServer) then a Database is created as described above,
and is made directly available to each of the servlets via their
construction parameters.

If the system is running in Tomcat, then there are two options. Either
we want Tomcat to start a Database somewhere such that the servlets
can find it, or else we will have our own database running somewhere
else, and the servlets will connect to it via RMI. In the former case
you want Tomcat to be able to start a Database. That is done with the
ServletMulgaraServer class. If you look in the WAR file for web.xml
you'll see this servlet is configured to start up first. This servlet
does not respond to any requests at all. It just starts the database.
Once it's running, the other servlets can find it through reflection,
and ask for the Database that it created for them.

> I'm
> afraid I don't know much about RMI, but I assume the sparql servlet must be
> using some kind of proxy or stub to talk to the server over rmi; is that
> close?

It *can* do that, but it's not a common config. If you start delving
into the RMI classes you'll discover that all of them have wrapper
classes that hide all elements of RMI. This is done so that local
objects and RMI objects look identical. (It leads to a truly horrible
naming scheme on some classes, which I didn't create, but now that I'm
used to it I can't think of anything better).  :-)
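
The wrapper idea can be sketched roughly as follows. The interface and
class names are invented for the example and are not Mulgara's; the only
real API used is java.rmi.Remote and java.rmi.RemoteException. The
"remote" object is faked in-process, so no RMI registry is needed to run
it.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical names throughout; only the java.rmi types are real.
class RmiWrapperDemo {
    // What calling code sees: no trace of RMI.
    interface QueryService {
        String query(String q);
    }

    // The RMI-facing interface: every method may throw RemoteException.
    interface RemoteQueryService extends Remote {
        String query(String q) throws RemoteException;
    }

    static class LocalQueryService implements QueryService {
        public String query(String q) { return "local:" + q; }
    }

    // Wrapper that makes a remote object look like a local one.
    static class RemoteQueryServiceWrapper implements QueryService {
        private final RemoteQueryService remote;

        RemoteQueryServiceWrapper(RemoteQueryService remote) { this.remote = remote; }

        public String query(String q) {
            try {
                return remote.query(q);
            } catch (RemoteException e) {
                // Translate the RMI failure so callers never see RMI types.
                throw new IllegalStateException("remote call failed", e);
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real RMI stub; a real one would come from a registry lookup.
        RemoteQueryService fakeStub = q -> "remote:" + q;
        QueryService[] services = {
            new LocalQueryService(),
            new RemoteQueryServiceWrapper(fakeStub)
        };
        for (QueryService s : services) {
            System.out.println(s.query("select"));
        }
    }
}
```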

> More generally, can anybody clarify what is meant by such terms as
> "session", "connection", "factory", etc. in the source?  For example,
> sometimes a variable of type SessionFactory is named sessionFactory, but the
> comments indicate it is supposed to be a database.  Elsewhere an MBean is
> called a database.  Sometimes "session" seems to mean HTTP session in the
> servlet, other times it seems to mean a session with the database..  It's
> all rather confusing, I'm afraid.

You and me both. When this stuff was being built I was working on
transactional disk storage and query optimizations.

OK, here goes....

A "Factory" is a class that creates something else. Most of the system
is configurable (see the XML files), so there are lots of places where
code will be getting something that meets an interface, but not
necessarily know what that thing is.

For instance, the resolvers are all things that meet the Resolver
interface, but the calling code doesn't know if it's talking to an XA1
resolver, a Relational resolver, or an HTTP resolver. The idea of a
factory is a class that will create the instances of the things we
want to work with. So to get a Resolver you'll use a ResolverFactory.
In this case, you'll look at the XML config and ask "What are my
resolver factories?" and you'll get back strings with the class names
for these factories. You then create instances of the factories. Then
when you need a resolver you just ask the factory for a newResolver().
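
As a rough sketch of that pattern: the Resolver and ResolverFactory
interfaces below are simplified, hypothetical stand-ins (the real Mulgara
interfaces have different signatures), and the class names are hard-coded
where Mulgara would read them from the XML config.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not Mulgara's actual interfaces.
class ResolverFactoryDemo {
    interface Resolver { String describe(); }
    interface ResolverFactory { Resolver newResolver(); }

    static class StoreResolverFactory implements ResolverFactory {
        public Resolver newResolver() { return () -> "store resolver"; }
    }

    static class HttpResolverFactory implements ResolverFactory {
        public Resolver newResolver() { return () -> "http resolver"; }
    }

    // Turn configured class names into factory instances via reflection.
    static List<ResolverFactory> loadFactories(List<String> classNames) {
        List<ResolverFactory> factories = new ArrayList<>();
        for (String name : classNames) {
            try {
                Object instance = Class.forName(name).getDeclaredConstructor().newInstance();
                factories.add((ResolverFactory) instance);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create factory " + name, e);
            }
        }
        return factories;
    }

    public static void main(String[] args) {
        // Assumption: in Mulgara these strings would come from the config file.
        List<String> configured = List.of(
            StoreResolverFactory.class.getName(),
            HttpResolverFactory.class.getName());
        for (ResolverFactory factory : loadFactories(configured)) {
            // The caller only knows it has "a Resolver", not which kind.
            System.out.println(factory.newResolver().describe());
        }
    }
}
```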

Ignoring HTTP sessions for a moment...

"Session" refers to a connection to a database. It's called a session,
since it contains some state, particularly the transaction state. Once
you have a session on a database you can tell it things like: you want
to start a transaction, write some data, read something, write some
more, and then commit the transaction. Some sessions provide
read/write access. Others are read-only.

If you interact with the database in any way, then you've used a
Session. Sometimes that's hidden from you, but it's still there. For
instance, Connections to a database actually create (or re-use) a
Session, and hide a lot of the details.

Users should never use Sessions directly. "Connections" were created
to hide Sessions, and to make Session management less onerous.
However, at this level of the code you'll see Sessions instead of
Connections.

HTTP Sessions are part of the servlet interface, and store state
between HTTP operations. Clients send references to those sessions by
providing cookies with their HTTP requests. The name conflicts with
database sessions, but database sessions had been around for about 8
years before HTTP sessions were used in Mulgara. I wasn't about to
rename half of the system just because I introduced a new feature. :-)

Bringing concepts of factories and sessions together: Anything that
gives you a Session is a SessionFactory. Therefore a Database is a
SessionFactory.
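
A toy version of that relationship, to make the terms concrete. The
Session and SessionFactory interfaces here are drastically simplified
assumptions (the method names begin, insert, commit and read are invented
for the example), and ToyDatabase is not org.mulgara.resolver.Database.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical interfaces; Mulgara's real ones differ.
class SessionDemo {
    interface Session {
        void begin();
        void insert(String statement);
        void commit();
        List<String> read();
    }

    // Anything that hands out Sessions is a SessionFactory.
    interface SessionFactory {
        Session newSession();
    }

    // So a database is a SessionFactory.
    static class ToyDatabase implements SessionFactory {
        private final List<String> committed = new ArrayList<>();

        public Session newSession() { return new ToySession(); }

        private class ToySession implements Session {
            // The state the session carries: non-null while a transaction is open.
            private List<String> pending;

            public void begin() { pending = new ArrayList<>(); }

            public void insert(String statement) {
                if (pending == null) throw new IllegalStateException("no open transaction");
                pending.add(statement);
            }

            public void commit() {
                if (pending == null) throw new IllegalStateException("no open transaction");
                committed.addAll(pending);
                pending = null;
            }

            // Reads only see committed data in this toy model.
            public List<String> read() { return new ArrayList<>(committed); }
        }
    }

    public static void main(String[] args) {
        SessionFactory database = new ToyDatabase();
        Session session = database.newSession();
        session.begin();
        session.insert("<s> <p> <o>");
        session.commit();
        System.out.println(session.read());
    }
}
```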

If you look at EmbeddedMulgaraServer.createServer, you'll see that it
creates a ServerMBean (more on that later). This is the thing that
will create the database. On line 687 you can see it set the
"providerClassName". That's where it's telling the mbean to use
"org.mulgara.resolver.Database". Then in
EmbeddedMulgaraServer.startServices you can see it call
serverManagement.init() (line 413). Finally, in AbstractServer on
lines 266-280 you can see it create an instance of whatever
"providerClassName" is set to (i.e. an instance of a Database). The
result is stored as "sessionFactory". At that point anyone can ask the
session factory for a session.

As for MBeans.... these are a part of JMX (Java Management
Extensions). JMX lets you query objects to know what they are doing,
and it can also allow you to tell a system to perform actions. For
instance, a web server might use JMX to indicate how many requests a
second it is serving. A JMX client might also be able to tell that
server to flush its cache.

The idea here was that Mulgara was to be instrumented with JMX. I
can't recall if that happened (it probably did, but I never used it).
Since the server was to be instrumented, it was set up to be an MBean
(MBean = Management Bean). That's why the ServerMBean interface is
full of getter and setter methods (that's the instrumentation) and
several actions (methods that don't take arguments, such as start()
and stop()).

Whenever you see an MBean mentioned around the server, then just
remember that this bean *is* the server. You're just looking at the
methods that are most relevant to JMX. In fact, in
EmbeddedMulgaraServer on line 417 you can see that the server needed
to get the session factory from the MBean, but that the required
method isn't on the MBean interface. So the code on that line casts
the mbean to an AbstractServer and *then* calls getSessionFactory().
In other words, the MBean is a server (AbstractServer in this case).
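
The chain described above (set the provider class name, init, then cast
the MBean to reach getSessionFactory()) can be sketched like this. The
types ToyServerMBean, ToyServer and ToyDatabase are hypothetical and the
method signatures are guesses made for the example; only the overall
shape follows the description.

```java
// Hypothetical types; the shape follows the description, not the real source.
class MBeanCastDemo {
    interface SessionFactory { String name(); }

    static class ToyDatabase implements SessionFactory {
        public String name() { return "toy database"; }
    }

    // The management view: getters, setters and no-argument actions only.
    interface ToyServerMBean {
        void setProviderClassName(String className);
        String getProviderClassName();
        void init();
    }

    static class ToyServer implements ToyServerMBean {
        private String providerClassName;
        private SessionFactory sessionFactory;

        public void setProviderClassName(String className) { providerClassName = className; }
        public String getProviderClassName() { return providerClassName; }

        // Creates whatever class the provider name points at, via reflection.
        public void init() {
            try {
                sessionFactory = (SessionFactory)
                    Class.forName(providerClassName).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create " + providerClassName, e);
            }
        }

        // Deliberately absent from the MBean interface.
        SessionFactory getSessionFactory() { return sessionFactory; }
    }

    public static void main(String[] args) {
        ToyServerMBean mbean = new ToyServer();
        mbean.setProviderClassName(ToyDatabase.class.getName());
        mbean.init();
        // The interface lacks getSessionFactory(), so cast to the concrete server.
        SessionFactory factory = ((ToyServer) mbean).getSessionFactory();
        System.out.println(factory.name());
    }
}
```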

> FYI I'm trying to use the conceptual categories and architectural structures
> outlined in Architecture of a Database System, by Hellerstein, Stonebraker,
> and Hamilton.  (PDF copy available online, search for the title), which is a
> terrifically clear overview of terms and concepts many of which should in
> principle apply to any RDF database system.

Sorry. You're on your own there. :-)

Regards,
Paul

