[Mulgara-dev] Query cancellation

Alex Hall alexhall at revelytix.com
Mon Jan 18 20:21:24 UTC 2010


I brought up this issue a few months back, and it's time to revisit it.
This is going to be a long email, so please bear with me...

We've recently added a graphical query builder to our application, which
allows users to write and execute arbitrary SPARQL queries against an
RDF graph stored in Mulgara. What we've found is that general users are
quite adept at writing queries that, for a variety of reasons (such as
unintended cross-products), will bring the system to its knees. We need
a way of canceling such misbehaving queries and releasing system
resources on the server in order to mitigate this situation.

Since Mulgara doesn't provide this capability at the moment, it looks
like I'll have to add it myself. I'm operating in the context of
client/server communications over RMI using the Connection API, so I'll
concentrate on that for the time being; hopefully, whatever I do can be
easily extended to apply to the TQL REST interface as well. For the
purposes of this email, I've identified three broad areas of concern
which I'll address in order:

1. At the transport level, communicating to the server that the client
wishes to terminate an operation.
2. On the client side, advertising a method to terminate the operation.
3. On the server side, detecting that termination of an operation has
been requested and gracefully halting the processing.

For the first issue -- communicating a request to terminate from the
client to the server -- Java RMI does such a good job of hiding the fact
that an operation is even executing remotely that this is difficult to
do out of the box. You can even kill the client process entirely, and
the server will continue processing unaware. I've seen suggestions to
allocate request IDs and use a control channel to terminate requests,
but this seems a bit fragile.

A more likely approach is illustrated in a small open-source utility
library that I found called Interruptible RMI
(https://interruptiblermi.dev.java.net/). This library uses custom
thread and socket factories, and overrides Thread.interrupt() to close
the underlying socket if the thread is blocked in an RMI call. It's a
somewhat blunt approach, but also simple and effective.
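
To make the idea concrete, here is a rough sketch of the general
technique: a thread whose interrupt() closes whatever socket it is
currently blocked on. This is not the library's actual code. The class
name SocketClosingThread and the registerSocket() hook are made up for
illustration, and I'm assuming a custom socket factory would be the
thing that calls registerSocket().

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of the technique; NOT the Interruptible RMI source.
class SocketClosingThread extends Thread {
    // The socket (or any Closeable) this thread is currently blocked on, if any.
    private volatile Closeable currentSocket;

    SocketClosingThread(Runnable task) {
        super(task);
    }

    // Assumed to be called by a custom socket factory when a call begins,
    // and with null when the call completes.
    void registerSocket(Closeable socket) {
        currentSocket = socket;
    }

    @Override
    public void interrupt() {
        Closeable socket = currentSocket;
        if (socket != null) {
            try {
                // Closing the socket makes a blocked read or write fail,
                // which is what lets the blocked call return.
                socket.close();
            } catch (IOException ignored) {
                // Best effort: the thread is being interrupted anyway.
            }
        }
        super.interrupt();
    }
}
```

Since java.net.Socket implements Closeable, a real socket can be
registered directly.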

The Interruptible RMI library is available under the Apache 2.0 license,
which I believe is compatible with Mulgara. One question I have is, are
there any housekeeping requirements involved when adding third-party
libraries to Mulgara, or can I just drop the JAR into /lib and start
using it?

One caveat regarding this approach is that the thread executing the
client call must have been created using the custom thread factory. For
that reason, I'm envisioning adding some sort of client proxy Session
implementation that would invoke operations in separate threads created
using the custom factory in order to hide that detail from client code.
Of course, clients could use the thread factory directly if they so choose.
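
As a sketch of what such a proxy might delegate to, the helper below
runs each call on a thread obtained from a caller-supplied
ThreadFactory and hands the result back to the calling thread. The
name FactoryBackedInvoker is hypothetical, and a real Session proxy
would wrap each Session method in a call like this rather than expose
invoke() directly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper; not part of Mulgara or Interruptible RMI.
class FactoryBackedInvoker {
    private final ExecutorService executor;

    // Every worker thread is created by the supplied factory, so client
    // code never has to create the special threads itself.
    FactoryBackedInvoker(ThreadFactory factory) {
        this.executor = Executors.newCachedThreadPool(factory);
    }

    // Runs the call on a factory-created thread and waits for its result.
    <T> T invoke(Callable<T> remoteCall) throws Exception {
        try {
            return executor.submit(remoteCall).get();
        } catch (ExecutionException e) {
            // Surface the original failure rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

The cost of this design is an extra thread hop per call, which seems
acceptable next to the cost of a remote query.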

That brings me to my next issue. The ability to cancel an operation is
an important feature to have when it comes to building
production-quality systems. If we're going to add this feature, then I
think it would be a good idea to advertise its availability and make it
more accessible. In the case of the Interruptible RMI library,
overriding Thread.interrupt() is a low-level implementation detail and
not really suitable as a general-purpose interface.

Since I'm working with the Connection API, I would suggest extending the
Command interface with a cancel() method. The general contract would be,
if Command.execute() is running in one thread and Command.cancel() is
called in another thread, then execute() will return immediately (with
some sort of exception) and any server resources will be released as
soon as is reasonably possible. Using the Interruptible RMI library, it
would be fairly straightforward to make an abstract command class that
will register the thread that invokes execute() and then interrupt that
thread in cancel().
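
A minimal sketch of that abstract class might look like the following.
The names CancellableCommand and doExecute() are invented here and this
is not Mulgara's existing Command interface; it only illustrates the
register-then-interrupt idea.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not the actual Mulgara Command API.
abstract class CancellableCommand<T> {
    // The thread currently inside execute(), or null if none.
    private final AtomicReference<Thread> executing = new AtomicReference<>();

    // Subclasses perform the (possibly blocking) remote call here.
    protected abstract T doExecute() throws Exception;

    public final T execute() throws Exception {
        // Register the calling thread so that cancel() can find it.
        if (!executing.compareAndSet(null, Thread.currentThread())) {
            throw new IllegalStateException("command is already executing");
        }
        try {
            return doExecute();
        } finally {
            executing.set(null);
        }
    }

    // May be called from any thread; does nothing if execute() is not running.
    public void cancel() {
        Thread thread = executing.get();
        if (thread != null) {
            thread.interrupt();
        }
    }
}
```

One wrinkle in this sketch: if cancel() races with the end of
execute(), the interrupt can land just after the command returns, so
callers should be prepared for a stray interrupt flag on that thread.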

Finally, there is the issue of how to terminate the server processing to
release resources. The general problem here is, there is no way to
gracefully interrupt processing if the server isn't expecting to be
interrupted to begin with. This suggests that the server process (in
this case, the query execution thread) needs to periodically check
whether the request was canceled and respond accordingly. The "how" part
of detecting a cancellation is simple enough, at least with the
Interruptible RMI library; there is a utility method available for a
server to check whether its underlying socket is still alive.

As far as when to do this detection, that's going to be a bit of a
balancing act. For maximum responsiveness, you would need to perform
this check very frequently, which would mean lots of code modifications.
Ideally, there will be some central control point where you can do the
check on a regular basis with minimal code changes. For the Mulgara
query engine, the most likely candidate seems to be the
ConstraintResolutionHandler and GraphResolutionHandler dispatch tables
in the ConstraintOperations class. I think a good first step would be
adding a check in here to see if the request has been canceled and, if
so, throwing a QueryException, which will quickly unwind the call stack,
close any open resources, and roll back the transaction.
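
Roughly, the shape of such a check could be as follows. Everything
here is a stand-in: QueryCanceledException substitutes for Mulgara's
QueryException, ResolutionDispatcher substitutes for the dispatch code
in ConstraintOperations, and the BooleanSupplier is a placeholder for
whatever "is the client socket still alive?" test ends up being used.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Stand-in for Mulgara's QueryException in this sketch.
class QueryCanceledException extends Exception {
    QueryCanceledException(String message) {
        super(message);
    }
}

// Wraps whatever mechanism reports that the client has gone away.
class CancellationGuard {
    private final BooleanSupplier canceled;

    CancellationGuard(BooleanSupplier canceled) {
        this.canceled = canceled;
    }

    void check() throws QueryCanceledException {
        if (canceled.getAsBoolean()) {
            throw new QueryCanceledException("query was canceled by the client");
        }
    }
}

// Hypothetical central control point; not the real ConstraintOperations.
class ResolutionDispatcher {
    private final CancellationGuard guard;

    ResolutionDispatcher(CancellationGuard guard) {
        this.guard = guard;
    }

    // Checks for cancellation once per dispatched constraint.
    int resolveAll(List<String> constraints) throws QueryCanceledException {
        int resolved = 0;
        for (String constraint : constraints) {
            guard.check();
            resolved++; // real code would look up and invoke a handler here
        }
        return resolved;
    }
}
```

Checking once per dispatch keeps the change in one place, at the cost
of not noticing a cancellation while a single handler is busy.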

This is just a first cut, and there will of course be other issues that
come up. But I wanted to get this out there and ask for feedback before
I get too far down this rabbit hole.

Regards,
Alex


