[Mulgara-dev] Query cancellation

Tue Jan 19 20:46:11 UTC 2010

On 1/18/2010 9:13 PM, Paul Gearon wrote:
>>>> One caveat regarding this approach is that the thread executing the
>>>> client call must have been created using the custom thread factory. For
>>>> that reason, I'm envisioning adding some sort of client proxy Session
>>>> implementation that would invoke operations in separate threads created
>>>> using the custom factory in order to hide that detail from client code.
>>>> Of course, clients could use the thread factory directly if they so choose.
>>>
>>> You should be able to hide most of that in the Connection factory,
>>> shouldn't you? The idea of this factory was to make establishing RMI
>>> connections easier.
>>
>> Sure, the ConnectionFactory would be one likely location to hide that
>> sort of thing. No matter where it gets implemented, I imagine that there
>> will need to be some sort of configuration option that allows the client
>> to decide whether to have the overhead of executing operations in
>> interruptible delegate threads. Unfortunately the only way I know of to
>> configure anything in the client libraries is going to be with Java
>> system properties.
> 
> Sure, but that can be set in code, and config files can read that
> setup. I believe we already have that kind of thing in places.

I don't quite follow you here -- can you elaborate?
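For concreteness, here is a minimal sketch of the opt-in delegate-thread wrapper described earlier, written against plain JDK classes only. The `Session` interface, the `DelegatingSessions` class, and the property name `mulgara.client.interruptibleDelegates` are all hypothetical, not existing Mulgara API. The thread factory is a simple stand-in for the custom factory that the Interruptible RMI library would presumably supply. When the property is off, the session is returned unwrapped, so callers pay no extra overhead.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the client-side session interface.
interface Session {
    String query(String q);
}

final class DelegatingSessions {
    // Hypothetical property name, not an existing Mulgara setting.
    static final String PROPERTY = "mulgara.client.interruptibleDelegates";

    // Stand-in for the custom thread factory; this one only names its threads.
    private static final AtomicInteger COUNT = new AtomicInteger();
    static final ThreadFactory FACTORY = r -> {
        Thread t = new Thread(r, "session-delegate-" + COUNT.incrementAndGet());
        t.setDaemon(true);
        return t;
    };

    private DelegatingSessions() {}

    /** Returns target unchanged unless the system property is "true". */
    static Session wrap(Session target) {
        if (!Boolean.getBoolean(PROPERTY)) {
            return target;  // no delegate-thread overhead when disabled
        }
        ExecutorService pool = Executors.newCachedThreadPool(FACTORY);
        InvocationHandler handler = (proxy, method, args) -> {
            // Run the real call on a thread created by FACTORY.
            Future<Object> result = pool.submit(() -> method.invoke(target, args));
            try {
                return result.get();
            } catch (InterruptedException e) {
                result.cancel(true);                 // interrupt the delegate thread
                Thread.currentThread().interrupt();  // keep the caller's interrupt status
                throw new CancellationException("session call interrupted");
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                // Unwrap the reflection wrapper so callers see the original exception.
                throw cause instanceof InvocationTargetException ? cause.getCause() : cause;
            }
        };
        return (Session) Proxy.newProxyInstance(
                Session.class.getClassLoader(), new Class<?>[] {Session.class}, handler);
    }
}
```

To enable it from code instead of with `-Dmulgara.client.interruptibleDelegates=true`, call `System.setProperty(DelegatingSessions.PROPERTY, "true")` before `wrap`. Because `wrap` reads the flag on every call, the setting only affects sessions wrapped after it changes.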

>>>> Since I'm working with the Connection API, I would suggest extending the
>>>> Command interface with a cancel() method. The general contract would be,
>>>> if Command.execute() is running in one thread and Command.cancel() is
>>>> called in another thread, then execute() will return immediately (with
>>>> some sort of exception) and any server resources will be released as
>>>> soon as is reasonably possible. Using the Interruptible RMI library, it
>>>> would be fairly straightforward to make an abstract command class that
>>>> will register the thread that invokes execute() and then interrupt that
>>>> thread in cancel().
>>>
>>> I believe that a Connection only allows you to run a single Command at
>>> a time on it, right? (you can certainly only run a single write
>>> command, but I never gave much thought to multiple queries). If so,
>>> then I'd prefer to see cancel() on the Connection (commands get run on
>>> a connection anyway, so Command.cancel() can easily just pass this
>>> on to a Connection that they're currently running on).
>>
>> I don't think this is the case. The Connection factory is designed so
>> that multiple threads attempting to open connections to the same server
>> would get different connections, but the motivation for this was to
>> avoid having one worker close a connection still in use by another
>> worker.
> 
> I was actually thinking of the case where a client opened a connection
> and then used it in multiple threads. If they open a connection
> multiple times, then they'll get multiple connections, as you say, in
> which case we don't have to worry.
> 
>> Once you have a Connection, though, there is certainly nothing
>> in the Connection class to prevent it from being used to invoke multiple
>> commands concurrently in multiple threads.
> 
> That was my concern. We don't protect against it, but then again, most
> libraries don't. It's unnecessary overhead to prevent a usage pattern
> that is easily documented with the words "not reentrant" (I believe
> those words are written *somewhere*, but few libraries are designed to
> be reentrant, and usually document when they are).

Good point -- the Javadoc for the Connection interface explicitly states
that the connection is not safe for concurrent access. I even wrote the
paragraph in question, so I should know better! :-)

I suspect that the connection is reentrant with respect to read-only
transactions, but the fact remains that it wasn't designed or tested
that way, so we can't claim to support that usage pattern. And it
certainly isn't with respect to writes; I have no idea what the expected
behavior would be if you called commit() on a connection while a write
transaction is in progress.

>> Behind the scenes, what happens when you get a new Connection is that
>> the server will create a new DatabaseSession, wrap it in a
>> RemoteSession, export the session, and return a stub to the client. The
>> Connection is just a thin wrapper around the remote session stub;
>> concurrent commands on the Connection will result in the remote
>> DatabaseSession being accessed concurrently by multiple RMI threads.
> 
> Really? I thought it was supposed to create a new DatabaseSession each
> time? If that's not what's happening then I'd certainly like to change
> it (it's just too error prone to share the one session with concurrent
> connections).

There is one DatabaseSession per Connection. Concurrent commands on a
single Connection will concurrently access the same DatabaseSession
(this is unsupported but we don't prevent it). Concurrent Connections
will not access a shared DatabaseSession, however.

>> If anything is going to prevent concurrent operations from running at once,
>> it is going to be the transaction factory because the DatabaseSession at
>> first glance appears to support concurrent access.
>>
>> Given the fact that it is (at least theoretically) possible to have
>> concurrent Commands being run on a single Connection, I would prefer to
>> have cancel() implemented at the Command level.
> 
> Well, commands just execute operations on the Connection anyway
> (meaning that they call methods on the Session), so I still think it's
> OK to work on the Connection rather than the command. One concession
> I'm willing to make on this is that perhaps we should create a lock on
> a Connection to prevent multiple commands from being run at the same
> time?

Hmm, this brings up an interesting point. It would be good to add a lock
on a Connection. But Connection.execute(Command) just passes straight
through to Command.execute(Connection). So, you could easily put a lock
inside Connection.execute() but this would be bypassed if you went
straight to the Command.execute() entry point, which is where the actual
work is done anyway. But this isn't really the correct place to
implement a lock because Command.execute() is implemented differently by
each operation and the only thing they have in common is they do stuff
on a Session. So, you would need to wrap up the "do stuff on a Session"
part in a function and pass it back to the Connection to invoke with
lock protection, and all of a sudden I see why people get so excited
about Scala.
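The "pass the Session work back to the Connection" idea is easy to sketch with Java lambdas (Java 8 or later). In this sketch the connection never exposes its session, so a command cannot bypass the lock by going around `execute()`. `LockedConnection`, `SessionCommand`, and the one-method `Session` are hypothetical names for illustration, not the existing Mulgara interfaces.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical stand-in for the session a connection wraps.
interface Session {
    String query(String q);
}

// Hypothetical connection that never hands out its session directly:
// work must be passed in as a function, so the lock cannot be bypassed.
final class LockedConnection {
    private final Session session;
    private final ReentrantLock lock = new ReentrantLock();

    LockedConnection(Session session) {
        this.session = session;
    }

    <T> T withSession(Function<Session, T> work) {
        lock.lock();  // one command at a time on this connection
        try {
            return work.apply(session);
        } finally {
            lock.unlock();
        }
    }

    boolean isBusy() {
        return lock.isLocked();
    }
}

// Hypothetical command shape: each operation supplies only its
// "do stuff on a Session" part; execute() is shared and always locks.
interface SessionCommand<T> {
    Function<Session, T> work();

    default T execute(LockedConnection conn) {
        return conn.withSession(work());
    }
}
```

A query then reduces to something like `SessionCommand<String> q = () -> s -> s.query("...")`, which is roughly the convenience that makes Scala attractive here.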

>>> The reason I'd like to see Connection.close() instead of
>>> Command.close() is that Connections have a more global scope
>>> (they're cached for a start), whereas Commands are (currently) invisible
>>> outside of their current context. That makes them harder to pick up
>>> from another thread to call cancel() on them. Conversely, since
>>> Connections are cached and indexed by server, you can ask for the list
>>> of Connections to a given server, and figure out which one you want to
>>> cancel() the current operation on.
> 
> Doh! I slipped here and said "close()" when I should have said
> "cancel()". Please forgive the confusion. Can we rewind and pick up
> your response to this point again, only with the function "cancel()"
> instead of "close()" please?

Sure. My initial preference for Command.cancel() as opposed to
Connection.cancel() was driven by my particular use case -- I'm
interested in killing individual queries, not isolating a problem
connection. I'll already be tracking metadata such as username and
timestamp for these queries, so referencing the Commands outside the
context in which they were created is not going to be an issue. From a
semantic standpoint, I wanted to cancel a query, and Query is a type of
Command, so Command.cancel() was my natural choice.

However, on closer inspection, a Command is just a logical operation
that can be executed against *any* Connection. In practice, it would
make perfect sense (especially in a federated environment) to create one
Query and execute it concurrently on multiple connections. You don't
want to cancel the logical operation; you want to cancel the execution
of said operation in the context of a specific connection.
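The following sketch makes the distinction concrete. A single logical operation runs concurrently against two servers, and only one of the two executions is cancelled. Here `java.util.concurrent.Future` plays the part of a per-connection execution handle, which is a concept the Mulgara API does not currently have. A plain server name stands in for a Connection, and `PerConnectionCancel` and `QUERY` are hypothetical.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PerConnectionCancel {
    // The logical operation: one immutable object, usable on any "connection"
    // (here just a server name). It blocks on "slow" to simulate a long query.
    static final Function<String, String> QUERY = server -> {
        if (server.equals("slow")) {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                throw new CancellationException("cancelled on " + server);
            }
        }
        return "rows from " + server;
    };

    /** Executes QUERY on two servers at once and cancels only one execution. */
    static String[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> fast = pool.submit(() -> QUERY.apply("fast"));
            Future<String> slow = pool.submit(() -> QUERY.apply("slow"));
            slow.cancel(true);  // cancels that execution; QUERY itself is untouched
            return new String[] {outcome(fast), outcome(slow)};
        } finally {
            pool.shutdownNow();
        }
    }

    private static String outcome(Future<String> f) {
        try {
            return f.get();
        } catch (CancellationException e) {
            return "cancelled";
        } catch (InterruptedException | ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }
}
```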

I think part of my confusion arose from drawing a false comparison
between the Mulgara Command interface and the JDBC Statement interface
which I referenced earlier. But the JDBC Statement implicitly references
the connection that created it, whereas the Command represents an
operation that can be executed against any connection. There is no
interface in the Mulgara API to encapsulate the concept of the execution
of an operation on a specific connection. In the absence of such an
interface, the best place to put the cancel() method is in Connection
since there should only be one command at a time on any particular
connection.

All of which is a roundabout way of saying that you've convinced me to
move cancel() from Command to Connection. :-)
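Here is a minimal sketch of the Connection-level contract. It assumes the executing thread can simply be recorded and interrupted, as in the original Command-level proposal. `CancellableConnection`, `Work`, and `demo()` are hypothetical names, not Mulgara API. Releasing server resources through Interruptible RMI is not modelled. The sketch also rejects a second concurrent command, consistent with the one-command-at-a-time assumption.

```java
import java.util.concurrent.CancellationException;

// Hypothetical sketch of Connection.cancel(): the connection remembers which
// thread is executing a command on it and interrupts that thread on cancel().
final class CancellableConnection {
    /** A command body that may block and responds to interruption. */
    interface Work<T> {
        T run() throws InterruptedException;
    }

    private Thread running;  // guarded by "this"

    <T> T execute(Work<T> work) {
        synchronized (this) {
            if (running != null) {
                throw new IllegalStateException("a command is already running on this connection");
            }
            running = Thread.currentThread();
        }
        try {
            return work.run();
        } catch (InterruptedException e) {
            throw new CancellationException("command cancelled");
        } finally {
            synchronized (this) {
                running = null;  // after this, cancel() can no longer reach this thread
            }
        }
    }

    /** Interrupts the command currently running, if any; returns whether one was. */
    synchronized boolean cancel() {
        if (running == null) {
            return false;
        }
        running.interrupt();
        return true;
    }

    /** Runs a blocking command on one thread and cancels it from another. */
    static boolean demo() {
        CancellableConnection conn = new CancellableConnection();
        boolean[] cancelled = new boolean[1];
        Thread worker = new Thread(() -> {
            try {
                conn.execute(() -> { Thread.sleep(10_000); return "done"; });
            } catch (CancellationException e) {
                cancelled[0] = true;
            }
        });
        worker.start();
        try {
            while (!conn.cancel()) {  // retry until the worker has registered
                Thread.sleep(1);
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return cancelled[0];
    }
}
```

Both the registration and the clearing of `running` happen under the same monitor as `cancel()`, so a late `cancel()` cannot interrupt the thread after its command has already finished.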

Regards,
Alex