[Mulgara-dev] ModelName URN/URL fix (MGR-58)
Andrae Muys
andrae at netymon.com
Thu Jun 28 08:13:39 UTC 2007
Well, I spent a fair amount of last week preparing to implement this.
The design is relatively straightforward, although some of the
constraints it must satisfy are subtle.
The primary problem this is intended to address is the current
conflation of a model's location with its name. Specifically we
currently use the same URI to refer to a model in a query that we use
to refer to a model in the rdf statements describing the model. To
make this more concrete consider a model rmi://localhost/server#test.
When the model was created it was assigned a model type. This type
identifies which resolver should be used to answer queries referring
to the model, and the type is stored as a statement in the system-
model. In the case of a normal model the statement stored is:
rmi://localhost/server#test rdf:type mulgara:Model
On the other hand if rmi://localhost/server#test should be a view the
statement stored is:
rmi://localhost/server#test rdf:type mulgara:ViewModel
Each resolver factory is responsible for ensuring it registers itself
as handling the appropriate model-types, and with each query mulgara
queries the system model to identify the type.
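To make the lookup concrete, here is a minimal Python sketch of the idea. It is not Mulgara code: the dictionary standing in for the system model, the register() helper, the resolver factory names and the "#view" model are all assumptions invented for the illustration.

```python
# Hypothetical stand-in for the system model: model URI -> model type.
SYSTEM_MODEL = {
    "rmi://localhost/server#test": "mulgara:Model",
    "rmi://localhost/server#view": "mulgara:ViewModel",  # illustrative view
}

# Hypothetical registry: model type -> name of the factory that handles it.
RESOLVER_FACTORIES = {}


def register(model_type, factory_name):
    """Called by each resolver factory to claim the model type it handles."""
    RESOLVER_FACTORIES[model_type] = factory_name


def factory_for(model_uri):
    """Find the model's type in the system model, then the factory for it."""
    model_type = SYSTEM_MODEL.get(model_uri)
    if model_type is None:
        raise LookupError("no such model: " + model_uri)
    return RESOLVER_FACTORIES[model_type]


# Factory names below are made up for the example.
register("mulgara:Model", "StoreResolverFactory")
register("mulgara:ViewModel", "ViewResolverFactory")
```

Note that factory_for() only succeeds for the exact URI stored in the system model; any other spelling of the same model raises LookupError.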
The problem is that from the perspective of a client there are a large
number of sensible 'names' that could be used as aliases for a given
model in a query - in the case of 'test' a few might be:
rmi://127.0.0.1/server#test
soap://localhost/server#test
local:server#test
rmi://my.domain.name/server#test
and, given that what counts as a 'legitimate' name for a model in a
query is defined by the client, the server cannot know every possible
option.
We would like
select $s $p $o from rmi://localhost/server#test where $s $p $o
and
select $s $p $o from soap://127.0.0.1/server#test where $s $p $o
to be equivalent in every respect, except possibly the protocol used
to access the server.
The problem is that the mapping from model to model-type is by
necessity in terms of a specific model-name, and consequently a
naive attempt to resolve a query against an alias (soap:....) will
fail to find the model.
The natural solution to this problem is to specify that the name used
in the system-model be some 'canonical name', and that all references
to models received in queries and other operations be first mapped
into a canonical namespace before use.
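A minimal Python sketch of that mapping step follows. The alias table, the dictionary standing in for the system model and both function names are assumptions made for illustration; they are not part of Mulgara.

```python
# Hypothetical alias table: client-supplied model URI -> canonical name.
ALIASES = {
    "rmi://127.0.0.1/server#test": "rmi://localhost/server#test",
    "soap://localhost/server#test": "rmi://localhost/server#test",
    "local:server#test": "rmi://localhost/server#test",
}

# Stand-in for the system model, keyed by canonical name only.
MODEL_TYPES = {"rmi://localhost/server#test": "mulgara:Model"}


def canonicalise(model_uri):
    """Map a client-supplied URI to its canonical name (identity if unknown)."""
    return ALIASES.get(model_uri, model_uri)


def model_type(model_uri):
    """Canonicalise first, and only then consult the system model."""
    return MODEL_TYPES.get(canonicalise(model_uri))
```

With this in place model_type("soap://localhost/server#test") finds mulgara:Model, whereas looking the alias up in MODEL_TYPES directly finds nothing.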
Our initial attempt at this involved trying to identify a suitable
c-name from the dns system and using that, while maintaining a list of
known and configured aliases that could be used to map incoming
model-names to c-names. This works up to a point - and that point is
when people try to migrate databases to new systems, or run mulgara on
mobile platforms that routinely migrate between different networks
(e.g. notebooks). At this point the reliance on the dns system bites us.
A key realisation required to solve this problem is that the names
used by clients and the names used by mulgara internally are actually
distinct. The URIs used internally really are *names*, while the URIs
used by clients are actually *locations*. It is trivial to maintain a
one-to-one relationship between internal names and models; however,
the only guarantee we can provide with locations is that once
dereferenced, at a specific point in time, from a specific client, a
location will only ever refer to a single model.
This distinction becomes important when the use cases become more
involved - specifically when people start using meta-models.
Consider a FOAF aggregation application where each FOAF file is
loaded into its own model, and a separate catalogue of FOAF files is
maintained which tracks such information as when/where each FOAF file
was obtained. So we have:
rmi://localhost/foafdb#foaf-file-1
    ... contents ...
rmi://localhost/foafdb#foaf-file-2
    ... contents ...
rmi://localhost/foafdb#foaf-catalogue
    rmi://localhost/foafdb#foaf-file-1 foafdb:downloadedfrom http://my.domain.com/myfoaf.foaf
    rmi://localhost/foafdb#foaf-file-1 foafdb:downloadedat "20070205"^^<xsd:Date>
    rmi://localhost/foafdb#foaf-file-2 foafdb:downloadedfrom http://your.domain.com/yourfoaf.foaf
    rmi://localhost/foafdb#foaf-file-2 foafdb:downloadedat "20070207"^^<xsd:Date>
and now we want to consider the following pseudo-code to print all
FOAF files downloaded on a Wednesday:
answer = foafdb.query("
    select $foaf $date
    from <rmi://localhost/foafdb#foaf-catalogue>
    where $foaf <foafdb:downloadedat> $date")
for (foafFile, date) in answer:
    if isWednesday(date):
        foafContents = foafdb.query("
            select $s $p $o
            from " + foafFile + "
            where $s $p $o")
        printFoaf(foafContents)
The URIs inserted into the catalogue need to be names because
they need to unambiguously identify a unique model. However, when the
client wishes to use the name in a query what it actually needs is a
location. Provided the client has access to RMI, and uses the
same dns mapping between name and address that the server used, this
will work. Unfortunately neither of these can be guaranteed, and when
it does fail there is no workaround.
The goal of MGR-58 is therefore to abandon any pretense that a
model's name can be used as a location, and to make this distinction
explicit.
We do this by introducing a new URI scheme that will identify model
names, rdfdb. So the test model above will become:
rdfdb://some-unique-id#test. A ModelURLResolver would then be able to
map this URI into a suitable URL for referring to the model
externally. So the catalogue above becomes:
rdfdb://unique-id#foaf-catalogue
    rdfdb://unique-id#foaf-file-1 foafdb:downloadedfrom http://my.domain.com/myfoaf.foaf
    rdfdb://unique-id#foaf-file-1 foafdb:downloadedat "20070205"^^<xsd:Date>
    rdfdb://unique-id#foaf-file-2 foafdb:downloadedfrom http://your.domain.com/yourfoaf.foaf
    rdfdb://unique-id#foaf-file-2 foafdb:downloadedat "20070207"^^<xsd:Date>
and the first query becomes:
answer = foafdb.query("
    select $foafurl $date
    from <rmi://localhost/foafdb#foaf-catalogue>
    where $foafuri <foafdb:downloadedat> $date
    and $foafuri <mulgara:hasCanonicalRMIURL> $foafurl
        in <rmi://localhost/foafdb#modelURLResolver>")
or if the application is using soap:
answer = foafdb.query("
    select $foafurl $date
    from <soap://localhost/foafdb#foaf-catalogue>
    where $foafuri <foafdb:downloadedat> $date
    and $foafuri <mulgara:hasCanonicalSOAPURL> $foafurl
        in <soap://localhost/foafdb#modelURLResolver>")
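As a rough picture of what the name-to-location mapping might do, here is a small Python sketch. It only illustrates the idea, not the planned resolver: the function name canonical_url and its parameters are hypothetical, and it assumes (from the examples above) that the fragment of the rdfdb name carries over unchanged into the URL.

```python
from urllib.parse import urlsplit


def canonical_url(model_name, scheme, host, server):
    """Build a location URL for an rdfdb model name.

    scheme, host and server describe how one particular client reaches
    the database; they are supplied by the caller, not stored in the name.
    """
    parts = urlsplit(model_name)
    if parts.scheme != "rdfdb":
        raise ValueError("not a model name: " + model_name)
    # The fragment (e.g. "foaf-file-1") identifies the model on the server.
    return "{}://{}/{}#{}".format(scheme, host, server, parts.fragment)
```

For example, canonical_url("rdfdb://unique-id#foaf-file-1", "rmi", "localhost", "foafdb") gives rmi://localhost/foafdb#foaf-file-1, and passing "soap" instead gives the SOAP location for the same model.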
There is also the suggestion that the ModelURLResolver could
support deconstructing (and therefore, in a prolog-like manner,
constructing) URLs into their components, i.e.
select $foafurl $date
from <soap://localhost/foafdb#foaf-catalogue>
where $foafuri <foafdb:downloadedat> $date
and { $foafurl <mulgara:refersTo> $foafuri
      : <mulgara:scheme> "soap:"
      : <mulgara:host> "localhost"
      in <soap://localhost/foafdb#modelURLResolver> }
Support for this sort of thing is not intended for the initial
release, however, as I would prefer that the work required to
implement a query-transformation based resolver not delay the release
of this fix.
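To show what deconstructing and constructing a URL means in practice, here is a short Python sketch using the standard urllib.parse module. The component names "server" and "model" and both function names are my own labels for the example, not proposed predicates; note also that urlsplit reports the scheme without the trailing colon used in the query above.

```python
from urllib.parse import urlsplit


def deconstruct(url):
    """Split a model location URL into its components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,            # e.g. "soap" (no trailing colon)
        "host": parts.hostname,            # e.g. "localhost"
        "server": parts.path.lstrip("/"),  # e.g. "foafdb"
        "model": parts.fragment,           # e.g. "foaf-catalogue"
    }


def construct(components):
    """Inverse of deconstruct(): rebuild the URL from its components."""
    return "{scheme}://{host}/{server}#{model}".format(**components)
```

Running the two in sequence on soap://localhost/foafdb#foaf-catalogue returns the original URL, which is the bidirectional behaviour the prolog comparison refers to.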
The work required to implement this falls into two areas.
1. Bootstrapping the SystemModel and ServerGUID.
2. Catching all references to model URLs and mapping them to URIs
before use.
Bootstrap is relatively self-contained - the system's bootstrap code
is mostly contained in BootstrapOperation.java. We will need to
check for an existing ServerGUID, and if one is found store it in the
DatabaseMetadataImpl class. If it isn't found we create a new one
and store it in a local distinguished model - this would be similar
to the way preallocated nodes are currently handled. NB. we could
just use this as the system model directly, but because the URI would
have to be the same globally there would be no way to query this model
externally - it would be accessible *only* by mulgara internally - and
there are too many use cases that require external read access to the
system model for that to be feasible.
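The check-then-create step can be sketched in a few lines of Python. This is only an outline of the logic, not the contents of BootstrapOperation.java: the dict-like local_store standing in for the local distinguished model, the "serverGUID" key and the use of a random UUID are all assumptions.

```python
import uuid


def bootstrap_server_guid(local_store):
    """Return the server's GUID, creating and storing one on first start.

    local_store is a hypothetical dict-like stand-in for the local
    distinguished model; "serverGUID" is an illustrative key.
    """
    guid = local_store.get("serverGUID")
    if guid is None:
        # First start: generate a fresh identifier and persist it.
        guid = str(uuid.uuid4())
        local_store["serverGUID"] = guid
    return guid
```

Calling it twice against the same store returns the same GUID, so model names built from it stay stable across restarts and migrations.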
References are made to models in 6 operations that I am aware of
currently (I may have missed one or two in the KRule stuff). In 5
the mapping is trivial as the model is passed directly as a parameter:
Create
Modify
Remove
Set
Backup
The 6th of course is Query, and there we have options.
1. We can apply a query-transform that examines the from-clause and
every in-clause and performs the mapping
2. We can catch every localization of an in/from clause as it is
integrated into the constraint in LocalQueryResolver::resolve
3. We can catch it just before we look up the ResolverFactory in
DatabaseOperationContext::getCanonicalModel
The current dns-based partial solution lives in 3.
(DOC::getCanonicalModel), so the quickest approach is to just modify
that. The cleanest approach is either 1. or 2. Ultimately I believe
we are going to have to go with 1.; however, we currently don't have
support for rewriting the from-clause in a query-transformation, so
this would require updates to the symbolic-transformation code. We
probably need to change the implementation of
SymbolicTransformationContext, but for now if we stick with either 2.
or 3. we can avoid changing the Resolver-SPI, which would be nice.
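For what option 1. amounts to, here is a toy Python sketch of a transform that maps the from-clause and every in-clause. It does not model Mulgara's symbolic-transformation code or its query classes; Query, Constraint and map_models are hypothetical types and names invented for the illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Constraint:
    subject: str
    predicate: str
    obj: str
    in_model: Optional[str] = None  # the optional "in <model>" clause


@dataclass(frozen=True)
class Query:
    from_model: str
    where: List[Constraint] = field(default_factory=list)


def map_models(query: Query, to_name: Callable[[str], str]) -> Query:
    """Return a copy of the query with every model reference mapped."""
    where = [
        # Constraints without an in-clause are passed through untouched.
        replace(c, in_model=to_name(c.in_model)) if c.in_model else c
        for c in query.where
    ]
    return Query(from_model=to_name(query.from_model), where=where)
```

The to_name callback is where the URL-to-name mapping would plug in; the transform itself only guarantees that no model reference reaches resolution unmapped.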
Andrae
--
Andrae Muys
andrae at netymon.com
Mulgara Consultant
Netymon Pty Ltd