[Mulgara-general] Model-URI/URL Use-cases and Requirements and Proposal
Andrae Muys
andrae at netymon.com
Wed Sep 19 11:36:09 UTC 2007
I would greatly appreciate any comments anyone may have - please also
feel free to solicit comments from outside the mulgara community if
there is interest.
* The Use Cases and Requirements
As discussed in my previous email the three key requirements of a
model-URI proposal are:
1. Protocol/Scheme independence
2. Model/Server mobility
3. URI-standards compliance (i.e. no fragment identifier)
Also desirable are:
4. Unique-name
5. Namespaced, to allow a) potential resolution; b) predictable, human-readable URIs.
The most complex use-case involves four models on four machines (and
assumes a Distributed or Federated Resolver):
:modelA is on server1 on host1 and needs to reference :modelB
and :modelC
:modelB is on server2 on host2
:modelC is on server3 on host3
:modelD is on server4 on host4 run by an unrelated organisation
The application needs to perform the query:
select $identifier subquery(
select $s $p $o where $s $p $o in $location and $identifier
<mulgara:locatedAt> $location in <mulgara:modelURLResolver>)
from host1:modelA
where [ <:useModel> $identifier ] ;
This queries each model listed in :modelA, after converting its
identifier into a URL via a posited resolution mechanism.
Now host2 fails, and we restore server2 on host3 to run alongside
server3.
We would like to be able to have the query run unmodified.
What this means is that :modelB cannot encode host2 in its URI.
The URI does need to encode some sort of server-id, as different
servers are guaranteed to use the same model-names at least some of
the time (consider that every server's system-model has the name "").
Also, because :modelD and :modelA-C are managed by unrelated
organisations, we must somehow encode the organisation in the model's
URI-stem, as they may well decide to use the same server-id ("server1"
or "database" anyone?).
Also consider that any encoding of the organisation must allow that
organisation to maintain its own independent registry, or the
proposal ceases to be scale-free (it was on this point that the
original UUID proposal floundered).
I have considered abandoning requirement 4 and just using URLs.
However, ultimately we require a canonical name for internal purposes
(even if it isn't exposed externally), so even using URLs we would
have to pick a designated 'unique name' for the model. We can't
escape that, so we might as well save ourselves the headache and make
it unambiguous.
So, to summarise my thinking on the use-cases/requirements for RDF
model-names, we desire:
1. Unambiguously an identifier
2. Encodes organisation
3. Encodes server-id
4. Doesn't encode hostname
5. Potentially resolvable via a per-organisation registry
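To make requirements 2-5 concrete against the host2 failure scenario above, here is a minimal sketch. It assumes a model is identified by (organisation, server-id, model-name) and that each organisation keeps its own registry mapping its server-ids to hosts. ModelId, OrgRegistry, resolve() and the organisation "example.org" are all hypothetical; the rmi://host/server#model URL form is borrowed from the worked example at the end of this email.

```python
# Hypothetical sketch: ModelId, OrgRegistry and resolve() are illustrative
# names, not Mulgara APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelId:
    organisation: str  # requirement 2: encodes the organisation
    server_id: str     # requirement 3: encodes the server-id
    model_name: str    # requirement 4: no hostname anywhere in the name


class OrgRegistry:
    """One organisation's own registry, mapping its server-ids to hosts."""

    def __init__(self) -> None:
        self._hosts: dict[str, str] = {}

    def register(self, server_id: str, host: str) -> None:
        # Registering again simply records the server's new location.
        self._hosts[server_id] = host

    def host_for(self, server_id: str) -> str:
        return self._hosts[server_id]


# The only shared table maps organisations to their registries (requirement 5);
# each organisation maintains its own entries independently.
registries: dict[str, OrgRegistry] = {}


def resolve(model: ModelId) -> str:
    """Find where a model currently lives and build an RMI-style model URL."""
    host = registries[model.organisation].host_for(model.server_id)
    return f"rmi://{host}/{model.server_id}#{model.model_name}"


ours = OrgRegistry()
ours.register("server2", "host2")
registries["example.org"] = ours

model_b = ModelId("example.org", "server2", "modelB")
print(resolve(model_b))  # rmi://host2/server2#modelB

# host2 fails and server2 is restored on host3: only the registry changes,
# so any query that names model_b runs unmodified.
ours.register("server2", "host3")
print(resolve(model_b))  # rmi://host3/server2#modelB
```

The point of the design is that the identifier never changes; relocation is purely a registry update owned by the organisation concerned.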
* Proposal
If we wish to be unambiguous then we should use our own URI-scheme.
This has the added benefit that once we use our own scheme we have a
lot more flexibility regarding how we structure the rest of the URI
to meet our requirements.
I am proposing to use the scheme 'rdfdb' - as did the original UUID
proposal.
I would prefer to avoid the use of opaque URIs; there is no reason
why our URI can't be introspected if we structure it sanely, so the
structure according to RFC 2396 will be 'rdfdb://authority/path'.
Logically the model-name itself makes a good path, so we arrive at
'rdfdb://authority/modelName'. This leaves the authority to encode an
organisation and a server-id in a fashion that will potentially
permit resolution via a registry.
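A minimal sketch of splitting a URI of the form 'rdfdb://authority/modelName' with the standard library's urllib.parse.urlsplit. The helper name split_model_uri is hypothetical. It rejects fragments (requirement 3) and, since system models are named "", it accepts an empty model-name.

```python
from urllib.parse import urlsplit


def split_model_uri(uri: str) -> tuple[str, str]:
    """Return (authority, model_name) for rdfdb://authority/modelName.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme != "rdfdb":
        raise ValueError(f"expected the rdfdb scheme: {uri}")
    if parts.fragment or parts.query:
        # Requirement 3: the model-name lives in the path, never a fragment.
        raise ValueError(f"model URIs carry no fragment or query: {uri}")
    if not parts.netloc or not parts.path.startswith("/"):
        raise ValueError(f"expected rdfdb://authority/modelName: {uri}")
    model_name = parts.path[1:]  # may be "" for a system model
    if "/" in model_name:
        raise ValueError(f"model-name must be a single path segment: {uri}")
    return parts.netloc, model_name
```

For example, split_model_uri("rdfdb://authority/modelName") returns ("authority", "modelName"), while "rdfdb://authority/db#model" is rejected because it relies on a fragment.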
Now, as the authority is not a hostname, RFC 2396 classifies it as a
"Registry-based Naming Authority". As such, besides alphanumerics and
%-escaped octets, the characters permitted to us are
[ - _ . ! ~ * ' ( ) $ , ; : @ & = + ] (excluding the []'s), and the
characters reserved are [ ? / ].
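As a quick way to test candidate authorities, here is a sketch of a validator for RFC 2396's reg_name production (section 3.2.1). The function name is_reg_name is hypothetical; the character set is transcribed from the RFC grammar.

```python
import re

# RFC 2396, section 3.2.1: reg_name = 1*( unreserved | escaped | "$" | "," |
# ";" | ":" | "@" | "&" | "=" | "+" ), where unreserved is alphanum plus the
# marks - _ . ! ~ * ' ( ) and escaped is "%" followed by two hex digits.
_REG_NAME = re.compile(r"(?:[A-Za-z0-9\-_.!~*'()$,;:@&=+]|%[0-9A-Fa-f]{2})+")


def is_reg_name(authority: str) -> bool:
    """True if authority is a valid registry-based naming authority."""
    return _REG_NAME.fullmatch(authority) is not None
```

A tilde-separated authority such as "rdfDatabase~com.netymon" passes, while anything containing "/" or "?", or a bare "%" not followed by two hex digits, fails.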
I therefore propose to structure the authority as
'server-id~organisation-id' (that is, the server-id and org-id
separated by a tilde).
At the moment we don't support hierarchical server-ids, but I would
like to leave us the option of doing so once we start supporting more
aggressive distribution. We also need to consider that the server-id
must remain a valid path-element for use in our existing model-URLs.
So for now I would like to limit server-id to the current standard of
'<alphaNum>+', but ultimately I think we should consider some sort of
delimited hierarchical form (probably dotted).
The organisation-id should be something that will eventually permit
the identification of a registry. For now a dotted hierarchical form
should suffice, although I will make sure the implementation leaves
this as open as possible (the use of a tilde makes this possible:
since a server-id never contains a tilde, everything after the first
tilde can be left to the organisation-id).
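A minimal sketch of building and splitting the proposed 'server-id~organisation-id' authority. It assumes '<alphaNum>+' means ASCII letters and digits; make_authority and split_authority are hypothetical names. Splitting at the first tilde is what keeps the organisation-id open-ended.

```python
import re

# Assumption: the current server-id rule '<alphaNum>+' is ASCII-only here.
_SERVER_ID = re.compile(r"[A-Za-z0-9]+")


def make_authority(server_id: str, organisation_id: str) -> str:
    """Join a server-id and organisation-id with a tilde (hypothetical helper)."""
    if not _SERVER_ID.fullmatch(server_id):
        raise ValueError(f"server-id must be alphanumeric: {server_id!r}")
    if not organisation_id:
        raise ValueError("organisation-id must not be empty")
    return f"{server_id}~{organisation_id}"


def split_authority(authority: str) -> tuple[str, str]:
    """Split an authority back into (server_id, organisation_id)."""
    # A server-id never contains a tilde, so the first tilde is the separator;
    # the organisation-id keeps whatever follows, tildes included.
    server_id, sep, organisation_id = authority.partition("~")
    if not sep or not _SERVER_ID.fullmatch(server_id) or not organisation_id:
        raise ValueError(f"expected server-id~organisation-id: {authority!r}")
    return server_id, organisation_id
```

For example, split_authority("db~org~team") yields ("db", "org~team"), so a future organisation-id syntax could change without touching the server-id rule.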
It has also been suggested that, to make it unambiguously clear we are
*not* encoding a hostname as the organisation-id, we should invert the
traditional DNS-style representation.
So, putting all the pieces together, if I am running a Mulgara server with:
host: pneuma.netymon.com
organisation: netymon.com
server-id: rdfDatabase
model-name: addressBook
The model URL for addressBook remains:
rmi://pneuma.netymon.com/rdfDatabase#addressBook
or: soap://pneuma.netymon.com/rdfDatabase#addressBook ...etc...
and the model URI for the model is:
rdfdb://rdfDatabase~com.netymon/addressBook
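The worked example can be reproduced mechanically. This sketch (all function names hypothetical) inverts the DNS-style organisation name and assembles both the hostname-free rdfdb model URI and the existing protocol-specific model URLs, using the forms shown above.

```python
def invert_dns(name: str) -> str:
    """Reverse the dotted labels: netymon.com -> com.netymon."""
    return ".".join(reversed(name.split(".")))


def model_uri(organisation: str, server_id: str, model_name: str) -> str:
    # Canonical name: no hostname and no protocol, so it survives relocation.
    return f"rdfdb://{server_id}~{invert_dns(organisation)}/{model_name}"


def model_url(protocol: str, host: str, server_id: str, model_name: str) -> str:
    # Location: tied to a protocol and a host, as model URLs are today.
    return f"{protocol}://{host}/{server_id}#{model_name}"


print(model_uri("netymon.com", "rdfDatabase", "addressBook"))
# rdfdb://rdfDatabase~com.netymon/addressBook
for protocol in ("rmi", "soap"):
    print(model_url(protocol, "pneuma.netymon.com", "rdfDatabase", "addressBook"))
# rmi://pneuma.netymon.com/rdfDatabase#addressBook
# soap://pneuma.netymon.com/rdfDatabase#addressBook
```

Moving rdfDatabase to another host changes every model URL but leaves the model URI untouched.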
As mentioned at the top of the email, comments are not only welcome
but eagerly desired.
Thanks,
Andrae
--
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd