[Mulgara-general] Model-URI/URL Use-cases and Requirements and Proposal

Andrae Muys andrae at netymon.com
Wed Sep 19 11:36:09 UTC 2007


I would greatly appreciate any comments anyone may have - please also  
feel free to solicit comments from outside the mulgara community if  
there is interest.

* The Use Cases and Requirements

As discussed in my previous email the three key requirements of a  
model-URI proposal are:

1. Protocol/Scheme independence
2. Model/Server mobility
3. URI-standards compliance (i.e. no fragment)

Also desirable are

4. Unique-name
5. Namespaced to allow a) potential resolution; b) predictable,
human-readable URIs.

The context of the most complex use-case involves 4 models and 4  
machines (and assumes a Distributed or Federated Resolver)

:modelA is on server1 on host1 and needs to reference :modelB  
and :modelC
:modelB is on server2 on host2
:modelC is on server3 on host3
:modelD is on server4 on host4 run by an unrelated organisation

The application needs to perform the query:

select $identifier subquery(
   select $s $p $o where $s $p $o in $location and $identifier
   <mulgara:locatedAt> $location in <mulgara:modelURLResolver>)
from host1:modelA
where [ <:useModel> $identifier ] ;

This queries each model listed in :modelA after converting its
identifier into a URL via a posited resolution mechanism.

Now host2 fails, and we restore server2 on host3 to run alongside  
server3.

We would like to be able to have the query run unmodified.

What this means is that :modelB cannot encode host2 in its URI.
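
A rough sketch of why this matters, assuming a hypothetical registry
that maps a server-id to its current access point (the dictionary, the
locate function and the host names are illustrative only, not part of
Mulgara). If the model's name carries no hostname, moving server2 only
touches the registry entry, and anything that refers to the model by
name is unaffected:

```python
# Hypothetical registry: server-id -> current access point (illustrative only).
registry = {
    "server2": "rmi://host2/server2",
    "server3": "rmi://host3/server3",
}

def locate(server_id, model_name, registry):
    """Turn a location-free (server-id, model-name) pair into a model URL."""
    return f"{registry[server_id]}#{model_name}"

# Before the failure, modelB is reached through host2.
before = locate("server2", "modelB", registry)

# host2 fails and server2 is restored on host3: only the registry changes.
registry["server2"] = "rmi://host3/server2"
after = locate("server2", "modelB", registry)
```

The caller passes the same ("server2", "modelB") pair both times, which
is the property the unmodified query relies on.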

The URI does need to encode some sort of server-id, as servers are
guaranteed to use the same model-names at least some of the time
(consider that all system-models have the name "").

Also because :modelD and :modelA-C are managed by unrelated  
organisations we must somehow encode the organisation in the model's  
URI-stem as they may well decide to use the same server-id ("server1"  
or "database" anyone?).


Also consider that any encoding of the organisation must also allow
that organisation to maintain its own independent registry, or the
proposal ceases to be scale-free (it's on this that the original UUID
proposal foundered).

I have considered abandoning requirement 4, and just using URLs.
However ultimately we require a canonical name for internal purposes
(even if it isn't exposed externally), and so even using URLs we
would have to pick a designated 'unique name' for the model - we
can't escape that - so we might as well save ourselves the headache
and make it unambiguous.

So a summary of my thinking on the use-cases/requirements for rdf  
model-names - we desire:

1. Unambiguously an identifier
2. Encodes organisation
3. Encodes server-id
4. Doesn't encode hostname
5. Potentially resolvable via a per-organisation registry

* Proposal

If we wish to be unambiguous then we should use our own URI-scheme.   
This has the added benefit that once we use our own scheme we have a  
lot more flexibility regarding how we structure the rest of the URI  
to meet our requirements.

I am proposing to use the scheme 'rdfdb' - as did the original UUID  
proposal.

I would prefer to avoid the use of opaque URIs; there is no reason
why our URI can't be introspected if we structure it sanely - so the
structure according to RFC2396 will be 'rdfdb://authority/path'.

Logically the model-name itself makes a good path so we arrive at
'rdfdb://authority/modelName'.  This leaves the need to encode an
organisation and a server-id in the authority in a fashion that will
potentially permit resolution via a registry.

Now as the authority is not a hostname, RFC2396 identifies us as a
"Registry-based Naming Authority".  As such, in addition to
alphanumerics and %-escapes, the characters we have permitted to us
are [ - _ . ! ~ * ' ( ) $ , ; : @ & = + ] (excluding the []'s) - and
the characters reserved are [ ? / ].
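
As a sanity check, the character rule can be written down as a regular
expression.  This is only a sketch: the name is_valid_authority is made
up for illustration, and the pattern follows the reg_name production of
RFC2396, which also admits alphanumerics and %-escapes alongside the
characters listed above.

```python
import re

# RFC 2396 reg_name: alphanumerics, the marks - _ . ! ~ * ' ( ),
# %-escapes, and the characters $ , ; : @ & = +
REG_NAME = re.compile(r"(?:[A-Za-z0-9\-_.!~*'()$,;:@&=+]|%[0-9A-Fa-f]{2})+")

def is_valid_authority(authority):
    """Return True if the whole string is a non-empty RFC 2396 reg_name."""
    return REG_NAME.fullmatch(authority) is not None
```

Under this check a tilde-separated authority is accepted, while
anything containing '/' or '?' is rejected, since those characters end
the authority component.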

I therefore propose to structure the authority
'server-id~organisation-id' (that is the server-id and org-id
separated by a tilde).

At the moment we don't support hierarchical server-ids; but I would
like to leave us the option of doing so once we start supporting more
aggressive distribution.  We also need to consider that the server-id
needs to remain a valid path-element for use in our existing
model-URLs.  So for now I would like to limit server-id to the current
standard of '<alphaNum>+', but ultimately I think we should consider
some sort of delimited hierarchical form (probably dotted).

The organisation-id should be something that will eventually permit  
the identification of a registry.  For now a dotted hierarchical form  
should suffice - although I will make sure the implementation leaves  
this as open as possible (the use of a tilde makes this possible).

It has also been suggested that to make it unambiguously clear we are  
*not* encoding a hostname as the organisation-id we should invert the  
traditional dns-style representation.
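
The inversion itself is trivial; a minimal sketch (the function name is
hypothetical, and it assumes the organisation-id is a plain dotted
name):

```python
def invert_dns_name(name):
    """Reverse the dot-separated labels, e.g. netymon.com -> com.netymon."""
    return ".".join(reversed(name.split(".")))
```

Applying it twice gives back the original name, so a registry could
recover the dns-style form if it ever needed to.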

So putting all the pieces together:  If I am running a mulgara server -

host:         pneuma.netymon.com
organisation: netymon.com
server-id:    rdfDatabase
model-name:   addressBook

The model URL for addressBook remains: rmi://pneuma.netymon.com/rdfDatabase#addressBook
                                   or: soap://pneuma.netymon.com/rdfDatabase#addressBook   ...etc...

and the model URI for the model is: rdfdb://rdfDatabase~com.netymon/addressBook
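
To make the proposed structure concrete, here is a sketch of building
and taking apart such a name.  The function names are invented for
illustration and are not an existing Mulgara API; the server-id check
assumes the '<alphaNum>+' limit means ASCII letters and digits.

```python
import re
from urllib.parse import urlsplit

# Assumption: '<alphaNum>+' is read as one or more ASCII letters or digits.
SERVER_ID = re.compile(r"[A-Za-z0-9]+")

def format_model_uri(server_id, org_id, model_name):
    """Build rdfdb://server-id~organisation-id/modelName."""
    if not SERVER_ID.fullmatch(server_id):
        raise ValueError(f"invalid server-id: {server_id!r}")
    return f"rdfdb://{server_id}~{org_id}/{model_name}"

def parse_model_uri(uri):
    """Split a model URI into (server_id, org_id, model_name)."""
    parts = urlsplit(uri)
    if parts.scheme != "rdfdb":
        raise ValueError(f"not an rdfdb URI: {uri!r}")
    # The authority is 'server-id~organisation-id'; split at the first tilde.
    server_id, sep, org_id = parts.netloc.partition("~")
    if not sep or not org_id or not SERVER_ID.fullmatch(server_id):
        raise ValueError(f"malformed authority: {parts.netloc!r}")
    # The path is '/modelName'; an empty name is allowed.
    return server_id, org_id, parts.path.lstrip("/")

def to_model_url(protocol, host, server_id, model_name):
    """Build the existing-style model URL, e.g. rmi://host/server#model."""
    return f"{protocol}://{host}/{server_id}#{model_name}"
```

Note that nothing in the parsed name mentions pneuma.netymon.com; the
host only appears once a protocol and location are chosen for
to_model_url.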


As mentioned at the top of the email, comments are not only welcome  
but eagerly desired.

Thanks,

Andrae

-- 
Andrae Muys
andrae at netymon.com
Senior RDF/SemanticWeb Consultant
Netymon Pty Ltd




