[Mulgara-dev] Graph (model) renaming

Mon May 12 15:59:02 UTC 2008

Relative URIs have to be converted to absolute ones depending on the context, so making sure that logic is right might be a pain. Absolute URIs are not portable between servers, so if you have a data set that you want to install on your own server, you have to make sure to do that right. Also annoying.

I think adding a level of URI escaping has the disadvantages of being less readable and more error-prone, because you have to track where you are to determine how many times to escape or unescape.
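
To illustrate the bookkeeping problem, here is a small Python sketch (not Mulgara code; the URI is an arbitrary example) showing how each extra level of escaping changes the string, and how you have to unescape exactly as many times as you escaped to get the original back:

```python
from urllib.parse import quote, unquote

uri = "http://example.org/testdata"  # arbitrary example URI

once = quote(uri, safe="")    # escape every reserved character
twice = quote(once, safe="")  # escaping again also escapes the '%' signs

print(once)
print(twice)

# Unescaping once too few times leaves an escaped string behind.
assert unquote(twice) == once
# Only the matching number of unescapes recovers the original.
assert unquote(unquote(twice)) == uri
```

The second form is noticeably harder to read than the first, which is the readability cost I mean.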

I think the rdfdb: construct is not intuitive. It's not a different service on the identified resource, is it? Perhaps it's compact because we're putting the information type in the URI type specifier. It looks like the need is to have different information spaces on the same server: one for the tuple data, one for the graphs, and so on. Right now the default space is tuples. It almost seems like you might want a query string like ?subset=graph&name=graphname. Generalizing Andrae's suggestion would mean defining a new URI type for every information space on the server.
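
A rough Python sketch of what I mean by the query-string idea. The parameter names "subset" and "name" are only the ones I floated above, and the helper function is hypothetical, not anything Mulgara provides:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def space_url(server_url, subset, name):
    """Build a URL selecting one information space on a server.

    Hypothetical helper: 'subset' and 'name' are illustrative
    parameter names, not an agreed Mulgara convention.
    """
    return server_url + "?" + urlencode({"subset": subset, "name": name})

url = space_url("rmi://host/server", "graph", "graphname")
print(url)

# The receiving side can recover both parameters from the query part.
params = parse_qs(urlsplit(url).query)
print(params["subset"][0], params["name"][0])
```

A new information space would then only need a new value for "subset", not a new URI type.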

-bill

----- Original Message ----
From: Paul Gearon <gearon at ieee.org>
To: Mulgara Developers <mulgara-dev at mulgara.org>
Sent: Friday, May 9, 2008 4:08:06 PM
Subject: Re: [Mulgara-dev] Graph (model) renaming

Hi everyone,

Andrae and I have continued to discuss the graph renaming plan.  We've
come to some agreement, but not on everything.

Regardless of the type of URI we store in a database, we want to
globally access it via a URL. This will permit clients that can
interpret a scheme as a protocol to access the data. One example of
this is the current "console" application that automatically connects
to servers based on the rmi: based URL of a graph. We also use URLs
like this in the current distributed resolver, which uses URLs to find
the various servers used in a query.

A hierarchical URI for identifying a graph will need to specify the
protocol, a host identifier (network name, IP address, etc.), a server
name (the default is usually "server1") and then the actual name for
the graph. Until now, the usual scheme has been to use the host as the
authority, the server as the full path, and the graph name in the
fragment, i.e. rmi://host/server#graphName

As David Wood (rightly) pointed out, we don't want to use a fragment
anymore. To avoid this we could use extra path segments:
  rmi://host/server/graphName
Or a path segment param:
  rmi://host/server;graphName
Or a query:
  rmi://host/server?graph=graphName

Andrae and I both like the last form. The query form doesn't actually
have a specified syntax, as it is "interpreted by the resource", but
the parameter=value form commonly used for http URLs appeals to us. It
is both readable, and lets us add in new parameters at a later date,
if that becomes desirable.
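
As a rough comparison of the three candidate forms, here is a Python
sketch that pulls the graph name out of each one. The function name is
made up for illustration, and the path-segment branch assumes the
server name is always exactly one segment, which is an assumption the
query form does not need:

```python
from urllib.parse import urlsplit, parse_qs

def graph_name(url):
    """Extract the graph name from any of the three candidate forms.

    Illustrative only; not part of any Mulgara API.
    """
    parts = urlsplit(url)
    if parts.query:
        # rmi://host/server?graph=graphName
        return parse_qs(parts.query)["graph"][0]
    if ";" in parts.path:
        # rmi://host/server;graphName
        return parts.path.partition(";")[2]
    # rmi://host/server/graphName
    # Assumes exactly one path segment for the server name.
    return parts.path.rsplit("/", 1)[1]

for form in ("rmi://host/server/graphName",
             "rmi://host/server;graphName",
             "rmi://host/server?graph=graphName"):
    print(form, "->", graph_name(form))
```

Only the query form names what the value means ("graph"), which is
what leaves room for further parameters later.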

Now for the part we DON'T agree on.....

Andrae would like to see graph names as being universal. To achieve
this, all graph names on a server are relative URIs, which get
resolved within the context of the server's base URI. His proposed
base-URI for a server is:
  rdfdb://servername~orgname/

As far as simple graph names go, this will work from my perspective.

However, I disagree that it be mandatory for graph names to be
universal. I've seen nothing in the RDF specs that requires this
(please point me to it if I missed it). I also noted recently that TBL
said he wished he'd made the "U" in URI stand for "Universal", rather
than "Uniform". "Universal Resource Identifiers" would certainly make
for a "cleaner" world, but we're stuck with "Uniform". The one
exception to this is URNs, which *are* required to be globally unique.

For better or for worse, people will want to duplicate graph names
between servers. There are numerous use cases for people with a graph
URI like <http://foo.com/bar/baz> to want to bring this into a
database without changing the name. Chris asked for this feature, and
I regularly make use of it myself. It will certainly help us play
nicely with SPARQL.

What this means is that Andrae would like to ONLY store relative URIs
in the database, while I would like to store both relative and
absolute URIs. (That's your position, right Andrae?)

Going with either suggestion, a graph name of "foo" would be stored as
the relative URI "foo". In that case, the graph would have a full URI
of rdfdb://servername~orgname/foo.  Referencing it remotely with RMI
would have a URL of rmi://hostname/servername?graph=foo
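
A minimal Python sketch of those two mappings for the stored name
"foo". The function names are invented for illustration, and plain
string concatenation stands in for real relative-URI resolution
(Python's urljoin does not resolve relative references against
schemes outside its built-in list, and rdfdb: is one of those):

```python
from urllib.parse import urlencode

# Andrae's proposed server base URI, as written above.
SERVER_BASE = "rdfdb://servername~orgname/"

def full_uri(stored_name, base=SERVER_BASE):
    """Resolve a stored relative graph name against the server base.

    Simple concatenation; only valid for single-segment names like "foo".
    """
    return base + stored_name

def rmi_url(host, server, stored_name):
    """Build the remote-access URL using the query form."""
    return f"rmi://{host}/{server}?" + urlencode({"graph": stored_name})

print(full_uri("foo"))
print(rmi_url("hostname", "servername", "foo"))
```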

Absolute URIs would be different. If you were referring to a graph like:
  http://example.org/testdata
Then in my scheme this URI would be stored directly. Since this is an
absolute name, it is possible for this to conflict with another graph
located on a different server.  This demonstrates the lack of
universality in URIs. However, you have to be connected to a server
before you can use a URI like this, so there is an implicit context.
The issue would be if we create a centralized registry for graphs, and
two separate servers attempted to register graphs that have the same
URI.
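
To make the registry conflict concrete, here is a toy Python sketch.
No such registry exists yet; the class and the server names are purely
hypothetical:

```python
class GraphRegistry:
    """Toy central registry mapping a graph URI to one owning server."""

    def __init__(self):
        self._owners = {}

    def register(self, graph_uri, server):
        # The first server to register a URI becomes its owner.
        owner = self._owners.setdefault(graph_uri, server)
        if owner != server:
            raise ValueError(
                f"{graph_uri} is already registered by {owner}")

registry = GraphRegistry()
registry.register("http://example.org/testdata", "serverA")

try:
    # A second server holding a graph with the same absolute URI.
    registry.register("http://example.org/testdata", "serverB")
    conflict = False
except ValueError:
    conflict = True

print("conflict detected:", conflict)
```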

Andrae's plan (if I'm getting it right) would map absolute URIs to
relative URIs like this:
  http%3A%2F%2Fexample.org%2Ftestdata
giving it an absolute URI of:
  rdfdb://servername~orgname/http%3A%2F%2Fexample.org%2Ftestdata
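
A Python sketch of that mapping, as I understand it. The function
names are hypothetical, and percent-encoding every reserved character
is my reading of the example above rather than a settled rule:

```python
from urllib.parse import quote, unquote

# Andrae's proposed server base URI.
SERVER_BASE = "rdfdb://servername~orgname/"

def to_relative(absolute_uri):
    """Encode an absolute URI so it can act as a single relative name."""
    return quote(absolute_uri, safe="")

def to_server_uri(absolute_uri):
    """Map an absolute graph URI into the server's rdfdb: namespace."""
    return SERVER_BASE + to_relative(absolute_uri)

def from_server_uri(server_uri):
    """Reverse the mapping: strip the base, then decode once."""
    return unquote(server_uri[len(SERVER_BASE):])

encoded = to_server_uri("http://example.org/testdata")
print(encoded)
print(from_server_uri(encoded))
```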

Auto-encoding of absolute URIs could be used to hide some of this
complexity, meaning that the APIs may be similar in either case. The
main time a user would see these complex URIs would be in requesting
the location of the graph from a central registry. After all, if
you're connected to a server then you can use URIs relatively (meaning
that the fancy scheme/authority are not needed) and if you're *not*
connected to the server you can just use an appropriate URL (as
outlined above).

So..... does anyone have any thoughts on this?

Paul