[Mulgara-general] URI Normalization

David Wood david at zepheira.com
Fri May 14 14:08:31 UTC 2010


In general, I would prefer to see this left to the loader and not forced by the database.  Changing URIs changes identifiers, which changes queries, etc.  It would be safest to ensure that those loading data load what they expect to load.
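
A minimal sketch of what loader-side normalization could look like, assuming a Python pre-processing step run before the data is handed to the store. The function name unescape_unreserved and the example.org URIs are hypothetical, and nothing here is Mulgara API. It decodes only those %XX escapes that encode characters unreserved in both RFC 2396 and RFC 3986 (letters, digits, "-", ".", "_", "~"), and leaves every other escape untouched:

```python
import re
import string

# Characters that are unreserved in both RFC 2396 and RFC 3986.
# (RFC 2396 also lists ! * ' ( ) as "mark" characters; they are left
# out here as a conservative assumption.)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri: str) -> str:
    """Decode %XX escapes only when they encode an unreserved character."""
    def decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Escapes of reserved characters (e.g. %2F) change meaning when
        # decoded, so they are returned exactly as written.
        return char if char in UNRESERVED else match.group(0)

    return _PCT_ESCAPE.sub(decode, uri)


if __name__ == "__main__":
    escaped = "http://example.org/%7euser"
    plain = "http://example.org/~user"
    # A store that compares URIs as plain strings sees two identifiers.
    print(escaped == plain)
    # After loader-side normalization the two forms coincide.
    print(unescape_unreserved(escaped) == plain)
```

Doing this in the loader rather than in the database keeps the choice with whoever owns the data: the store still holds exactly what was loaded, and queries have to use the same form.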

Regards,
Dave




On May 14, 2010, at 9:31 AM, Paul Gearon wrote:

> If you're going through an API, then yes. It's not so useful if you're
> loading RDF/XML or N3 files.
> 
> As a compromise, it would be trivial to add a system flag to turn on
> normalization for individual mechanisms that load data. There are
> numerous ways to load data through the API, so that might be trickier
> to track down, but it's certainly an option for the content handlers
> (the code that loads formats like N3 and XML). Would this be useful to
> you?
> 
> I gave it a little more thought, and I realized that normalizing by
> default has all sorts of issues associated with it. For instance, it
> can introduce differences between a remote RDF file and a local graph
> that is supposed to contain identical data. It might look the same
> inside the local store (where it's all normalized), but remote stores
> would see the difference. Most of the time it won't matter (and most
> of the time we see normalized URIs anyway), but it could really bite
> you on the odd occasion.
> 
> Paul
> 
> 
> On Fri, May 14, 2010 at 4:34 AM, Steve Bayliss
> <stephen.bayliss at acuityunlimited.net> wrote:
>> Thanks Paul.
>> 
>> I guess the safest way to proceed would be to ensure that URIs are
>> normalized where necessary before loading into Mulgara.
>> 
>> Regards
>> Steve
>> 
>>> -----Original Message-----
>>> From: mulgara-general-bounces at mulgara.org
>>> [mailto:mulgara-general-bounces at mulgara.org] On Behalf Of Paul Gearon
>>> Sent: 13 May 2010 19:11
>>> To: Mulgara General
>>> Subject: Re: [Mulgara-general] URI Normalization
>>> 
>>> 
>>> On Thu, May 13, 2010 at 1:12 PM, Steve Bayliss
>>> <stephen.bayliss at acuityunlimited.net> wrote:
>>>> Does Mulgara do - or is it anticipated that it will do - any form of URI
>>>> normalization?
>>> 
>>> No, this isn't done at all. I recall hearing a discussion around it
>>> once but the result was that it won't be done. I think the idea was to
>>> try to provide results that look like queries, rather than modifying
>>> the queries to match the (normalized) data.
>>> 
>>> I don't particularly mind if it's done or not. There'd be a slight
>>> overhead as every time a URI appeared it would need to be normalized,
>>> but I'd expect that to be reasonably insignificant.
>>> 
>>> 
>>>> RFC 2396, section 2.3, states
>>>> 
>>>> "Unreserved characters can be escaped without changing the semantics
>>>>    of the URI, but this should not be done unless the URI is being used
>>>>    in a context that does not allow the unescaped character to appear."
>>>> 
>>>> So in theory it would be possible to load triples into Mulgara that contain
>>>> escaped unreserved characters - currently there doesn't seem to be any form
>>>> of URI normalization taking place, so that the escaped and non-escaped forms
>>>> are considered by Mulgara to be semantically distinct - which the RFC
>>>> implies (though noting the words "can" and "should not") is incorrect.
>>>> Section 2.4.2 goes on to state
>>>> 
>>>> "In some cases, data that could be represented by an unreserved
>>>>    character may appear escaped; for example, some of the unreserved
>>>>    "mark" characters are automatically escaped by some systems.  If the
>>>>    given URI scheme defines a canonicalization algorithm, then
>>>>    unreserved characters may be unescaped according to that algorithm.
>>>>    For example, "%7e" is sometimes used instead of "~" in an http URL
>>>>    path, but the two are equivalent for an http URL."
>>>> 
>>>> No particular need to have this behaviour changed, but it would be useful to
>>>> know if it's likely that URI normalization could/should potentially be
>>>> implemented in the future to determine any likely impact of this.
>>> 
>>> Well at this point it isn't done, nor is it going to be. I'm more than
>>> happy to revisit this if enough people consider it worthwhile.
>>> 
>>> Regards,
>>> Paul Gearon
>>> _______________________________________________
>>> Mulgara-general mailing list
>>> Mulgara-general at mulgara.org
>>> http://lists.mulgara.org/mailman/listinfo/mulgara-general
>>> 