[Mulgara-general] URI Normalization

Paul Gearon gearon at ieee.org
Fri May 14 13:31:02 UTC 2010


If you're going through an API, then yes. It's not so useful if you're
loading RDF/XML or N3 files.

As a compromise, it would be trivial to add a system flag to turn on
normalization for individual mechanisms that load data. There are
numerous ways to load data through the API, so that might be trickier
to track down, but it's certainly an option for the content handlers
(the code that loads formats like N3 and RDF/XML). Would this be useful to
you?

I gave it a little more thought, and I realized that normalizing by
default has all sorts of issues associated with it. For instance, it
can introduce differences between a remote RDF file and a local graph
that is supposed to contain identical data. It might look the same
inside the local store (where it's all normalized), but remote stores
would see the difference. Most of the time it won't matter (and most
of the time we see normalized URIs anyway), but it could really bite
you on the odd occasion.
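
To make that concrete, here is a minimal sketch in Python (not Mulgara code) of the kind of normalization under discussion: unescaping percent-encoded unreserved characters. The function name normalize_unreserved and the example.org URIs are made up for illustration, and the unreserved set used is the narrower one from RFC 3986 section 2.3 rather than the RFC 2396 set quoted further down, which is an assumption on my part.

```python
import re

# Unreserved characters per RFC 3986 section 2.3 (an assumption: RFC 2396,
# quoted in this thread, also counts the "mark" characters ! * ' ( ) as unreserved).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape such as %7e or %2F.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri: str) -> str:
    """Hypothetical helper: unescape only escapes that encode unreserved characters."""

    def _replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Leave every other escape in place, upper-casing its hex digits.
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(_replace, uri)


# The URI as it appears in a remote RDF file, and as a normalizing store would keep it.
remote = "http://example.org/%7Euser/doc"
local = normalize_unreserved(remote)

# A plain string comparison, which is what a non-normalizing store does,
# now treats the two as different resources.
same_as_remote = (local == remote)
```

Here local holds the "~" form while remote keeps "%7E", so same_as_remote is False: the graph looks consistent inside the normalizing store, but anything that compares against the original file sees a mismatch.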

Paul


On Fri, May 14, 2010 at 4:34 AM, Steve Bayliss
<stephen.bayliss at acuityunlimited.net> wrote:
> Thanks Paul.
>
> I guess the safest way to proceed would be to ensure that URIs are
> normalized where necessary before loading into Mulgara.
>
> Regards
> Steve
>
>> -----Original Message-----
>> From: mulgara-general-bounces at mulgara.org
>> [mailto:mulgara-general-bounces at mulgara.org] On Behalf Of Paul Gearon
>> Sent: 13 May 2010 19:11
>> To: Mulgara General
>> Subject: Re: [Mulgara-general] URI Normalization
>>
>>
>> On Thu, May 13, 2010 at 1:12 PM, Steve Bayliss
>> <stephen.bayliss at acuityunlimited.net> wrote:
>> > Does Mulgara do - or is it anticipated that it will do - any form of URI
>> > normalization?
>>
>> No, this isn't done at all. I recall hearing a discussion around it
>> once but the result was that it won't be done. I think the idea was to
>> try to provide results that look like queries, rather than modifying
>> the queries to match the (normalized) data.
>>
>> I don't particularly mind if it's done or not. There'd be a slight
>> overhead as every time a URI appeared it would need to be normalized,
>> but I'd expect that to be reasonably insignificant.
>>
>>
>> > RFC2396 2.3 states
>> >
>> > "Unreserved characters can be escaped without changing the semantics
>> >    of the URI, but this should not be done unless the URI is being used
>> >    in a context that does not allow the unescaped character to appear."
>> >
>> > So in theory it would be possible to load triples into Mulgara that contain
>> > escaped unreserved characters - currently there doesn't seem to be any form
>> > of URI normalization taking place, so that the escaped and non-escaped forms
>> > are considered by Mulgara to be semantically distinct - which the RFC
>> > implies (though noting the words "can" and "should not") is incorrect.
>> > Section 2.4.2 goes on to state
>> >
>> > "In some cases, data that could be represented by an unreserved
>> >    character may appear escaped; for example, some of the unreserved
>> >    "mark" characters are automatically escaped by some systems.  If the
>> >    given URI scheme defines a canonicalization algorithm, then
>> >    unreserved characters may be unescaped according to that algorithm.
>> >    For example, "%7e" is sometimes used instead of "~" in an http URL
>> >    path, but the two are equivalent for an http URL."
>> >
>> > No particular need to have this behaviour changed, but it would be useful to
>> > know if it's likely that URI normalization could/should potentially be
>> > implemented in the future to determine any likely impact of this.
>>
>> Well at this point it isn't done, nor is it going to be. I'm more than
>> happy to revisit this if enough people consider it worthwhile.
>>
>> Regards,
>> Paul Gearon
>> _______________________________________________
>> Mulgara-general mailing list
>> Mulgara-general at mulgara.org
>> http://lists.mulgara.org/mailman/listinfo/mulgara-general
>>
>

