[Mulgara-general] URI Normalization
Steve Bayliss
stephen.bayliss at acuityunlimited.net
Fri May 14 08:34:17 UTC 2010
Thanks Paul.
I guess the safest way to proceed would be to ensure that URIs are
normalized where necessary before loading into Mulgara.
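
As a rough illustration of that pre-load step, the sketch below decodes a percent-escape only when it encodes an unreserved character, so "%7e" becomes "~" while an escaped reserved character such as "%2F" is left alone. The function name unescape_unreserved and the example.org URIs are hypothetical, and nothing here is part of Mulgara. It assumes the unreserved set from RFC 3986 section 2.3 (letters, digits, "-", ".", "_", "~"), which is narrower than the RFC 2396 set quoted in this thread.

```python
import re

# Unreserved characters per RFC 3986 section 2.3. This is an assumption:
# RFC 2396 additionally treats ! * ' ( ) as unreserved "mark" characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape: "%" followed by exactly two hex digits.
_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri: str) -> str:
    """Decode percent-escapes that encode unreserved characters.

    Escapes of any other character (reserved, non-ASCII bytes, "%" itself)
    are returned unchanged, so the meaning of the URI is preserved.
    """
    def _replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_ESCAPE.sub(_replace, uri)


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(unescape_unreserved("http://example.org/%7esteve/doc"))
    print(unescape_unreserved("http://example.org/a%2Fb"))
```

Running every subject, predicate and object URI through a function like this before loading would make the escaped and unescaped forms resolve to the same node, without Mulgara itself having to change.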
Regards
Steve
> -----Original Message-----
> From: mulgara-general-bounces at mulgara.org
> [mailto:mulgara-general-bounces at mulgara.org] On Behalf Of Paul Gearon
> Sent: 13 May 2010 19:11
> To: Mulgara General
> Subject: Re: [Mulgara-general] URI Normalization
>
>
> On Thu, May 13, 2010 at 1:12 PM, Steve Bayliss
> <stephen.bayliss at acuityunlimited.net> wrote:
> > Does Mulgara do - or is it anticipated that it will do - any form of URI
> > normalization?
>
> No, this isn't done at all. I recall hearing a discussion around it
> once but the result was that it won't be done. I think the idea was to
> try to provide results that look like queries, rather than modifying
> the queries to match the (normalized) data.
>
> I don't particularly mind if it's done or not. There'd be a slight
> overhead as every time a URI appeared it would need to be normalized,
> but I'd expect that to be reasonably insignificant.
>
>
> > RFC2396 2.3 states
> >
> > "Unreserved characters can be escaped without changing the semantics
> > of the URI, but this should not be done unless the URI is being used
> > in a context that does not allow the unescaped character to appear."
> >
> > So in theory it would be possible to load triples into Mulgara that contain
> > escaped unreserved characters - currently there doesn't seem to be any form
> > of URI normalization taking place, so that the escaped and non-escaped forms
> > are considered by Mulgara to be semantically distinct - which the RFC
> > implies (though noting the words "can" and "should not") is incorrect.
> >
> > Section 2.4.2 goes on to state
> >
> > "In some cases, data that could be represented by an unreserved
> > character may appear escaped; for example, some of the unreserved
> > "mark" characters are automatically escaped by some systems. If the
> > given URI scheme defines a canonicalization algorithm, then
> > unreserved characters may be unescaped according to that algorithm.
> > For example, "%7e" is sometimes used instead of "~" in an http URL
> > path, but the two are equivalent for an http URL."
> >
> > No particular need to have this behaviour changed, but it would be useful to
> > know if it's likely that URI normalization could/should potentially be
> > implemented in the future to determine any likely impact of this.
>
> Well at this point it isn't done, nor is it going to be. I'm more than
> happy to revisit this if enough people consider it worthwhile.
>
> Regards,
> Paul Gearon