[Mulgara-general] URI Normalization

Steve Bayliss stephen.bayliss at acuityunlimited.net
Fri May 14 08:34:17 UTC 2010


Thanks Paul.

I guess the safest way to proceed would be to ensure that URIs are
normalized where necessary before loading into Mulgara.
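
For what it's worth, a minimal sketch of such a pre-load step in Python
might look like the following. It is only an illustration, not anything
Mulgara provides: the helper name normalize_uri is hypothetical, and it
assumes the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~"),
which is narrower than the RFC 2396 set quoted below. Escapes of unreserved
characters are decoded, and any remaining escapes are left in place with
their hex digits uppercased.

```python
import re

# Unreserved characters per RFC 3986 section 2.3. This is an assumption:
# RFC 2396 additionally treats the "mark" characters ! * ' ( ) as unreserved.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape such as %7e or %2F.
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_uri(uri: str) -> str:
    """Hypothetical helper: unescape unreserved characters only."""

    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            # Safe to decode: the escaped and literal forms are equivalent.
            return ch
        # Reserved or non-ASCII octet: keep it escaped, uppercase the hex.
        return "%" + match.group(1).upper()

    return _PCT.sub(fix, uri)


print(normalize_uri("http://example.org/%7euser/a%2fb"))
# prints http://example.org/~user/a%2Fb
```

Running every URI through something like this before the load would make
the "%7e" and "~" forms end up as the same node, while reserved characters
such as an escaped "/" stay escaped.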

Regards
Steve

> -----Original Message-----
> From: mulgara-general-bounces at mulgara.org 
> [mailto:mulgara-general-bounces at mulgara.org] On Behalf Of Paul Gearon
> Sent: 13 May 2010 19:11
> To: Mulgara General
> Subject: Re: [Mulgara-general] URI Normalization
> 
> 
> On Thu, May 13, 2010 at 1:12 PM, Steve Bayliss
> <stephen.bayliss at acuityunlimited.net> wrote:
> > Does Mulgara do - or is it anticipated that it will do - any form of URI
> > normalization?
> 
> No, this isn't done at all. I recall hearing a discussion around it
> once but the result was that it won't be done. I think the idea was to
> try to provide results that look like queries, rather than modifying
> the queries to match the (normalized) data.
> 
> I don't particularly mind if it's done or not. There'd be a slight
> overhead as every time a URI appeared it would need to be normalized,
> but I'd expect that to be reasonably insignificant.
> 
> 
> > RFC2396 2.3 states
> >
> > "Unreserved characters can be escaped without changing the semantics
> >    of the URI, but this should not be done unless the URI is being used
> >    in a context that does not allow the unescaped character to appear."
> >
> > So in theory it would be possible to load triples into Mulgara that contain
> > escaped unreserved characters - currently there doesn't seem to be any form
> > of URI normalization taking place, so that the escaped and non-escaped forms
> > are considered by Mulgara to be semantically distinct - which the RFC
> > implies (though noting the words "can" and "should not") is incorrect.
> > Section 2.4.2 goes on to state
> >
> > "In some cases, data that could be represented by an unreserved
> >    character may appear escaped; for example, some of the unreserved
> >    "mark" characters are automatically escaped by some systems.  If the
> >    given URI scheme defines a canonicalization algorithm, then
> >    unreserved characters may be unescaped according to that algorithm.
> >    For example, "%7e" is sometimes used instead of "~" in an http URL
> >    path, but the two are equivalent for an http URL."
> >
> > No particular need to have this behaviour changed, but it would be useful to
> > know if it's likely that URI normalization could/should potentially be
> > implemented in the future to determine any likely impact of this.
> 
> Well at this point it isn't done, nor is it going to be. I'm more than
> happy to revisit this if enough people consider it worthwhile.
> 
> Regards,
> Paul Gearon