[Mulgara-general] Question about validations
Paul Gearon
gearon at ieee.org
Tue Aug 5 23:15:08 UTC 2008
On Tue, Aug 5, 2008 at 5:56 PM, Bill OConnor <wtoconnor at gmail.com> wrote:
> Hello,
>
> It seems that mulgara doesn't like the following:
>
> <link rdf:resource="http://www.xyzzynews.com/xyzzy.1f.html##grc"/>
>
> java.net.URISyntaxException: Illegal character in fragment at index 39
>
>
> But as far as I can tell this seems to be a healthy URL which is accepted by
> the browser. I thought that URL's were a subset of URI's.
> Can someone clarify this?
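This is because the browser is forgiving. URLs are indeed a subset of
URIs, but this string is not a valid URL either; the browser just
tolerates it.
A URI is defined (in RFC 3986) as: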
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
The presence of a # indicates that you have a fragment. This means
that the fragment above would be: "#grc"
However, the definition of a fragment is that it may contain the
following characters:
- . _ ~ ! $ & ' ( ) * + , ; = : @ / ?
Or the characters:
a-z A-Z 0-9
Or a sequence of:
"%" 0-9a-fA-F 0-9a-fA-F
Note that this does NOT include the # character. So having the second
# character is illegal. In fact, it is not legal to have this
character anywhere in a URI, except to delimit a fragment. Anywhere
else it has to be percent-encoded as %23.
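To make the character rules concrete, here is a minimal sketch that writes the RFC 3986 fragment production, fragment = *( pchar / "/" / "?" ), as a regular expression. The class and method names are made up for illustration and are not part of Mulgara or the JDK. Note that the RFC production also admits "/" and "?" in a fragment.

```java
import java.util.regex.Pattern;

final class FragmentGrammar {
    // Unreserved characters, sub-delims, ':' '@' '/' '?', or a %XX escape,
    // repeated zero or more times. '#' is deliberately absent.
    private static final Pattern FRAGMENT = Pattern.compile(
        "(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*");

    // Hypothetical helper: true if the text after the first '#' is a legal fragment.
    static boolean isLegalFragment(String fragment) {
        return FRAGMENT.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLegalFragment("grc"));    // true
        System.out.println(isLegalFragment("#grc"));   // false: raw '#' inside the fragment
        System.out.println(isLegalFragment("%23grc")); // true: '#' percent-encoded
    }
}
```

This only checks the fragment part; it is not a full URI validator.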
You may have noticed that the exception was given by the Java URI
library class. This class is doing the correct thing.
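The behaviour can be reproduced outside Mulgara with a few lines against java.net.URI. The sketch below is only an illustration: the class name and the escapeExtraHashes helper are hypothetical, and percent-encoding the second # is just one possible repair. Whether the resulting URI is the resource the data's author intended is something only they can say.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriFragmentDemo {
    // Index of the first character java.net.URI rejects, or -1 if the string parses.
    static int firstErrorIndex(String s) {
        try {
            new URI(s);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    // Hypothetical repair: keep the first '#' as the fragment delimiter
    // and percent-encode any later '#' as %23.
    static String escapeExtraHashes(String s) {
        int first = s.indexOf('#');
        if (first < 0) {
            return s;
        }
        return s.substring(0, first + 1) + s.substring(first + 1).replace("#", "%23");
    }

    public static void main(String[] args) throws URISyntaxException {
        String original = "http://www.xyzzynews.com/xyzzy.1f.html##grc";
        System.out.println(firstErrorIndex(original)); // 39, the second '#'

        URI repaired = new URI(escapeExtraHashes(original));
        System.out.println(repaired.getRawFragment()); // %23grc
        System.out.println(repaired.getFragment());    // #grc (decoded form)
    }
}
```

If you control the RDF being loaded, fixing the source data is probably cleaner than patching URIs on the way in.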
Regards,
Paul Gearon