[Mulgara-general] Question about validations

Paul Gearon gearon at ieee.org
Tue Aug 5 23:15:08 UTC 2008


On Tue, Aug 5, 2008 at 5:56 PM, Bill OConnor <wtoconnor at gmail.com> wrote:
> Hello,
>
> It seems that mulgara doesn't like the following:
>
> <link rdf:resource="http://www.xyzzynews.com/xyzzy.1f.html##grc"/>
>
> java.net.URISyntaxException: Illegal character in fragment at index 39
>
>
> But as far as I can tell this seems to be a healthy URL which is accepted by
> the browser. I thought that URLs were a subset of URIs.
> Can someone clarify this?

The browser accepts it because browsers are forgiving about malformed
URLs. That does not make it a valid URI.

A URI is defined as:
URI  = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

The first # marks the start of the fragment, and everything after it
is the fragment. This means that the fragment above would be: "#grc"

However, the definition of a fragment (RFC 3986, section 3.5) is that
it may contain the following characters:
  - . _ ~ ! $ & ' ( ) * + , ; = : @ / ?
Or the characters:
  a-z A-Z 0-9
Or a sequence of:
  "%" 0-9a-fA-F 0-9a-fA-F

Note that this does NOT include the # character. So having the second
# character is illegal. In fact, it is not legal to have this
character anywhere in a URI, except to delimit a fragment. A literal #
anywhere else has to be percent-encoded as %23.
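
As a rough illustration of the character rules, here is a minimal
sketch that checks a fragment against a regular expression. It assumes
the RFC 3986 grammar, in which a fragment may also contain "/" and "?".
The class and method names (FragmentGrammar, isValidFragment) are made
up for this example and are not part of Mulgara or the JDK.

```java
import java.util.regex.Pattern;

final class FragmentGrammar {
    // fragment = *( pchar / "/" / "?" ), where pchar is an unreserved character,
    // a sub-delimiter, ":" or "@", or a percent-encoded octet ("%" HEXDIG HEXDIG).
    private static final Pattern FRAGMENT = Pattern.compile(
            "(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*");

    // Returns true if the given text (the part after the first '#') is a legal fragment.
    static boolean isValidFragment(String fragment) {
        return FRAGMENT.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidFragment("grc"));     // true
        System.out.println(isValidFragment("#grc"));    // false: '#' is not allowed
        System.out.println(isValidFragment("%23grc"));  // true: '#' is percent-encoded
    }
}
```

This only covers the fragment. It is not a full URI parser.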

You may have noticed that the exception was thrown by Java's own
java.net.URI class. This class is doing the correct thing: index 39
is the position of the second # in the string.
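
To see this outside of Mulgara, a small sketch like the following
should reproduce the error with java.net.URI directly. The class name
DoubleHashDemo and the helper errorIndex are hypothetical. The "fixed"
URL assumes that the second # really is meant to be part of the
fragment; if it is just a typo, the right fix is to drop it.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DoubleHashDemo {
    // Returns the error index reported by java.net.URI, or -1 if the string parses.
    static int errorIndex(String s) {
        try {
            new URI(s);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        String bad = "http://www.xyzzynews.com/xyzzy.1f.html##grc";
        String fixed = "http://www.xyzzynews.com/xyzzy.1f.html#%23grc";

        System.out.println(errorIndex(bad));    // 39, the second '#'
        System.out.println(errorIndex(fixed));  // -1, parses cleanly

        URI uri = new URI(fixed);
        System.out.println(uri.getRawFragment());  // %23grc
        System.out.println(uri.getFragment());     // #grc (decoded form)
    }
}
```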

Regards,
Paul Gearon


