[Mulgara-general] IRI parsing flakiness

Paul Gearon gearon at ieee.org
Sun Sep 20 00:53:26 UTC 2009


Hi Gregg,

It took me a while to track it all down, but I finally got all the
references I need.  :-)

The problem is with your N3. It turns out that the names you've been
trying are not valid QNames. If you go back to writing them as full
URIs, they will work just fine. For example, the following N3 loads
without error:

@prefix test:  <http://example.org/test#> .
test:Test-comma  test:uri  <http://example.org/test#U,062F>  .

The problem you ran into is that QNames do not admit as many
characters as URIs do. A QName is defined as an optional prefix
followed by a colon, and then a "local part":
http://www.w3.org/TR/REC-xml-names/#NT-LocalPart

The "local" part of a QName accepts the same characters as an XML
name, minus the ":" character.
http://www.w3.org/TR/REC-xml-names/#NT-NCName

And XML names are defined as:

[4]   NameStartChar  ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6]
                        | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                        | [#x37F-#x1FFF] | [#x200C-#x200D]
                        | [#x2070-#x218F] | [#x2C00-#x2FEF]
                        | [#x3001-#xD7FF] | [#xF900-#xFDCF]
                        | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]  NameChar       ::=  NameStartChar | "-" | "." | [0-9] | #xB7
                        | [#x0300-#x036F] | [#x203F-#x2040]
[5]   Name           ::=  NameStartChar (NameChar)*

http://www.w3.org/TR/REC-xml/#NT-NameStartChar


So you can have dots, dashes and underscores, but no + or commas. So
you'll have to keep your QNames simple, or else just use URIs (which
are delimited by angle brackets).
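
If you want to screen local parts before generating your N3 test files,
something like the following Python sketch may help. It only mirrors the
NameStartChar/NameChar productions quoted above, with ":" removed as for
an NCName. It is not what the Jena parser actually runs, so treat a
mismatch with Mulgara's behavior as something to investigate rather than
as a bug in either. The names NCNAME and is_valid_local_part are made up
for this example.

```python
import re

# NameStartChar from the grammar quoted above, minus ":" (NCName rule).
_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, #xB7 and two combining/connector ranges.
_REST = _START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Hypothetical helper pattern: one start character, then any name characters.
NCNAME = re.compile("[" + _START + "][" + _REST + "]*")


def is_valid_local_part(local):
    """Return True if `local` matches the quoted grammar as an NCName."""
    return NCNAME.fullmatch(local) is not None


if __name__ == "__main__":
    for candidate in ["Test-comma", "U,062F", "U+0623", "a.b_c-d"]:
        print(candidate, is_valid_local_part(candidate))
```

Anything this rejects is a candidate for writing as a full URI in angle
brackets instead of as a QName.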

I should also note that the reason for our "correct" behavior here is
that N3 parsing is handled by the parser from Jena. The bad QNames
were being detected in there, and the Mulgara code was simply
reporting the error.

Regards,
Paul Gearon



On Sat, Sep 19, 2009 at 1:06 PM, Gregg Reynolds <dev at mobileink.com> wrote:
> Hi,
>
> It looks like IRIs containing "reserved" and "mark" chars don't agree with
> Mulgara, although as I understand RFC 2396 (section 2.2, 2.3, 3.3) they
> should be allowed in various components (e.g. '+' in paths).  See ticket
> 205.
>
> Also wrote simple scripts to automate testing each such char.  It's a bit of
> a pain since some reserved chars are ok in some components but not in
> others.  Plus Mulgara's error messages are not exactly models of precision.
> ;)  So I created a little script to generate a bunch of N3 files, one goofy
> URI per, and a script to load 'em up and write results in separate files, so
> it's easy to see where the error is.  I've found, for example, that ','
> works in something like "U,0623", but not "U,062F" - chokes on the F.  '+'
> doesn't work in either "U+0623" or "U+062F".
>
> The scripts are in the test/iri subdirectory of Mulligan.  Should be easy to
> modify for testing purposes.  (BTW the Ajax stuff in Mulligan seems
> basically to be working, but you have to serve the webpage up through a
> local server, just running it from the file system won't work.)
>
> Hope that helps a bit,
>
> gregg
>
> _______________________________________________
> Mulgara-general mailing list
> Mulgara-general at mulgara.org
> http://mulgara.org/mailman/listinfo/mulgara-general
>
>


