[Mulgara-general] utf-8 in sparql results

Gregg Reynolds dev at mobileink.com
Tue Sep 1 01:45:02 UTC 2009


Hi again,

How can I ask Mulgara to return results encoded in utf-8?

I loaded triples containing URIs that use utf-8.  A "DESCRIBE" query then
works fine; for example:

prefix afar: <http://rdf.coma.org/data/afar#> DESCRIBE <afar:kà_a_>

Note the presence of "à".  The problem is that the result set doesn't seem
to jibe; the triples are correct, but the non-ascii characters don't seem to
be utf-8 encoded.  Running the query using cURL, I pipe the results to a
file, which I can open in emacs (non-ascii chars show up as spaces), but if
I try to open it with a dumber editor that expects utf-8 (e.g. OS X
Textedit), the editor complains "not utf-8".  If I open it in vim and then
convert to Hex, it says "88".
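
One way to take the editors out of the picture is to check the saved bytes directly. The following is only a sketch: the helper name first_invalid_utf8 and the sample bytes are made up for illustration, and the sample merely assumes that the "88" seen in vim is a stray 0x88 byte. It reports the offset and value of the first byte that cannot be decoded as utf-8.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical sample: "k", a lone 0x88 byte, then "(a)".
# 0x88 is a UTF-8 continuation byte, so it is invalid on its own.
bad_sample = b"k\x88(a)"
good_sample = "k\u00e0(a)".encode("utf-8")

print(first_invalid_utf8(bad_sample))
print(first_invalid_utf8(good_sample))
```

To run it against the real output, read the file saved from cURL in binary mode (open(path, "rb").read()) and pass the bytes to the helper.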

Where it appears in a string "kà(a)", I get (in emacs) "k\u00E0(a)".  But
0xE0 is the latin-1 byte (and 0x00E0 the UTF-16 code unit) for "à".  I need
the utf-8 encoding, i.e. the two bytes 0xC3 0xA0 (
http://www.fileformat.info/info/unicode/char/00e0/index.htm)
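
To make the distinction concrete, here is a small sketch comparing how the single code point U+00E0 is serialized in each encoding. It also shows that, if the literal six characters \u00E0 appear in JSON output, they are an escape for the code point rather than raw bytes; whether that is what Mulgara actually emits is an assumption here, not something confirmed.

```python
import json

ch = "\u00e0"  # "à", Unicode code point U+00E0

latin1 = ch.encode("latin-1")    # one byte
utf8 = ch.encode("utf-8")        # two bytes
utf16 = ch.encode("utf-16-be")   # one 16-bit code unit, big-endian, no BOM

print("latin-1:", latin1.hex())
print("utf-8:  ", utf8.hex())
print("utf-16: ", utf16.hex())

# A JSON string containing the ASCII escape \u00E0 decodes to the same
# character, regardless of the byte encoding used on the wire.
decoded = json.loads('"k\\u00E0(a)"')
print(decoded == "k\u00e0(a)")
```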

I get the same result whether I ask for json or xml output.  The SPARQL
Protocol definition says "the whttp:outputSerialization is
application/sparql-results+xml with UTF-8 encoding, application/rdf+xml with
UTF-8 encoding".

Also, the XML output does not have an encoding attribute.
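
On the missing encoding attribute: under XML 1.0, a document with no encoding declaration and no byte-order mark is to be read as utf-8, so the missing attribute is not by itself an error. The sketch below uses Python's standard ElementTree parser to illustrate the consequence; the tiny <r> documents are invented for the example and are not actual Mulgara output.

```python
import xml.etree.ElementTree as ET

# No XML declaration at all: the parser assumes UTF-8.
utf8_doc = b"<r>k\xc3\xa0(a)</r>"
print(ET.fromstring(utf8_doc).text == "k\u00e0(a)")

# The same text with a bare latin-1 byte (0xE0) is not valid UTF-8,
# so parsing without a declaration is expected to fail.
latin1_doc = b"<r>k\xe0(a)</r>"
try:
    ET.fromstring(latin1_doc)
    latin1_parsed = True
except ET.ParseError:
    latin1_parsed = False
print(latin1_parsed)

# With an explicit declaration, the latin-1 bytes parse correctly.
declared_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><r>k\xe0(a)</r>'
print(ET.fromstring(declared_doc).text == "k\u00e0(a)")
```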

I rather suspect my terminal is at fault, but I get the same result in a
standard "Terminal" and also in iTerm, which is set to use utf-8.

Can anyone confirm that utf-8 is properly handled in URIs by Mulgara?  Any
idea what I'm doing wrong?

Thanks,

gregg