[Mulgara-general] utf-8 in sparql results
Gregg Reynolds
dev at mobileink.com
Tue Sep 1 01:45:02 UTC 2009
Hi again,
How can I ask Mulgara to return results encoded in utf-8?
I loaded triples containing URIs that use utf-8. A "DESCRIBE" query then
works fine; for example:
prefix afar: <http://rdf.coma.org/data/afar#> DESCRIBE <afar:kà_a_>
Note the presence of "à". The problem is with the result set: the triples
are correct, but the non-ASCII characters do not appear to be UTF-8 encoded.
Running the query using cURL, I pipe the results to a file. I can open that
file in emacs (non-ASCII chars show up as spaces), but if I try to open it
with a dumber editor that expects UTF-8 (e.g. OS X TextEdit), the editor
complains "not utf-8". If I open it in vim and then convert to hex, the
character shows up as "88".
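
One way to check the saved file without relying on any editor is to try
decoding its bytes as UTF-8 and report the first byte that fails. This is
only a sketch: the function name first_invalid_utf8 and the sample bytes are
made up for illustration and are not taken from actual Mulgara output.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte the decoder could not accept.
        return err.start, data[err.start]
    return None


# Illustrative samples only: a lone 0x88 byte versus properly encoded UTF-8.
suspect = b"k\x88(a)"
clean = "k\u00e0(a)".encode("utf-8")

print(first_invalid_utf8(suspect))
print(first_invalid_utf8(clean))
```

For a real check, the bytes would come from the saved file opened in binary
mode, e.g. open(path, "rb").read(), where path is wherever the cURL output
was written.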
Where it appears in a string "kà(a)", I get (in emacs) "k\u00E0(a)". But
E0 is the Latin-1 byte for "à" (and 00E0 is its UTF-16 code unit). I need
the UTF-8 encoding, which is the two bytes C3 A0; see
http://www.fileformat.info/info/unicode/char/00e0/index.htm
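
For reference, the byte forms of "à" (U+00E0) in a few encodings can be
listed with a short Python sketch. The choice of encodings is an assumption
about what might be in play; Mac OS Roman is included only because it
happens to encode "à" as the single byte 88. The last lines show that, if
the text really contains the six characters \u00E0 (as in a JSON string
escape), a JSON parser turns them back into the same character regardless of
the byte encoding of the file.

```python
import json

ch = "\u00e0"  # LATIN SMALL LETTER A WITH GRAVE

# Encode the same character under several candidate encodings.
encodings = {name: ch.encode(name)
             for name in ("latin-1", "utf-8", "utf-16-be", "mac_roman")}

for name, raw in encodings.items():
    print(name, raw.hex())

# A JSON \uXXXX escape names the code point; it is not a byte-level encoding.
escaped = json.loads('"k\\u00E0(a)"')
print(escaped == "k" + ch + "(a)")
```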
I get the same result whether I ask for JSON or XML output. The SPARQL
Protocol definition says "the whttp:outputSerialization is
application/sparql-results+xml with UTF-8 encoding, application/rdf+xml with
UTF-8 encoding".
Also, the XML output does not have an encoding attribute.
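
On the missing encoding attribute: under XML 1.0, a document with no
encoding declaration and no byte order mark is read as UTF-8, so the absence
is not an error in itself. A small sketch with Python's standard
xml.etree.ElementTree illustrates this; the <literal> element is a made-up
stand-in, not actual Mulgara output.

```python
import xml.etree.ElementTree as ET

# No XML declaration at all, bytes are UTF-8: the parser accepts it.
utf8_doc = "<literal>k\u00e0(a)</literal>".encode("utf-8")
parsed = ET.fromstring(utf8_doc).text

# Same markup, but with the Latin-1 byte E0 and still no declaration.
latin1_doc = "<literal>k\u00e0(a)</literal>".encode("latin-1")
try:
    ET.fromstring(latin1_doc)
    latin1_ok = True
except ET.ParseError:
    # The bytes are not valid UTF-8, so the parser rejects the document.
    latin1_ok = False

print(repr(parsed), latin1_ok)
```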
I rather suspect my terminal is at fault, but I get the same result in a
standard "Terminal" and also in iTerm, which is set to use utf-8.
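
To take the terminal out of the picture entirely, the request can be built
in a script and the raw response bytes written straight to a file. A sketch
with Python's urllib follows. The endpoint URL and the Accept value are
assumptions to be adjusted for the local install, and the helper names
build_request and save_raw are hypothetical; the query string is the one
quoted earlier in this message.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8080/sparql/"  # assumed endpoint, not from the post

QUERY = ("prefix afar: <http://rdf.coma.org/data/afar#> "
         "DESCRIBE <afar:k\u00e0_a_>")


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request; urlencode percent-encodes the query text as UTF-8."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/rdf+xml"})


def save_raw(req: Request, path: str) -> None:
    """Send the request and write the response bytes untouched (needs a live server)."""
    with urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())


req = build_request(ENDPOINT, QUERY)
print(req.full_url)
```

Calling save_raw(req, path) against a running server would leave a file
whose bytes can be inspected directly, with no terminal or editor encoding
in between.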
Can anyone confirm that utf-8 is properly handled in URIs by Mulgara? Any
idea what I'm doing wrong?
Thanks,
gregg