[Mulgara-general] utf-8 in sparql results

Paul Gearon gearon at ieee.org
Tue Sep 1 19:20:31 UTC 2009


On Mon, Aug 31, 2009 at 9:45 PM, Gregg Reynolds <dev at mobileink.com> wrote:
> Hi again,
>
> How can I ask Mulgara to return results encoded in utf-8?

I'm pretty sure Mulgara isn't doing that yet. Just recently I was
looking at making sure we parsed encodings, but I don't think we are
writing UTF-8 out (though adding this should not be too hard).

> I loaded triples containing URIs that use utf-8.  A "DESCRIBE" query then
> works fine; for example:
>
> prefix afar: <http://rdf.coma.org/data/afar#> DESCRIBE <afar:kà_a_>
>
> Note the presence of "à".  The problem is that the result set doesn't seem
> to jibe; the triples are correct, but the non-ascii characters don't seem to
> be utf-8 encoded.  Running the query using cURL, I pipe the results to a
> file, which I can open in emacs (non-ascii chars show up as spaces), but if
> I try to open it with a dumber editor that expects utf-8 (e.g. OS X
> Textedit), the editor complains "not utf-8".  If I open it in vim and then
> convert to Hex, it says "88".
>
> Where it appears in a string "kà(a)", I get (in emacs) "k\u00E0(a)".  But
> 0xE0 is the Latin-1 byte for "à" (and U+00E0 is its code point).  I need
> the UTF-8 bytes 0xC3 0xA0
> (http://www.fileformat.info/info/unicode/char/00e0/index.htm)
>
> I get the same result whether I ask for json or xml output.  The SPARQL
> Protocol definition says "the whttp:outputSerialization is
> application/sparql-results+xml with UTF-8 encoding, application/rdf+xml with
> UTF-8 encoding".

OK, I just looked at this. I'll have to go through it all and see what
I'm doing. The \uxxxx form is borrowed from another standard... I
think it's N3.
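
To make the distinction concrete, here is a minimal Python sketch (not
Mulgara code) showing how the code point U+00E0 relates to its Latin-1 and
UTF-8 byte forms, and how a \uXXXX escape like the one in the output can be
turned back into real UTF-8. The variable names are only illustrative.

```python
# "à" is the single code point U+00E0.
ch = "\u00e0"

# The same code point has different byte forms depending on the encoding.
latin1 = ch.encode("latin-1")  # one byte: 0xE0
utf8 = ch.encode("utf-8")      # two bytes: 0xC3 0xA0

# The output contained a literal backslash escape rather than the character.
escaped = r"k\u00E0(a)"

# Interpret the \uXXXX escape (the input here is pure ASCII, which is what
# the unicode_escape codec handles safely), then re-encode as UTF-8.
decoded = escaped.encode("ascii").decode("unicode_escape")
as_utf8 = decoded.encode("utf-8")

print(latin1.hex(), utf8.hex(), decoded, as_utf8.hex())
```

So the "E0" in the escape is the code point number, not a UTF-8 byte; a
UTF-8 consumer expects the two bytes C3 A0 on the wire.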

> Also, the XML output does not have an encoding attribute.
>
> I rather suspect my terminal is at fault, but I get the same result in a
> standard "Terminal" and also in iTerm, which is set to use utf-8.
>
> Can anyone confirm that utf-8 is properly handled in URIs by Mulgara?  Any
> idea what I'm doing wrong?

I can't say that I'll be changing the result type in all of the output
formats, but I'll definitely be fixing the output encoding for SPARQL.
Thanks for pointing it out.
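
As a rough illustration of what the fixed output should look like (a Python
sketch, not the actual Mulgara change), the following serializes a literal
as XML with an explicit UTF-8 declaration and checks that a byte stream is
valid UTF-8. The element names and the helper functions serialize_utf8 and
is_valid_utf8 are hypothetical and do not follow the full SPARQL results
format.

```python
import xml.etree.ElementTree as ET


def serialize_utf8(value: str) -> bytes:
    """Serialize a toy result document as UTF-8 bytes with an XML declaration."""
    root = ET.Element("sparql")               # illustrative element names only
    literal = ET.SubElement(root, "literal")
    literal.text = value
    # encoding="utf-8" makes tostring() return bytes; xml_declaration=True
    # (Python 3.8+) adds the <?xml ... encoding=...?> header.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


payload = serialize_utf8("k\u00e0(a)")
print(payload)
print(is_valid_utf8(payload))      # the serialized document
print(is_valid_utf8(b"k\xe0(a)"))  # a raw Latin-1 byte, as an editor might see it
```

The same is_valid_utf8 check can be run on a file saved from cURL to see
whether an editor's "not utf-8" complaint is justified.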

Paul


