[Mulgara-general] firefox bug and server config
Gregg Reynolds
dev at mobileink.com
Wed Sep 9 02:47:48 UTC 2009
On Tue, Sep 8, 2009 at 9:07 PM, Paul Gearon <gearon at ieee.org> wrote:
> On Tue, Sep 8, 2009 at 9:59 PM, Paul Gearon <gearon at ieee.org> wrote:
> > 2. The JSON spec does not describe any way to handle unicode
> > characters that do not fit into 16 bits.
>
> I just went to RFC 4627 instead of json.org, and there IS a way to
> encode characters outside of the Basic Multilingual Plane (the BMP is the
> technical term for U+0000 through U+FFFF; anything larger lies outside it).
>
> The example they cite encodes the G clef character (U+1D11E) as
> "\uD834\uDD1E". This is known as using a "surrogate pair".
>
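For reference, the arithmetic behind the RFC 4627 example can be sketched in Java (the language Mulgara is written in). The class and method names are illustrative only and not part of Mulgara. The sketch subtracts 0x10000 from the code point, puts the top 10 bits of the remainder into a high surrogate (0xD800 base) and the bottom 10 bits into a low surrogate (0xDC00 base), then cross-checks the result against the JDK's `Character.toChars`.

```java
// Sketch only: split a supplementary code point (above U+FFFF) into a UTF-16
// surrogate pair and render it as a pair of JSON backslash-u escapes.
public class SurrogatePairDemo {

    /** Returns the JSON escape text for one code point. */
    static String jsonEscape(int codePoint) {
        if (codePoint < 0x10000) {
            // BMP character: one UTF-16 code unit, one escape
            return String.format("\\u%04X", codePoint);
        }
        int v = codePoint - 0x10000;      // 20-bit offset into the supplementary planes
        int high = 0xD800 + (v >>> 10);   // top 10 bits -> high (lead) surrogate
        int low = 0xDC00 + (v & 0x3FF);   // bottom 10 bits -> low (trail) surrogate
        return String.format("\\u%04X\\u%04X", high, low);
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E;  // MUSICAL SYMBOL G CLEF, the RFC 4627 example
        System.out.println(jsonEscape(gClef));  // same two escapes as in RFC 4627

        // Cross-check against the JDK's own UTF-16 conversion.
        char[] units = Character.toChars(gClef);
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);  // prints D834 DD1E
    }
}
```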
Surrogate pairs only apply to UTF-16; UTF-8 and UTF-32 can handle the whole
enchilada with no special processing. See the hairball at
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#2630, sections 3.8,
3.9, and 3.10, for the gory details. My strong recommendation is to start
just by making sure SPARQL JSON query results are properly UTF-8 encoded
(UTF-8 is JSON's default "encoding form", in Unicode-speak). In the long run,
the smart thing to do is use ICU4J <http://site.icu-project.org/>.
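A minimal sketch of the "make sure it's UTF-8" advice, assuming results are serialized through a `java.io.Writer`. The `writeJsonString` and `toUtf8Json` helpers are hypothetical and are not Mulgara's actual serializer. The idea is to wrap the output stream in an `OutputStreamWriter` with an explicit UTF-8 charset, escape only what RFC 4627 requires (quotation mark, backslash, control characters), and let everything else, including characters above U+FFFF, pass through literally. JSON's backslash-u escapes are defined in terms of UTF-16 code units, so a non-BMP character that is escaped still needs the surrogate-pair form. Written literally in UTF-8, the G clef is simply four bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Sketch only: a hypothetical JSON string writer, not Mulgara's serializer.
public class Utf8JsonDemo {

    /** Writes s as a JSON string literal, escaping only what RFC 4627 requires. */
    static void writeJsonString(Writer out, String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);     // a whole code point, even above U+FFFF
            i += Character.charCount(cp);  // advance 1 or 2 UTF-16 code units
            switch (cp) {
                case '"':  out.write("\\\""); break;
                case '\\': out.write("\\\\"); break;
                case '\n': out.write("\\n");  break;
                case '\r': out.write("\\r");  break;
                case '\t': out.write("\\t");  break;
                default:
                    if (cp < 0x20) {
                        // Remaining control characters must be escaped.
                        out.write(String.format("\\u%04X", cp));
                    } else {
                        // Everything else, including non-BMP characters, is written
                        // literally; the Writer's charset decides the bytes.
                        out.write(Character.toChars(cp));
                    }
            }
        }
        out.write('"');
    }

    /** Serializes s as a JSON string and returns its UTF-8 bytes. */
    static byte[] toUtf8Json(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Name the charset explicitly instead of relying on the platform default.
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writeJsonString(w, s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String gClef = new String(Character.toChars(0x1D11E));
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8Json(gClef)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());  // prints 22 F0 9D 84 9E 22
    }
}
```

Escaping everything above U+007F instead, as some serializers do to get ASCII-only output, is also legal JSON. In that case characters above U+FFFF must be written as surrogate-pair escapes, as in the G clef example.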
-gregg