[Mulgara-general] firefox bug and server config

Paul Gearon gearon at ieee.org
Wed Sep 9 05:35:33 UTC 2009


On Tue, Sep 8, 2009 at 10:47 PM, Gregg Reynolds <dev at mobileink.com> wrote:
>
> On Tue, Sep 8, 2009 at 9:07 PM, Paul Gearon <gearon at ieee.org> wrote:
>>
>> On Tue, Sep 8, 2009 at 9:59 PM, Paul Gearon <gearon at ieee.org> wrote:
>> > 2. The JSON spec does not describe any way to handle unicode
>> > characters that do not fit into 16 bits.
>>
>> I just went to RFC 4627 instead of json.org, and there IS a way to
>> encode characters outside of the Basic Multilingual Plane (that is,
>> any code point above U+FFFF).
>>
>> The example they cite encodes the G clef character (U+1D11E) as
>> "\uD834\uDD1E". This is known as using a "surrogate pair".
>>
> Surrogate pairs only apply to UTF-16; UTF-8 and UTF-32 can handle the whole
> enchilada with no special processing.

Ah, I was getting too caught up in the moment. I was thinking of
several different things at once, and conflating a few of them. Of
course, none of this was really UTF-8, which is what we were talking
about.

All the same, I still need to convert the data that is "UTF-8 with
each byte in a char" into actual UTF-8 (or at least into a string
that can be encoded as UTF-8). (I should ask why the Jena parsers
give us data in this strange format.) It will involve less bit
banging than I was originally thinking, but it's still something I
need to get done.
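
A minimal sketch of that conversion, assuming the incoming string really
does hold exactly one UTF-8 byte (0x00-0xFF) in each char. The class and
method names are hypothetical and not part of Mulgara or Jena. The idea
is to map each char back to a single byte via ISO-8859-1 and then decode
the resulting byte array as UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class ByteCharsToUtf8 {

    /**
     * Hypothetical helper: treats every char of the input as one raw byte
     * and decodes the byte sequence as UTF-8.
     * Assumption: every char is in the range 0x00-0xFF. Anything larger
     * cannot be mapped by ISO-8859-1 and would be replaced with '?'.
     */
    public static String decode(String byteChars) {
        // ISO-8859-1 maps chars 0x00-0xFF one-to-one onto bytes 0x00-0xFF.
        byte[] raw = byteChars.getBytes(StandardCharsets.ISO_8859_1);
        // Reinterpret those bytes as the UTF-8 sequence they represent.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The G clef (U+1D11E) is F0 9D 84 9E in UTF-8, here one byte per char.
        String mangled = "\u00F0\u009D\u0084\u009E";
        String fixed = decode(mangled);

        // Java strings are UTF-16, so the result is the surrogate pair
        // \uD834\uDD1E from the RFC 4627 example.
        System.out.println(fixed.length());                            // 2
        System.out.println(Integer.toHexString(fixed.codePointAt(0))); // 1d11e
        System.out.println(fixed.equals("\uD834\uDD1E"));              // true
    }
}
```

This leans on the charset classes rather than hand-written bit shifting.
Input that is not valid UTF-8 would come out with U+FFFD replacement
characters instead of raising an error, so a stricter version might use
a CharsetDecoder configured to report malformed input.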

Regards,
Paul


