[Mulgara-general] firefox bug and server config

Paul Gearon gearon at ieee.org
Wed Sep 9 13:18:38 UTC 2009


On Wed, Sep 9, 2009 at 7:49 AM, Gregg Reynolds <dev at mobileink.com> wrote:
>
> On Wed, Sep 9, 2009 at 12:40 AM, Paul Gearon <gearon at ieee.org> wrote:
>>
>> On Tue, Sep 8, 2009 at 10:58 PM, Gregg Reynolds <dev at mobileink.com> wrote:
>> > Howdy Paul,
>> > Thanks for the detailed response.  I'm pretty booked through next week
>> > but will then look at config and unicode source.  Meantime I think I'll
>> > add some stuff to the "Architectural Proposals" section of the wiki -
>> > Unicode support is pretty important and by no means trivial.  Aside from
>> > the three basic encodings (transformation syntaxes, encoding forms,
>> > whatEVER they call them, I mean utf-x), you've got at least two other
>> > major issues if you ever want to attract international support, namely
>> > date stuff and collations.  Pretty essential for SPARQL filters.
>>
>> My other question is about collations. You're talking about ordering,
>> right? We could easily make these pluggable, but I've seen nothing in
>> any of the standards to suggest this is needed, or even an option.
>>
>> So what exactly is your issue here?
>>
> SPARQL's "ORDER BY" clause, using "<"; for semantics the SPARQL definition
> refers to XQuery/XPath function "fn:compare", which supports multiple
> collations.

This function supports multiple collations, true, but if you look
again you'll notice that SPARQL does not support selecting a
collation at all.

I *did* notice that SPARQL leaves comparison between language tagged
literals as "undefined", meaning that it *is* possible to use
collations here. So for instance, 'Strasse'@de and 'Straße'@de could
compare equal, while 'Strasse' and 'Straße' differ.
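
To make the 'Strasse'/'Straße' case concrete, here is a minimal sketch
using java.text.Collator from the JDK. The class and method names are
made up for illustration and are not Mulgara code. The assumptions are
that a plain literal falls back to String.compareTo (UTF-16 code unit
order) and that a language-tagged literal uses the JDK collator for
that language at PRIMARY strength, which also ignores case and accents.

```java
import java.text.Collator;
import java.util.Locale;

public class TaggedLiteralCompare {

    /**
     * Compares two lexical forms. A null lang means a plain literal.
     * Hypothetical helper: nothing in SPARQL mandates this behaviour.
     */
    static int compare(String a, String b, String lang) {
        if (lang == null) {
            // Plain literals: binary comparison of UTF-16 code units.
            return Integer.signum(a.compareTo(b));
        }
        // Tagged literals: locale-sensitive comparison from the JDK.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(lang));
        // PRIMARY strength compares base letters only (assumption: this
        // is the equivalence wanted; it also folds case and accents).
        collator.setStrength(Collator.PRIMARY);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        String sharpS = "Stra\u00DFe"; // "Straße", escaped to avoid source-encoding issues
        // Expected 0 with the JDK's default German collation rules.
        System.out.println(compare("Strasse", sharpS, "de"));
        // Expected non-zero: 's' (U+0073) sorts before 'ß' (U+00DF).
        System.out.println(compare("Strasse", sharpS, null));
    }
}
```

Since the spec leaves this comparison undefined, the choice of strength
here is a design decision, not something required by SPARQL.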

> It's a major hairball, because you've got to deal with, among
> other things, Unicode normal forms.  Then there's the general weirdness,
> like "ch" is one "letter" in Spanish, the Germans use more than one sort
> order, various languages have more than one "alphabetic" order (e.g.
> Japanese Iroha, Arabic abjad), etc.  The ICU documentation has some good
> examples.
> It's not a big issue for my project (I don't believe any of the standards
> supports traditional Arabic sorting, which is based on root), but your
> market expands considerably if you support e.g. East Asian calendars and
> collations.
> I don't mean to imply that Mulgara is broken when I suggest looking at ICU,
> only that Mulgara is an RDF database project, not an i18n project, so I
> would not expect it to have implemented the whole range of stuff supported
> by a dedicated i18n project like ICU.  It would be a minor miracle if it
> did.  Implementing high-quality i18n functionality is a major investment,
> and it's already available as open source.  (There may be other, good
> reasons not to migrate to ICU, of course, but I think it's worth a look).


You had me worried for a moment that we'd missed something else in the
spec, but I'm pleased to see that we haven't.  :-)

It's certainly interesting, and I can see it would be a worthwhile
extension beyond the standard for some applications. But as you point
out, it would involve a substantial investment, which is beyond the
scope of anyone working on Mulgara at the moment. It wouldn't be so
hard, if only I could find a library that would implement fn:compare.
Unfortunately, I have yet to find ANY library that does the
XQuery/XPath functions.

Incidentally, this is a general request for anyone reading this:
if you know of anything that can do the range of XQuery/XPath
functions then I'd love to hear from you. Specifically, I'm after
something that provides a javax.xml.xpath.XPathFunctionResolver. At
this point I'm even prepared to consider creating a plugin framework
for commercial libraries. What I *can* tell you is that Saxon and
Apache don't provide them. EXSLT has a lot of functions, but only
extensions to XPath, and not the actual XPath functions (I'm
considering adding the EXSLT functions to Mulgara anyway).
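
For anyone wondering what such a resolver would look like, here is a
rough sketch that handles only fn:compare, built on the standard
javax.xml.xpath interfaces and java.text.Collator. It is an
illustration under assumptions, not an existing library and not Mulgara
code. The "urn:example:collation:" URI scheme is invented for this
example, as collation URIs other than the codepoint one are
implementation-defined. PRIMARY strength is an arbitrary choice.
Empty-sequence arguments are not handled.

```java
import java.text.Collator;
import java.util.Locale;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;
import javax.xml.xpath.XPathFunctionResolver;

public class CompareFunctionResolver implements XPathFunctionResolver {

    static final String FN_NS = "http://www.w3.org/2005/xpath-functions";
    static final String CODEPOINT = FN_NS + "/collation/codepoint";
    // Hypothetical scheme: "urn:example:collation:de" selects German.
    static final String LOCALE_PREFIX = "urn:example:collation:";

    @Override
    public XPathFunction resolveFunction(QName name, int arity) {
        boolean isCompare = FN_NS.equals(name.getNamespaceURI())
                && "compare".equals(name.getLocalPart());
        if (!isCompare || (arity != 2 && arity != 3)) {
            return null; // not a function this resolver knows about
        }
        return args -> {
            String collation = args.size() == 3
                    ? String.valueOf(args.get(2)) : CODEPOINT;
            try {
                int result = compare(String.valueOf(args.get(0)),
                        String.valueOf(args.get(1)), collation);
                return Double.valueOf(result); // XPath 1.0 numbers are doubles
            } catch (IllegalArgumentException e) {
                throw new XPathFunctionException(e);
            }
        };
    }

    /** Returns -1, 0 or 1 under the collation named by collationUri. */
    static int compare(String a, String b, String collationUri) {
        if (CODEPOINT.equals(collationUri)) {
            return compareCodePoints(a, b);
        }
        if (collationUri.startsWith(LOCALE_PREFIX)) {
            String tag = collationUri.substring(LOCALE_PREFIX.length());
            Collator collator = Collator.getInstance(Locale.forLanguageTag(tag));
            collator.setStrength(Collator.PRIMARY); // assumption, see above
            return Integer.signum(collator.compare(a, b));
        }
        throw new IllegalArgumentException("Unsupported collation: " + collationUri);
    }

    /** Orders by Unicode code point rather than by UTF-16 code unit. */
    static int compareCodePoints(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return Integer.compare(x.length, y.length);
    }
}
```

A resolver like this would be installed with
XPath.setXPathFunctionResolver, together with a NamespaceContext that
binds a prefix to the function namespace. Covering the rest of the
function library is the real work.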

Regards,
Paul Gearon


