[Mulgara-dev] Questions about the string-pool.

Paul Gearon gearon at ieee.org
Tue Dec 12 18:15:04 UTC 2006


On 12/12/06, Andrae Muys <andrae at netymon.com> wrote:
>
> >> My questions:
> >>
> >> Does it make any sense to perform interval/value operations for
> >> anything except Typed-Literals?
> >> Does it make any sense to perform heterogeneous interval/value
> >> operations?
> >> Does it make any sense to perform heterogeneous interval/type
> >> operations?
> >>
> >> As far as I can tell the answers are
> >>
> >> No.
> >> No.
> >> Not between RDF-Node types (but it does make sense to request all
> >> Literals - untyped, typed, and language-coded).
> >>
> >> Comments?


Yes
No
Yes - I agree with your qualifier.

If possible, I'd also like to see selection of groups of types within
Literals, i.e. selecting nonNegativeInteger would also include unsignedLong,
unsignedInt, unsignedShort, unsignedByte, and positiveInteger.
http://www.w3.org/TR/xmlschema-2/#built-in-datatypes

(though it's probably just as easy to do a lazily-evaluated union for all
these types).
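
A minimal Java sketch of what a lazily-evaluated union over a type group
might look like. The class name LazyTypeUnion, the map-of-lists stand-in
for per-type string-pool results, and the hard-coded group list are all
assumptions made for illustration; none of them are Mulgara APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: concatenates per-type results lazily, touching each
// type's results only once the previous type has been exhausted.
final class LazyTypeUnion implements Iterator<String> {
    // xsd:nonNegativeInteger plus the built-in types derived from it.
    static final List<String> NON_NEGATIVE_INTEGER_GROUP = Arrays.asList(
        "nonNegativeInteger", "unsignedLong", "unsignedInt",
        "unsignedShort", "unsignedByte", "positiveInteger");

    // Stand-in for the string pool: datatype name -> literals of that type.
    private final Map<String, List<String>> literalsByType;
    private final Iterator<String> types;
    private Iterator<String> current = Collections.emptyIterator();

    LazyTypeUnion(Map<String, List<String>> literalsByType, List<String> typeGroup) {
        this.literalsByType = literalsByType;
        this.types = typeGroup.iterator();
    }

    @Override
    public boolean hasNext() {
        // Move on to the next type only when the current one has run dry.
        while (!current.hasNext() && types.hasNext()) {
            List<String> next =
                literalsByType.getOrDefault(types.next(), Collections.emptyList());
            current = next.iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Results come back grouped by type in the order the group lists them, so
this gives a union but not a single value-ordered interval.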

> > We have a resolver to be able to do case-insensitive literal
> > comparisons, and that uses the current findGNodes() for that. This
> > works on both typed and untyped literals. So unless I'm
> > misunderstanding something, I would say yes to the first two (unless
> > there's a better way).
> >
> > Also, the PrefixResolver makes use of interval-by-value on untyped
> > literals.
>
> I just finished a one-hour conversation with Simon Raboczi, and I can
> definitely confirm that the answer to the first question is Yes.


Yes, I was going to mention the PrefixResolver.  This is essential for
finding URIs in the same domain.
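
To illustrate why a prefix lookup is really an interval-by-value
operation, here is a small Java sketch over a sorted set. The class and
method names are hypothetical, and a TreeSet merely stands in for an
ordered string pool; this is not how the PrefixResolver is implemented.

```java
import java.util.NavigableSet;
import java.util.SortedSet;

// Hypothetical sketch: a prefix query expressed as a half-open interval
// over a sorted collection of strings.
final class PrefixInterval {
    // Returns a view of every value in the sorted pool that starts with prefix.
    static SortedSet<String> withPrefix(NavigableSet<String> sortedPool, String prefix) {
        // Upper bound: the prefix followed by the largest char value.
        // Assumes stored values never contain U+FFFF right after the prefix.
        String upper = prefix + Character.MAX_VALUE;
        return sortedPool.subSet(prefix, true, upper, false);
    }
}
```

With URIs stored in sorted order, asking for everything under
"http://example.org/" becomes a single range scan.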

Ideally, the string pool will be able to optimize for space (and time) on
the URIs in shared domains.  RDF makes extensive use of this sort of thing,
and Mulgara does not take advantage of it (instead the entire URI is
stored).
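
One way to picture the space saving is to store each shared prefix once
and keep only a (prefix id, local name) pair per URI. The Java sketch
below is an in-memory illustration only; the class name, the Entry type,
and the rule of splitting after the last '#' or '/' are all assumptions,
not a description of any existing or planned Mulgara design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: interns the namespace part of each URI so that URIs
// in a shared domain store the common prefix only once.
final class NamespaceSplitPool {
    static final class Entry {
        final int prefixId;
        final String localName;

        Entry(int prefixId, String localName) {
            this.prefixId = prefixId;
            this.localName = localName;
        }
    }

    private final Map<String, Integer> prefixIds = new HashMap<>();
    private final List<String> prefixes = new ArrayList<>();

    // Splits just after the last '#' or '/' (an assumed convention).
    static int splitPoint(String uri) {
        return Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/')) + 1;
    }

    // Stores the URI as (prefix id, local name), interning the prefix on first sight.
    Entry put(String uri) {
        int cut = splitPoint(uri);
        String prefix = uri.substring(0, cut);
        Integer id = prefixIds.get(prefix);
        if (id == null) {
            id = prefixes.size();
            prefixes.add(prefix);
            prefixIds.put(prefix, id);
        }
        return new Entry(id, uri.substring(cut));
    }

    // Rebuilds the full URI from its stored parts.
    String get(Entry e) {
        return prefixes.get(e.prefixId) + e.localName;
    }

    int prefixCount() {
        return prefixes.size();
    }
}
```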

> I am confident that the third question is that nodes need to be
> selectable by
> 1. RDF-Node type (BN,URI,Literal)


BNs have only ever been findable by their absence from the StringPool.  This
actually matches RDF semantics pretty well.  :-)

> 2. Typed-Literal type (for typed-literals only)


Yes.  It would be nice to have related types in order so that they can be
selected by interval (as I describe above), but unions may be the better
solution here.

I've also discovered that it would be REALLY nice to select xsd:string and
untyped literals together.  This is because their domains are identical, and
people (and their code) tend to use them interchangeably.  You can argue all
you like that this shouldn't happen, but the fact is that it does.
Inconvenient really.  :-)

> Do they need to be selectable by language?  - I don't believe so but
> would appreciate input.


I don't speak any other languages, so I can't comment on use cases here.
However, for people who do speak another language, I can only presume that
this would be an important feature.

Language should be a subtype within xsd:string (and "untyped"), shouldn't
it?  It doesn't make sense to apply it to numbers, dates, etc.


> The second question comes down to how we define total-orders wrt types.
>
> Ultimately it can be reduced to answering the questions:
> 1. Does ("2006/02/15"^^Date < $x < "3.14"^^Double) make sense?
> 2. Does (<some:uri> < $x < "The quick brown fox...") make sense?
> 3. Does ("2006/02/15" < $x < "2006/05/15"^^Date) make sense?
>
> and I believe the answer to all three questions is No.


I agree.  There is no semantic comparison.

However, I do think there is a need to compare xsd:string to untyped
literals (yes, it's a pain).  This may be worth keeping in mind.
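
The rule discussed here (no ordering across types, except that
xsd:string and untyped literals share one domain) could be sketched in
Java roughly as follows. The Literal class, the use of a null datatype
to mean "untyped", and the plain lexical comparison are assumptions for
illustration only.

```java
import java.util.Optional;

// Hypothetical sketch: literals are comparable only within one domain, and
// untyped literals are folded into the xsd:string domain.
final class LiteralOrder {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

    static final class Literal {
        final String lexical;
        final String datatype; // null means an untyped literal (an assumption)

        Literal(String lexical, String datatype) {
            this.lexical = lexical;
            this.datatype = datatype;
        }
    }

    // Untyped literals and xsd:string map to the same comparison domain.
    static String domain(Literal l) {
        return l.datatype == null ? XSD_STRING : l.datatype;
    }

    // Empty when the literals belong to different domains, i.e. no order is defined.
    static Optional<Integer> compare(Literal a, Literal b) {
        if (!domain(a).equals(domain(b))) {
            return Optional.empty();
        }
        // Lexical comparison is a placeholder; numeric or date types would
        // need a comparison in their own value space.
        return Optional.of(a.lexical.compareTo(b.lexical));
    }
}
```

Returning an empty result rather than an arbitrary ordering keeps a
query such as ("2006/02/15"^^Date < $x < "3.14"^^Double) from silently
matching anything.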


> Certainly my XA2 StringPool design currently assumes the answer is No.


I'm glad to see you considering this carefully.  I haven't measured it,
but I suspect that the StringPool is a big bottleneck for us, and
yet it's almost never looked at.

The other thing is that a Trie design might let us do more intelligent
searching within strings.  I'm not sure how feasible this is (considering
that we want it to be transaction safe and scalable), but it would be a
great feature if we could include wildcard searching.
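
For a sense of what trie-based wildcard searching could look like, here
is a small in-memory Java sketch. It is deliberately naive: it is
neither transaction safe nor disk-backed, and the class name, the '?'
single-character wildcard, and the trailing '*' convention are all
assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a character trie supporting '?' (exactly one
// character) and a trailing '*' (any suffix). A '*' elsewhere in the
// pattern is treated as a literal character.
final class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    void insert(String s) {
        Node n = root;
        for (int i = 0; i < s.length(); i++) {
            n = n.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        n.terminal = true;
    }

    // Returns all stored strings matching the pattern, in sorted order.
    List<String> match(String pattern) {
        List<String> out = new ArrayList<>();
        match(root, pattern, 0, new StringBuilder(), out);
        return out;
    }

    private void match(Node n, String p, int i, StringBuilder acc, List<String> out) {
        if (i == p.length()) {
            if (n.terminal) {
                out.add(acc.toString());
            }
            return;
        }
        char c = p.charAt(i);
        if (c == '*' && i == p.length() - 1) {
            collect(n, acc, out); // trailing '*': take the whole subtree
            return;
        }
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            char child = e.getKey();
            if (c == '?' || c == child) {
                acc.append(child);
                match(e.getValue(), p, i + 1, acc, out);
                acc.setLength(acc.length() - 1);
            }
        }
    }

    // Gathers every stored string at or below the given node.
    private void collect(Node n, StringBuilder acc, List<String> out) {
        if (n.terminal) {
            out.add(acc.toString());
        }
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            acc.append(e.getKey().charValue());
            collect(e.getValue(), acc, out);
            acc.setLength(acc.length() - 1);
        }
    }
}
```

A trailing '*' only walks the subtree under the fixed prefix, while each
'?' fans out across every child at that depth, so patterns with leading
wildcards would be the expensive case.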

Paul