[Mulgara-dev] configuring resolvers

Paul Gearon gearon at ieee.org
Mon Nov 9 16:06:13 UTC 2009


Hi Steve,

On Mon, Nov 9, 2009 at 10:21 AM, Steve Bayliss
<stephen.bayliss at acuityunlimited.net> wrote:
> I'm also finding a need for this.
>
> A specific use-case, the ability to search for literals "beginning with"
> rather than the full text.

This sounds like you need a new option for searching indexed data,
rather than reconfiguring a resolver. I suppose it could be seen as a
way of telling a resolver to configure its indexes differently, but
you'll still need different mechanisms for searching those indexes,
and that will pretty much need a new resolver anyway.

> eg if a literal (eg dc:creator) contained "Surname, First name", the ability
> to perform a search for "Stephen*" that would return "Stephenson, Robert",
> "Stephenson, George" but *not* "Fry, Stephen".
>
> I think I can see how this could be done by adding a new field in
> org.mulgara.resolver.lucene.FullTextStringIndex.java (and presumably some
> mods to FullTextStringIndexTuples.java).
>
> (Of course I could be missing something blindingly obvious here - maybe
> there is a way to do this already??)

Actually, there is, though it was only created 2 weeks ago and isn't in a release yet. :-)

The PrefixResolver is used to find things that start with a specific
string. It was created specifically to find all URIs that start with
the same domain, so the current release only lets you use it on URIs.
However, James and I recently made some changes that allow it to be
used on string literals as well. Unfortunately for James, he needed it
to be case-insensitive, but our indexes store strings in case-sensitive
order. Still, it sounds like it might work for you.
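
To illustrate why a prefix search suits a sorted index, here's a rough Java sketch (not the PrefixResolver's actual code). In sorted order, every string sharing a prefix sits in one contiguous run, so you can seek to the prefix and read until the first non-match. The TreeSet, class and method names are stand-ins chosen for the example, not Mulgara classes. It also shows the case-sensitivity problem: "stephenson" sorts after every string starting with a capital "S", so a search for "Stephen" never reaches it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch only: a TreeSet stands in for a sorted string index. This is not
// the PrefixResolver's code, just the idea of a prefix scan over sorted data.
final class PrefixScanSketch {

    /** Returns every entry in the sorted index that starts with the prefix. */
    static List<String> startingWith(NavigableSet<String> index, String prefix) {
        List<String> matches = new ArrayList<>();
        // In lexicographic order, all strings sharing a prefix form one
        // contiguous run, so start at the prefix and stop at the first miss.
        for (String candidate : index.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break;
            }
            matches.add(candidate);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>(List.of(
                "Fry, Stephen",
                "Stephenson, George",
                "Stephenson, Robert",
                "stephenson, lowercase"));
        // Ordering is case-sensitive, so the lower-case entry is not found.
        System.out.println(startingWith(index, "Stephen"));
        // prints [Stephenson, George, Stephenson, Robert]
    }
}
```

Note that "Fry, Stephen" is never even visited: the scan starts at "Stephen" and stops as soon as the run of matches ends.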

The other alternative is to use a filter in SPARQL. Either of these two
tests will work here:
  "Stephen" = fn:substring(?str, 1, 7)
  regex(?str, "^Stephen")
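
For reference, here's a plain-Java sketch of what each filter test accepts; it is not Mulgara's filter implementation. The xpathSubstring helper is a hypothetical stand-in that follows the XPath fn:substring rules for integer arguments: positions count from 1, and the character at position p is kept when start <= p < start + length. With a start of 0 you only get the first six characters, so the substring has to start at 1 to compare against the seven-character "Stephen".

```java
import java.util.regex.Pattern;

// Sketch of what the two SPARQL FILTER tests accept, in plain Java.
// Not Mulgara's filter code; xpathSubstring is a hypothetical helper that
// follows XPath fn:substring for integer arguments (1-based positions).
// It counts UTF-16 chars, which matches XPath only without surrogate pairs.
final class StartsWithFilterSketch {

    private static final Pattern STARTS_WITH_STEPHEN = Pattern.compile("^Stephen");

    /** Keeps the character at 1-based position p when start <= p < start + length. */
    static String xpathSubstring(String s, int start, int length) {
        long end = (long) start + length; // exclusive upper bound
        StringBuilder out = new StringBuilder();
        for (int p = 1; p <= s.length(); p++) {
            if (p >= start && p < end) {
                out.append(s.charAt(p - 1));
            }
        }
        return out.toString();
    }

    /** "Stephen" = fn:substring(?str, 1, 7) */
    static boolean substringTest(String literal) {
        return "Stephen".equals(xpathSubstring(literal, 1, 7));
    }

    /** regex(?str, "^Stephen"): the ^ anchor restricts the match to the start. */
    static boolean regexTest(String literal) {
        return STARTS_WITH_STEPHEN.matcher(literal).find();
    }

    public static void main(String[] args) {
        String[] literals = {"Stephenson, Robert", "Stephenson, George", "Fry, Stephen"};
        for (String literal : literals) {
            System.out.println(literal + " -> substring: " + substringTest(literal)
                    + ", regex: " + regexTest(literal));
        }
    }
}
```

Both tests accept the two "Stephenson" literals and reject "Fry, Stephen". Keep in mind that a filter is applied to every candidate binding, whereas the PrefixResolver can seek straight into the index.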

> But it would be just adding in one special case; it would be great to be
> able to do other things, eg when analysing ignore accented characters,
> diacriticals etc (presumably some mods to query parsing also required).

Unless you want to create a new resolver that can index strings
appropriately here, the only way to do this would be via a
filter.
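
As a rough idea of what "ignoring accents" could mean for such a resolver or filter, here's a Java sketch that builds a folded comparison key: decompose with java.text.Normalizer (NFD), drop the combining marks, and lower-case. The class and method names are hypothetical, and this is only one possible folding rule. A resolver could index this key instead of the raw string, which would also give case-insensitive prefix matching.

```java
import java.text.Normalizer;
import java.util.Locale;

// Sketch of accent- and case-insensitive matching via a folded key.
// Hypothetical helper, not part of Mulgara.
final class FoldKeySketch {

    /** Decomposes accented letters, strips combining marks, lower-cases. */
    static String foldKey(String literal) {
        // NFD splits a letter like "e with acute" into "e" plus a combining accent.
        String decomposed = Normalizer.normalize(literal, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them leaves the base letters.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Letters with no decomposition (for example o with stroke) are kept as-is.
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    /** Prefix test on folded keys, so accents and case are ignored. */
    static boolean foldedStartsWith(String literal, String prefix) {
        return foldKey(literal).startsWith(foldKey(prefix));
    }

    public static void main(String[] args) {
        String[] literals = {"Bront\u00eb, Charlotte", "\u00c9mile Zola", "Fry, Stephen"};
        for (String literal : literals) {
            System.out.println(foldKey(literal) + " starts with bronte: "
                    + foldedStartsWith(literal, "bronte"));
        }
    }
}
```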

Unfortunately, we don't have filters in TQL, only in SPARQL, but
they'd be easy enough to add if there were a real need for them.
Personally, I'm more inclined to extend SPARQL so that it can do
anything TQL can do.

> I'm happy to give this a go (perhaps enabling it in a similar way to using
> the property mulgara.textindex.reverse.enabled - though I've never actually
> tried the reverse text indexing) and submit a patch if it works ok.

While I don't want to discourage you from looking in the code and
contributing (please do!), I don't think this is the best approach
for what you've described here.

Incidentally, I've already extended the resolvers to take
comma-separated parameters. This lets me specify a different directory
for each index, which you need if you want them on separate hard
drives (a performance improvement, if you have the drives
available).
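
Roughly, the idea is that one parameter value lists several directories. A minimal Java sketch, assuming a plain comma-separated list with either one shared directory or exactly one entry per index; the class, method and paths are hypothetical and not Mulgara's actual parameter handling:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of splitting a comma-separated resolver parameter into one
// directory per index. The parameter format and the "single entry is
// shared by all indexes" rule are assumptions, not Mulgara's real syntax.
final class IndexDirectorySketch {

    static List<File> parseDirectories(String parameter, int indexCount) {
        List<File> dirs = new ArrayList<>();
        for (String part : parameter.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        if (dirs.size() == 1) {
            // One directory given: every index lives there (nCopies is immutable).
            return Collections.nCopies(indexCount, dirs.get(0));
        }
        if (dirs.size() != indexCount) {
            throw new IllegalArgumentException(
                    "expected 1 or " + indexCount + " directories, got " + dirs.size());
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Hypothetical mount points on two separate drives.
        List<File> dirs = parseDirectories("/disk1/mulgara-idx, /disk2/mulgara-idx", 2);
        for (int i = 0; i < dirs.size(); i++) {
            System.out.println("index " + i + " -> " + dirs.get(i));
        }
    }
}
```

Rejecting a mismatched count, rather than guessing, keeps a typo in the parameter from silently putting two indexes on the same drive.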

Regards,
Paul Gearon


