[Mulgara-dev] configuring resolvers

Steve Bayliss stephen.bayliss at acuityunlimited.net
Mon Nov 9 17:34:03 UTC 2009


Hi Paul

Many thanks for your response.  I think I'm getting a clearer picture of
what I need now.

> > A specific use-case, the ability to search for literals "beginning with"
> > rather than the full text.
> 
> This sounds like you need a new option for searching indexed data,
> rather than reconfiguring a resolver. I suppose it could be seen as a
> way of telling a resolver to configure its indexes differently, but
> you'll still need different mechanisms for searching those indexes,
> and that will pretty much need a new resolver anyway.

So here you're saying it could be a resolver working against Mulgara's 
existing indexes, and I don't need the Lucene index for this (which in 
any case doesn't index what I want - unless I go ahead and create 
a new index).

> > I think I can see how this could be done by adding a new field in
> > org.mulgara.resolver.lucene.FullTextStringIndex.java (and presumably some
> > mods to FullTextStringIndexTuples.java).

Actually I did manage to get a working patch for this, based on creating a
new index in the Lucene resolver. It is a bit of a hack, but it helps me
understand what is going on in the code!

> 
> Actually, it was created 2 weeks ago, but isn't in a release yet.  :-)
> 
> The PrefixResolver is used to find things that start with a specific
> string. It was created specifically to find all URIs that start with
> the same domain, so the current release only lets you use it on URIs.
> However, James and I recently made some changes that allow it to be
> used on string literals as well. Unfortunately for James, he needed it
> to be case insensitive, but our indexes store strings with case
> sensitive ordering. But it sounds like it might work for you.

In my case I'd also like it to be case-insensitive, so I guess it is not
going to work for me either. So it sounds like I would need a
case-insensitive index to be built for this approach (Lucene or ...)?

> 
> The other alternative is to use a filter in SPARQL. We have 2 tests
> that will work here:
>   "Stephen" = fn:substring(?str, 0, 7)
>   regex(?str, "^Stephen")

Thanks - I will give these a go (need to brush up on my SPARQL in any case)
- do you think there would be much of a performance hit in this approach
(against having a purpose-built index)?
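
A rough sketch of the trade-off being asked about, in Python rather than
Mulgara code. It only illustrates the general idea: a filter tests every
candidate literal, while a sorted index can jump to the first match and
stop at the first non-match. The function names are hypothetical and say
nothing about how Mulgara evaluates filters or lays out its indexes.

```python
from bisect import bisect_left


def prefix_scan(literals, prefix):
    """Filter-style evaluation: test every literal, so cost grows with the data."""
    return [s for s in literals if s.startswith(prefix)]


def prefix_range(sorted_literals, prefix):
    """Index-style evaluation over a sorted list.

    Binary-search to the first string >= prefix, then read forward while
    the prefix still matches. In a sorted list all strings sharing a
    prefix are contiguous, so the loop can stop at the first mismatch.
    """
    i = bisect_left(sorted_literals, prefix)
    matches = []
    while i < len(sorted_literals) and sorted_literals[i].startswith(prefix):
        matches.append(sorted_literals[i])
        i += 1
    return matches


# Hypothetical data; sorted() orders by code point, i.e. case-sensitively.
literals = sorted(["Steve", "stephen", "Paul", "Stephenson", "Stephen"])
print(prefix_scan(literals, "Stephen"))
print(prefix_range(literals, "Stephen"))
```

Both calls return the same matches; the difference is how much data each
has to touch. Because the ordering is case-sensitive, "stephen" sorts away
from "Stephen" and is not found by either call.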

Also, to extend this so it filters on text with the accents removed (eg
literals "Bó", "Bò" and "Bo" all being found by a query string "Bo"),
presumably what is required is either a function to do this
"normalisation", or to be able to handle it in regex?
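
One possible shape for such a "normalisation" function, sketched in Python
as an assumption rather than anything Mulgara or SPARQL provides: decompose
each string with Unicode NFD, drop the combining marks, and fold case. The
names fold and starts_with_folded are hypothetical.

```python
import unicodedata


def fold(text):
    """Return text with accents stripped and case folded.

    NFD splits an accented letter into a base letter plus combining
    marks; removing the marks leaves the base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def starts_with_folded(literal, query):
    """Prefix test that ignores both case and accents."""
    return fold(literal).startswith(fold(query))


for literal in ["Bó", "Bò", "Bo", "Ba"]:
    print(literal, starts_with_folded(literal, "Bo"))
```

Applying the same fold to both the stored literal and the query string is
what makes the comparison symmetric. Characters with no canonical
decomposition, such as "ø", pass through unchanged, so this is narrower
than a hand-built mapping table.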

> 
> > But it would be just adding in one special case; it would be great to be
> > able to do other things, eg when analysing ignore accented characters,
> > diacriticals etc (presumably some mods to query parsing also required).
> 
> Unless you want to create a new resolver that can index strings
> appropriately here, then the only way to do this would be via a
> filter.

If it was a case of full-text searching (not start-of-string, which is 
my case), could there be a case for modifying the existing Lucene resolver 
to "plug in" things like org.apache.lucene.analysis.ISOLatin1AccentFilter?

> 
> Unfortunately, we don't have filters in TQL, only in SPARQL. But
> they'd be easy enough to add if there was a real need for them.
> Personally, I'm more inclined to extend SPARQL to do anything that TQL
> can do.
> 

Agreed.
