[Mulgara-dev] FW: [Fedora-commons-users] Fedora Resource Index Query Service very slow

Paul Gearon gearon at ieee.org
Mon Dec 14 15:38:10 UTC 2009


There were a few improvements around date ranges, but nothing else
that is really specific to that query. There have been general
improvements throughout the year, but I'm a little startled at that
improvement (for that query).

Also, the datatyping graphs are automatically created now. These
graphs all start with sys:, so the xsd graph is <sys:xsd>. But
creating your own graphs still works.

Regards,
Paul Gearon

On Sat, Dec 12, 2009 at 12:41 PM, Steve Bayliss
<stephen.bayliss at acuityunlimited.net> wrote:
> This came up on the fedora-commons-users list - wondered if there was anything specific to this TQL query that would cause such an improvement between 2.1.1 and 2.1.4?
>
> Regards
> Steve
>
>
> -----Original Message-----
> From: Steve Bayliss [mailto:stephen.bayliss at acuityunlimited.net]
> Sent: 12 December 2009 17:38
> To: 'Markus Höckner'
> Cc: fedora-commons-users at lists.sourceforge.net
> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
>
> Hi Markus
>
> I just tried a comparison here between Mulgara 2.1.1 (as per current Fedora 3.2.x release) and Mulgara 2.1.4 and the difference is pretty dramatic.  Note that these results are on a pretty under-powered Windows VM (VMWare server) so I'd expect significantly better results on your hardware.  I'll give it a go on some decent hardware here at some point.
>
> The reason you're not getting the web interface is that you are starting Mulgara with some non-standard ports to avoid conflict with Tomcat - try on port 18080 (if that's what you used from my instructions - you can of course choose different ports when starting Mulgara).
>
> Mulgara 2.1.1
> data load: 112.842 seconds
> query: 729.765 seconds
>
> Mulgara 2.1.4
> data load: 94.450 seconds
> query: 1.672 seconds
>
> I did run the queries a few times and got results that were similar.
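The gap is easier to judge as a ratio. Here is a minimal Python sketch that computes the speedup for each phase. It assumes the two runs above are directly comparable (same VM, same data, same query), and the `speedup` helper is purely illustrative:

```python
# Timings in seconds, as reported above (same data, same query, same VM).
TIMINGS = {
    "2.1.1": {"load": 112.842, "query": 729.765},
    "2.1.4": {"load": 94.450, "query": 1.672},
}


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


for phase in ("load", "query"):
    old = TIMINGS["2.1.1"][phase]
    new = TIMINGS["2.1.4"][phase]
    print(f"{phase}: {old:.3f}s -> {new:.3f}s, {speedup(old, new):.1f}x faster")
```

With these figures the load is roughly 1.2x faster and the query roughly 436x faster.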
>
> I see the release notes for Mulgara 2.1.3 include "Enabled several query optimizations that had been coded and tested but not enabled" - maybe it is down to this?
>
> Fedora 3.3, due for release very soon, will include Mulgara 2.1.4.
>
> Regards
> Steve
>
> (for reference, the query was:
>
> select $item $itemID $date $state
> from   <#sampledata>
> where  $item           <http://www.openarchives.org/OAI/2.0/itemID> $itemID
> and    $item     <info:fedora/fedora-system:def/model#state> $state
> and $item <info:fedora/fedora-system:def/view#disseminates> $diss
> and $diss <info:fedora/fedora-system:def/view#disseminationType> <info:fedora/*/DC>
> and    $item     <info:fedora/fedora-system:def/view#lastModifiedDate> $date
> and    $date           <http://mulgara.org/mulgara#after> '2009-11-29T07:23:23.682Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
> and    $date           <http://mulgara.org/mulgara#before> '2009-12-10T07:25:24.884Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
> order  by $itemID asc
>
> Note that if you run this without having started Fedora on this Mulgara instance, you'll have to create the #xsd model - see http://www.mulgara.org/trac/wiki/Create and http://www.mulgara.org/trac/wiki/graphTypes - Fedora creates the datatyping graph automatically.)
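In practice, only the two dateTime bounds of this query change from run to run. Markus's timings, further down the thread, depend on the "after" date. The Python sketch below fills in those bounds using the same millisecond-precision UTC format as the literals above. The helper names `build_query` and `xsd_datetime` are hypothetical, and whitespace inside the query has been normalized.

The graph names default to `<#sampledata>` and `<#xsd>`, as in the quoted query. Against a live Fedora you would presumably pass `"#ri"`, as Markus's risearch URL does. With newer Mulgara releases, Paul's note suggests `"sys:xsd"` for the datatyping graph; that is an untested assumption.

```python
from datetime import datetime, timezone

# Hypothetical template of the iTQL query quoted above (whitespace normalized).
QUERY_TEMPLATE = """select $item $itemID $date $state
from   <{graph}>
where  $item <http://www.openarchives.org/OAI/2.0/itemID> $itemID
and    $item <info:fedora/fedora-system:def/model#state> $state
and    $item <info:fedora/fedora-system:def/view#disseminates> $diss
and    $diss <info:fedora/fedora-system:def/view#disseminationType> <info:fedora/*/DC>
and    $item <info:fedora/fedora-system:def/view#lastModifiedDate> $date
and    $date <http://mulgara.org/mulgara#after> '{after}'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <{xsd}>
and    $date <http://mulgara.org/mulgara#before> '{before}'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <{xsd}>
order  by $itemID asc"""


def xsd_datetime(dt: datetime) -> str:
    """Format an aware datetime as UTC with milliseconds, e.g. 2009-11-29T07:23:23.682Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_query(after: datetime, before: datetime,
                graph: str = "#sampledata", xsd: str = "#xsd") -> str:
    """Fill the date window (and graph names) into the query template."""
    if after >= before:
        raise ValueError("'after' must be earlier than 'before'")
    return QUERY_TEMPLATE.format(graph=graph, xsd=xsd,
                                 after=xsd_datetime(after), before=xsd_datetime(before))


print(build_query(datetime(2009, 11, 29, 7, 23, 23, 682000, tzinfo=timezone.utc),
                  datetime(2009, 12, 10, 7, 25, 24, 884000, tzinfo=timezone.utc)))
```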
>
>
>
> -----Original Message-----
> From: Markus Höckner [mailto:hoeckim at yahoo.com]
> Sent: 11 December 2009 17:32
> To: Steve Bayliss
> Cc: fedora-commons-users at lists.sourceforge.net
> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
>
> Hi Steve,
>
> 1) What happens when you load the rdf/xml export file into a separate model in Mulgara 2.1.1 (as per Eddie's test), using the web interface, and execute the query from there?  What is the performance like?
>
> -> I'm not able to connect to the standalone instance on the server. I think this is a server configuration problem, so I'll have to wait until Monday to try this. :)
>
> 2) What happens if you run the query against <#ri> (ie the actual Mulgara model that Fedora is using) from the Mulgara web interface, rather than going through fedora/risearch, what's the performance like?
>
> -> hmmmm, I tried this but I'm not able to connect. I tried different URLs such as https://fedora.phaidra.univie.ac.at/webui or https://fedora.phaidra.univie.ac.at/fedora/webui, but neither works. I also didn't find any documentation about how to access it... maybe you can help me once more?
>
>
> cheers,
> Markus
>
>
>
> -----Original Message-----
> From: Markus Höckner [mailto:hoeckim at yahoo.com]
> Sent: 11 December 2009 11:26
> To: Steve Bayliss
> Cc: fedora-commons-users at lists.sourceforge.net
> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
>
> Hi Steve,
>
> first of all: thx for the very good TODO list! It worked fine.
>
> Here are my results: nothing really changed... the requests/queries take as long as with embedded Mulgara... :(
>
> So my search goes on... it seems to be a "special" problem...
>
> If I find something, I'll let you know!
>
> Cheers,
> Markus
>
>
>
>
> ----- Original Message ----
> From: Steve Bayliss <stephen.bayliss at acuityunlimited.net>
> To: Markus Höckner <hoeckim at yahoo.com>
> Cc: fedora-commons-users at lists.sourceforge.net
> Sent: Fri, December 11, 2009 10:42:07 AM
> Subject: RE: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
> In that case it doesn't sound like the same issue.
>
> Yes, it's possible to run Fedora with a remote (standalone) instance of Mulgara (much in the same way that Fedora can be configured to run against an embedded SQL database or an external instance).
>
> To do this you'll have to amend your fedora.fcfg file.
>
> The basics are
> 1) create a <datastore> section for a remote Mulgara instance
> 2) modify the module config for fedora.server.resourceIndex to use the remote Mulgara
>
> For (1):
>
> - make a copy of the existing <datastore id="localMulgaraTriplestore"> ... </datastore> section and change the id to "remoteMulgaraTriplestore" in the new datastore section
> - in the new datastore ("remoteMulgaraTriplestore") configuration:
> -- change <param name="remote" value="false"> to <param name="remote" value="true">
> -- change <param name="serverName" value="fedora"> to <param name="serverName" value="server1"> (server1 is the default name Mulgara uses; alternatively you can change the Mulgara server name in the Mulgara config, or with the -s option when starting Mulgara, I think)
> -- add <param name="host" value="localhost"/>
> -- add <param name="port" value="1099"/>
>
> For (2):
> - find <module role="fedora.server.resourceIndex.ResourceIndex"  .... >
> - under there, find <param name="datastore" value="localMulgaraTriplestore"> and change the value to "remoteMulgaraTriplestore"
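The two fedora.fcfg changes above can also be scripted. Below is a sketch using Python's `xml.etree.ElementTree`. It assumes only the element and attribute names given in the steps: `datastore` `id`, `module` `role`, and `param` `name`/`value`. The helper names and the heavily trimmed `SAMPLE_FCFG` are hypothetical stand-ins, not Fedora's real file.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical helpers; element and attribute names follow the steps above.
RI_ROLE = "fedora.server.resourceIndex.ResourceIndex"


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def find_param(parent: ET.Element, name: str):
    """Return the direct <param name=...> child of parent, or None."""
    for child in parent:
        if local_name(child.tag) == "param" and child.get("name") == name:
            return child
    return None


def set_param(parent: ET.Element, name: str, value: str) -> None:
    """Set a param's value, adding <param name=... value=.../> if it is missing."""
    param = find_param(parent, name)
    if param is None:
        # Reuse the parent's namespace (if any) so the new element matches its siblings.
        ns = parent.tag[: parent.tag.index("}") + 1] if parent.tag.startswith("{") else ""
        param = ET.SubElement(parent, ns + "param", {"name": name})
    param.set("value", value)


def add_remote_mulgara(root: ET.Element, host: str = "localhost",
                       port: str = "1099", server_name: str = "server1") -> None:
    """Step (1): clone the local datastore as a remote one; step (2): point the RI at it."""
    local_ds = next((e for e in root if local_name(e.tag) == "datastore"
                     and e.get("id") == "localMulgaraTriplestore"), None)
    if local_ds is None:
        raise ValueError("no <datastore id='localMulgaraTriplestore'> found")
    remote_ds = copy.deepcopy(local_ds)
    remote_ds.set("id", "remoteMulgaraTriplestore")
    set_param(remote_ds, "remote", "true")
    set_param(remote_ds, "serverName", server_name)
    set_param(remote_ds, "host", host)
    set_param(remote_ds, "port", port)
    root.append(remote_ds)
    for module in root:
        if local_name(module.tag) == "module" and module.get("role") == RI_ROLE:
            set_param(module, "datastore", "remoteMulgaraTriplestore")


# A heavily trimmed stand-in for fedora.fcfg (the real file has many more params).
SAMPLE_FCFG = """<server>
  <module role="fedora.server.resourceIndex.ResourceIndex">
    <param name="datastore" value="localMulgaraTriplestore"/>
  </module>
  <datastore id="localMulgaraTriplestore">
    <param name="remote" value="false"/>
    <param name="serverName" value="fedora"/>
  </datastore>
</server>"""

root = ET.fromstring(SAMPLE_FCFG)
add_remote_mulgara(root)
print(ET.tostring(root, encoding="unicode"))
```

ElementTree drops XML comments by default. If your fedora.fcfg declares a default namespace, you would probably also want to call `ET.register_namespace("", <that namespace URI>)` before writing it back, so keep a backup of the original file.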
>
> Note that Fedora 3.2 uses Mulgara 2.1.1 - so you will probably find it easier if you download that version and use it instead of 2.1.4; I think certain libraries in Fedora will need updating to run against 2.1.4 (possibly including trippi) - but Eddie can probably advise better than me on that.
>
> Note also that Mulgara's web interface runs on the same ports as Tomcat, so you should change the default ports with something like --port 18080 --publicport 18081 when starting Mulgara.
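Before starting standalone Mulgara, it may help to check which ports are already taken. The Python sketch below does this in two parts:

- `port_is_free` tries to bind each port. This is only a heuristic, since something could grab the port between the check and the start.
- `mulgara_command` builds the command line with the `--port` and `--publicport` options Steve mentions.

The jar name `mulgara-2.1.1.jar` and the optional `-Xmx` heap setting are assumptions. `-Xmx` is a standard JVM option, relevant if you tune the Mulgara JVM.

```python
import socket
from typing import List, Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound right now (e.g. Tomcat is not using it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def mulgara_command(jar: str = "mulgara-2.1.1.jar", port: int = 18080,
                    public_port: int = 18081, max_heap: Optional[str] = None) -> List[str]:
    """Build the java command line for a standalone Mulgara on non-Tomcat ports."""
    cmd = ["java"]
    if max_heap:
        cmd.append(f"-Xmx{max_heap}")  # optional JVM heap limit, e.g. "1024m"
    cmd += ["-jar", jar, "--port", str(port), "--publicport", str(public_port)]
    return cmd


for p in (8080, 18080, 18081):
    print(p, "free" if port_is_free(p) else "in use")
print(" ".join(mulgara_command()))
```

The list can be passed to `subprocess.run(...)` as-is. With the defaults, the web interface should then be at http://localhost:18080/webui.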
>
> You should also probably first, once you have Mulgara 2.1.1 installed, load your rdf/xml file and run the query you've already tested, just to ensure that there are no significant performance differences due to the differences in the Mulgara version (that way you'll at least have a baseline comparison of remote vs embedded using the same Mulgara version).
>
> Once you've done the config (making sure first that Mulgara is running)
> - try starting Fedora to check the above config, see if any errors are reported
> - shut down Fedora
> - rebuild the resource index, to populate the external Mulgara instance
>
> You should then be able to start Fedora and run your queries from within fedora/risearch and see what the performance is like.  Please let us know what you find!
>
> Obviously Mulgara will then be running in its own JVM, so you may want to tune that JVM's resource allocation (e.g. heap size).
>
> Regards
> Steve
>
> -----Original Message-----
> From: Markus Höckner [mailto:hoeckim at yahoo.com]
> Sent: 11 December 2009 09:03
> To: Steve Bayliss
> Cc: fedora-commons-users at lists.sourceforge.net
> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
>
> Hi Steve,
>
> - do you have the full-text index enabled?
> -> we are indexing fulltext in gsearch (FoxmlToLucene.xslt) - autoTextIndex (Mulgara) in fedora.fcfg is false.
>
> - are you performing ingests at the same time? (or very recently - you
> may be able to tell from the CPU load whether any "background" indexing
> activity appears to be taking place)
> -> I'm not sure, but I don't think so (see question 1). Sorry, I'm not a Fedora config pro :( I'm new to this stuff and I'm trying to learn :)
>
> So it is possible to run Mulgara standalone... I think I'm going to try this.
>
> Regards,
> Markus
>
>
>
> ----- Original Message ----
> From: Steve Bayliss <stephen.bayliss at acuityunlimited.net>
> To: Markus Höckner <hoeckim at yahoo.com>; Edwin Shin <eddie at fedora-commons.org>
> Cc: fedora-commons-users at lists.sourceforge.net
> Sent: Fri, December 11, 2009 9:15:53 AM
> Subject: RE: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
> Hi Markus
>
> Eric's results are with the full-text index enabled, and with updates (ingests) happening as well, which he indicates is causing updates to Mulgara whilst the queries are running (specifically updates to the full-text model).
>
> Is this the same in your case, ie
> - do you have the full-text index enabled?
> - are you performing ingests at the same time? (or very recently - you may be able to tell from the CPU load whether any "background" indexing activity appears to be taking place)
>
> If these are not the case, you could try reconfiguring Fedora to use the external Mulgara instance; rebuild your resource index to populate the external instance, and try your queries (eg through fedora/risearch) again to see if that has an impact on performance.
>
> Regards
> Steve
>
> -----Original Message-----
> From: Markus Höckner [mailto:hoeckim at yahoo.com]
> Sent: 11 December 2009 07:08
> To: Edwin Shin
> Cc: fedora-commons-users at lists.sourceforge.net
> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
>
> Hi Edwin,
>
>
> I tried it on an AMD X2 @ 2.4GHz, 1GB RAM, Ubuntu 9.08
>
> Almost the same results as in your test. So quite fast....
>
> But I think the problem is the one Eric Melz reports in his post (Fedora Generic Search: search operation).
>
> http://technotes.emelz.com/fedora-performance
>
> thx,
> Markus
>
>
>
>
> ----- Original Message ----
> From: Edwin Shin <eddie at fedora-commons.org>
> To: fedora-commons-users at lists.sourceforge.net
> Sent: Thu, December 10, 2009 8:11:33 PM
> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>
> Markus,
>
> On my laptop, with a standalone instance of Mulgara 2.1.4:
>
> 46.170 seconds to load all-rdf.xml
> 0.375 seconds to return the following query:
>
> select $item $itemID $date $state
> from   <#sampledata>
> where  $item           <http://www.openarchives.org/OAI/2.0/itemID> $itemID
> and    $item     <info:fedora/fedora-system:def/model#state> $state
> and $item <info:fedora/fedora-system:def/view#disseminates> $diss
> and $diss <info:fedora/fedora-system:def/view#disseminationType> <info:fedora/*/DC>
> and    $item     <info:fedora/fedora-system:def/view#lastModifiedDate> $date
> and    $date           <http://mulgara.org/mulgara#after> '2009-11-29T07:23:23.682Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
> and    $date           <http://mulgara.org/mulgara#before> '2009-12-10T07:25:24.884Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
> order  by $itemID asc
>
> I'd recommend trying the same on the local machine you originally reported the issue with, i.e.,
> 1. download Mulgara 2.1.4
> 2. start Mulgara: java -jar mulgara-2.1.4.jar
> 3. use either webui or tutorial (http://localhost:8080/webui or http://localhost:8080/tutorial) to
>    a) create the graph: create <rmi://localhost/server1#sampledata>;
>    b) load data: load <file:/tmp/all-rdf.xml> into <rmi://localhost/server1#sampledata>;
>    c) query, using the above
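Steps a) and b) differ only in the command verb; both use the same graph URI. That URI embeds the host and the Mulgara server name (`server1`, the default Steve mentions for the remote setup). The Python sketch below uses a hypothetical helper, `setup_commands`, to produce the two TQL commands for pasting into the web UI. The file path is whatever you exported to.

```python
from typing import List


def setup_commands(data_file: str, host: str = "localhost",
                   server: str = "server1", graph: str = "sampledata") -> List[str]:
    """Return the TQL 'create' and 'load' commands from steps a) and b)."""
    # The graph URI embeds the Mulgara server name (server1 unless changed at startup).
    graph_uri = f"rmi://{host}/{server}#{graph}"
    return [
        f"create <{graph_uri}>;",
        f"load <file:{data_file}> into <{graph_uri}>;",
    ]


for command in setup_commands("/tmp/all-rdf.xml"):
    print(command)
```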
>
> Depending on how that goes, you may be able to narrow down the source of the problems.
>
>
>
> On 10 Dec 2009, at 6:41 PM, Markus Höckner wrote:
>
>> Hi Chris,
>>
>>
>>
>>> I'll add to the others' remarks that with so few objects (5k), getting
>>> performance like what you're reporting is very strange.  Are you also
>>> running on a remote filesystem or a particularly slow disk that you
>>> know of?
>>
>> We are using our SAN for storing the objects and the local disk is quite fast...
>>
>> Here is an example that takes about 20-25 seconds for a simple request:
>> https://fedora.phaidra.univie.ac.at/fedora/risearch?query=select+%24item+%24itemID+%24date+%24state%0Afrom+++%3C%23ri%3E%0Awhere++%24item+++++++++++%3Chttp%3A%2F%2Fwww.openarchives.org%2FOAI%2F2.0%2FitemID%3E+%24itemID%0Aand++++%24item+++++%3Cinfo%3Afedora%2Ffedora-system%3Adef%2Fmodel%23state%3E+%24state%0Aand+%24item+%3Cinfo%3Afedora%2Ffedora-system%3Adef%2Fview%23disseminates%3E+%24diss%0Aand+%24diss+%3Cinfo%3Afedora%2Ffedora-system%3Adef%2Fview%23disseminationType%3E+%3Cinfo%3Afedora%2F*%2FDC%3E%0Aand++++%24item+++++%3Cinfo%3Afedora%2Ffedora-system%3Adef%2Fview%23lastModifiedDate%3E+%24date%0Aand++++%24date+++++++++++%3Chttp%3A%2F%2Fmulgara.org%2Fmulgara%23after%3E+%272009-12-04T16%3A17%3A36.023Z%27^^%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23dateTime%3E+in+%3C%23xsd%3E%0Aand++++%24date+++++++++++%3Chttp%3A%2F%2Fmulgara.org%2Fmulgara%23before%3E+%272009-12-10T16%3A19%3A37.184Z%27^^%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23dateTime%3E+in+%3C%23xsd%3E%0Aorder++by+%24itemID+asc&type=tuples&lang=itql&format=Sparql
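The long risearch URL is just the iTQL query, URL-encoded, plus the `type=tuples`, `lang=itql` and `format=Sparql` parameters. The Python sketch below builds such a URL with `urlencode` instead of encoding it by hand. It also times a full fetch, so runs with different date windows can be compared. The helper names and the short example query are hypothetical; the parameter names are copied from the URL.

```python
import time
import urllib.request
from urllib.parse import urlencode


def risearch_url(fedora_base: str, itql: str, fmt: str = "Sparql") -> str:
    """Build a risearch tuple-query URL with the same parameter names as the one above."""
    params = {"query": itql, "type": "tuples", "lang": "itql", "format": fmt}
    return f"{fedora_base}/risearch?{urlencode(params)}"


def time_query(url: str) -> float:
    """Fetch the full response and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()  # include the time to stream all results, not just the headers
    return time.perf_counter() - start


# Short example query; substitute the full query text for a real comparison.
ITQL = "select $item $date from <#ri> where $item <info:fedora/fedora-system:def/view#lastModifiedDate> $date"
print(risearch_url("http://localhost:8080/fedora", ITQL))
# print(time_query(risearch_url("http://localhost:8080/fedora", ITQL)))  # needs a live server
```

`urlencode` escapes a few characters (such as `*` and `^`) that the hand-built URL leaves literal. The server should decode both forms to the same query.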
>>
>> This is the live repository, not the dev.
>> -> ~50k objects
>> -> 4GB RAM
>>
>> It's a little bit faster, but is this normal?
>> When you change the "after" date from 2009-12-04 to 2009-12-01 it takes
>> about one minute... this is very bad, especially for PROAI...
>>
>>
>>
>>> What is the best way to export in RDF/XML?
>>
>> THX for the help! Sometimes I need a little bit of help with such obvious things... :)
>>
>>
>> This is the dev repository:
>>
>> Here is the export: http://homepage.univie.ac.at/markus.hoeckner/rdf/all-rdf.tar.gz
>>
>> The same query as above takes about 30-40 seconds.
>> -> 5k objects
>> -> 2GB RAM
>>
>>
>> We have raised the number of file descriptors for Tomcat on the live server
>> (from 1024 to 2048) and it feels a little bit faster. Could this be
>> the problem?
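One way to check whether file descriptors are actually the bottleneck is to compare how many the Tomcat JVM holds with its limit. Below is a Linux-only Python sketch that reads this from /proc. You would need to look up the pid to inspect yourself, and reading another user's process may require root.

```python
import os
import resource
from typing import Optional, Tuple


def own_nofile_limit() -> Tuple[int, int]:
    """Soft and hard open-file limits of *this* Python process (not Tomcat's)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def max_open_files(pid: int) -> Optional[Tuple[str, str]]:
    """Linux only: read the soft/hard 'Max open files' limits of another process."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                fields = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
                return fields[3], fields[4]
    return None


def open_fd_count(pid: int) -> int:
    """Linux only: how many file descriptors the process currently has open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if os.path.isdir("/proc/self/fd"):
    pid = os.getpid()  # replace with the Tomcat JVM's pid to check Fedora
    print("limits:", max_open_files(pid), "open:", open_fd_count(pid))
else:
    print("own limits:", own_nofile_limit())
```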
>>
>>
>> thx for your help/time guys,
>> Markus
>>
>> ----- Original Message ----
>> From: Chris Wilper <cwilper at duraspace.org>
>> To: Markus Höckner <hoeckim at yahoo.com>
>> Cc: Edwin Shin <eddie at fedora-commons.org>; fedora-commons-users at lists.sourceforge.net
>> Sent: Thu, December 10, 2009 4:11:50 PM
>> Subject: Re: [Fedora-commons-users] Fedora Resource Index Query Service very slow
>>
>> Hi Markus,
>>
>> I'll add to the others' remarks that with so few objects (5k), getting
>> performance like what you're reporting is very strange.  Are you also
>> running on a remote filesystem or a particularly slow disk that you
>> know of?
>>
>> 2009/12/10 Markus Höckner <hoeckim at yahoo.com>:
>>> What is the best way to export in RDF/XML?
>>
>> Try something like:
>>
>> wget -O all-rdf.xml 'http://fedoraAdmin:fedoraAdmin@localhost:8080/fedora/risearch?type=triples&lang=spo&format=RDF%2FXML&stream=on&query=*+*+*'
>>
>> Or with curl:
>>
>> curl -o all-rdf.xml 'http://fedoraAdmin:fedoraAdmin@localhost:8080/fedora/risearch?type=triples&lang=spo&format=RDF%2FXML&stream=on&query=*+*+*'
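The same export can also be done from Python using only the standard library. The query parameters are copied from Chris's command, and the default `fedoraAdmin` credentials come from his example; change both for a real server. Instead of relying on credentials embedded in the URL, the sketch sends an HTTP Basic `Authorization` header explicitly. This assumes risearch accepts Basic auth, as the wget/curl commands imply. The helper names are hypothetical.

```python
import base64
import shutil
import urllib.request
from urllib.parse import urlencode


def export_request(fedora_base: str, user: str, password: str) -> urllib.request.Request:
    """Build the same 'all triples as RDF/XML' risearch request as the wget/curl commands."""
    params = {"type": "triples", "lang": "spo", "format": "RDF/XML",
              "stream": "on", "query": "* * *"}
    url = f"{fedora_base}/risearch?{urlencode(params)}"
    # Send HTTP Basic credentials as an explicit header rather than inside the URL.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download(request: urllib.request.Request, out_path: str = "all-rdf.xml") -> None:
    """Stream the response to disk instead of holding the whole export in memory."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)


req = export_request("http://localhost:8080/fedora", "fedoraAdmin", "fedoraAdmin")
print(req.full_url)
# download(req)  # uncomment to run against a live Fedora server
```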
>>
>> - Chris
>>
>> ------------------------------------------------------------------------------
>> Return on Information:
>> Google Enterprise Search pays you back
>> Get the facts.
>> http://p.sf.net/sfu/google-dev2dev
>> _______________________________________________
>> Fedora-commons-users mailing list
>> Fedora-commons-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>
> _______________________________________________
> Mulgara-dev mailing list
> Mulgara-dev at mulgara.org
> http://mulgara.org/mailman/listinfo/mulgara-dev
>


