[Mulgara-general] Status
Paul Gearon
gearon at ieee.org
Fri Aug 6 15:22:09 UTC 2010
It occurred to me that despite a lot of offline conversations, these
mailing lists have been very quiet lately. Since I'm still the
principal person working on Mulgara, I thought I should give people an
idea of the current status.
We have a few developers who still associate themselves with the
project. However, of late, only Alex and I have been making
contributions. I find this ironic, since the user base has been
expanding recently. I appreciate that some of the system is overly
complex for a new developer to jump into, but there are lots of areas
that are trivial to work in, and there are people here (both active
and inactive contributors) who would be more than happy to point
newcomers in the right direction. So if anyone is interested at all,
then we could really use the help!
Also, for anyone unaware, I am no longer working on Mulgara as my day
job. Things have been economically difficult all over, which forced
Duraspace (formerly Fedora Commons) to focus their resources on their
core activities of document storage and archiving. I now work for
Revelytix, which is where Alex also works. They do use Mulgara
internally (for now), but my work will be focused on completely
different systems in the Semantic Web. So Mulgara has been relegated
to evening and weekend activity. That's OK... I sometimes do my best
work at night. ;-)
Now for recent work:
1. Alex and I have both (independently) been looking at the speed of
re-ordering results, both within a query (necessary for certain types
of joins) and at the end (for an ORDER BY). We've both learnt a lot
about what works and what doesn't. We're still a way off checking in
code that will make a significant difference, but when we do then
certain queries will get a lot faster. Some of this work also applies
to DISTINCT queries that aren't ordered. I know of one use case where
this will take a query that takes several minutes (on EVERY available
triplestore - not just Mulgara) and have it return data in under a
second.
2. I've started in on the new SPARQL grammar. I don't have the cycles
to implement everything yet, so I'm focusing on SPARQL Update 1.1,
since I consider standardizing updates to be one of the most important
features of SPARQL 1.1. A lot of the other features already exist in
Mulgara in one form or another, so a lot of the work will be in wiring
the grammar up to the functionality. For instance, we already have
mechanisms for transitive closures, aggregates, remote services and
sub-queries. It's fiddly though, so I'm aiming at Updates first.
3. An RDFa parser is long overdue. I need one for a home project right
now, so it's finally getting done. This is something *anyone* could do
since it's just a matter of gluing two APIs together.
4. Following on from item 1, I'm also doing some new indexing work to
make data load faster, and ultimately read data with fewer disk seeks.
Some small amount of that is in source control already, so loading and
querying should already be quicker for anyone using SVN. That stuff
will be in the next release, though the bigger indexing changes are
still some way off.
There have been a few bug fixes lately, and I'm working on another one
right now (the CONSTRUCT problem mentioned in another thread). Once
these are done and the RDFa parser integration is checked in, then
I'll do release 2.1.9. After that, I'll be aiming to make SPARQL
Update 1.1 the main feature for 2.1.10.
Regards,
Paul