[Mulgara-general] General questions and documentation

Wed Aug 26 12:28:01 UTC 2009

On Tue, Aug 25, 2009 at 5:03 PM, Gregg Reynolds <dev at mobileink.com> wrote:
> Hello, list.

Hi Gregg,

> By way of introduction: I'm working with a linguist whose research
> question involves the reconstruction of Proto-Afro-Asiatic.  The
> specific project is a morphological database that involves 70-odd
> languages of the Cushitic-Omotic family (geographically in the region
> around Somalia).  This involves thousands of words, each of which
> might have a dozen or more properties (e.g. gender="masc").  My guess
> is we might end up with 100s of thousands of RDF nodes, maybe even a
> few million.
>
> RDF is great for this for reasons which I assume are pretty obvious to
> this list.  So I looked around at RDF implementations and Mulgara
> looks like the best combination of simplicity and power.  Near-zero
> admin, simple start-up and shut-down, support for embedding, all very
> attractive, since we may end up distributing a stand-alone
> application.  The basic application would be a GUI that allows the
> linguist to drag-n-drop elements of paradigms drawn from the data,
> which is implemented as a web page that sends queries to a SPARQL
> service point.

Will it need to update the data? If so, then we don't do SPARQL/Update
quite yet (it's not too far out on my TODO list). In the meantime,
you'd need TQL.
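
For the read-only side, a query can be sent as a plain HTTP GET under
the SPARQL Protocol, with the query text in the `query` parameter. A
minimal sketch in Python follows. The endpoint URL and the `ex:`
vocabulary are assumptions made up for illustration, not taken from a
real Mulgara installation or from the project's data model.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint URL: adjust to wherever the SPARQL service actually listens.
ENDPOINT = "http://localhost:8080/sparql/"

# Hypothetical vocabulary: find every word whose gender property is "masc".
QUERY = """\
PREFIX ex: <http://example.org/morph#>
SELECT ?word WHERE { ?word ex:gender "masc" }
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request with the query URL-encoded."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


request = build_request(ENDPOINT, QUERY)

# Sending the request needs a running server, for example:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Updates would not go through this path; as noted above, they would
need TQL for now.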

> So I have a few general questions.  Most obviously, do I have the
> right impression?  Which is, it'll be pretty simple to package up
> Mulgara with some data as a domain-specific application, so the user
> can just click on some kind of install program, and then fire up the
> browser to work with the data.  A web admin interface, that would
> allow the user to e.g. ask for statistics about the database, shut it
> down, restart it, etc. - I assume that would not be especially
> complicated.  Am I deceived?

To spin it up this way should be pretty easy, though it would take a
little code.

We currently start up a Jetty server, and load a couple of webapps
into it. The best place for your webadmin interface would be as
another webapp. Unfortunately, these webapps aren't listed in the XML
config file, though they should be. Right now it's just a couple of
repeated lines of code with different webapp names for each one, so it
will be trivial to set this up appropriately for you.

> Regarding rules/inferencing/etc.  I'm pretty familiar with RDF/OWL -
> read all the docs, multiple times, worked through the formal
> semantics, etc. - but I have little practical experience with anything
> beyond the data modeling stuff.  I'm not entirely at sea when it comes
> to the inferencing stuff, having read up on Description Logics, etc.,
> but there's a pretty big gap (for me, at least) between theory and
> application.  To be more specific:  I see Mulgara has a bunch of rule
> stuff, some kind of rule engine, etc., but I don't know how to place
> that.  Is it fish or fowl?  An RDF database, or a rule engine, or
> both, or?  Is the rule engine a separate component?  Is it like FaCT++
> or Jess or Pellet, or some other beast altogether?  For my application
> we wouldn't need extensive inferencing, at least at first, but from
> reading what little doco there is on Krule I can see how we might use
> it, if for nothing else than data validation.

No, it's a forward-chaining rule engine, while FaCT++ and Pellet are
backward chained (which means they are more "complete" but can't
scale). "Rules" are a forward chaining paradigm anyway, so that
doesn't sound like an issue here. Backward chaining comes into play if
you have complex questions to ask an ontology. Rules are just rules.
:-)

The rules module is a part of the query engine, and is pretty much
built in. It's based on a system like RETE, but rather than working
iteratively over data and building up memory about the network, it
operates on all the data at once and uses the already existing indexes
in place of the network memory. This allows it to run rules over a
large amount of data pretty quickly, but it's not so efficient for
iterative changes. For this reason, rules are only run when asked for.
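
The set-at-a-time idea can be illustrated with a toy forward chainer
in Python. This is only a sketch under stated assumptions, not
Mulgara's rule engine: the two RDFS-style rules, the in-memory index,
and the `ex:` names are all invented for the example. Each pass joins
every rule against the whole store through an index, and passes repeat
until nothing new is derived.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def build_index(triples):
    """Index triples as predicate -> subject -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[p][s].add(o)
    return index


def apply_rules(triples):
    """Run both rules over the whole store; return only the new triples."""
    # A real store would keep its indexes up to date; this toy rebuilds them.
    index = build_index(triples)
    derived = set()
    # Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    for a, supers in index[SUBCLASS].items():
        for b in supers:
            for c in index[SUBCLASS].get(b, ()):
                derived.add((a, SUBCLASS, c))
    # Rule 2: (x type a), (a subClassOf b) => (x type b)
    for x, classes in index[TYPE].items():
        for a in classes:
            for b in index[SUBCLASS].get(a, ()):
                derived.add((x, TYPE, b))
    return derived - triples


def forward_chain(triples):
    """Apply the rules repeatedly until a fixpoint is reached."""
    store = set(triples)
    while True:
        new = apply_rules(store)
        if not new:
            return store
        store |= new


# Hypothetical data in the spirit of a morphological database.
facts = {
    ("ex:MascNoun", SUBCLASS, "ex:Noun"),
    ("ex:Noun", SUBCLASS, "ex:Lexeme"),
    ("ex:word1", TYPE, "ex:MascNoun"),
}
closure = forward_chain(facts)
```

Because every pass works over the full data set, a single added
statement costs a whole pass, which is the trade-off described above.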

(I'd like to get a RETE engine in there as well, for the iterative
changes, but that's work that will have to wait)

We've also been talking to Clark and Parsia about integrating with
Pellet. That was the motivation for the Jena interface in the last
release.

> The problem of course is documentation.  I completely understand, used
> to be a developer myself, so no complaints about developers not
> writing documentation here.  But it's pretty close to a show-stopper.
> And bad (inaccurate) documentation is even worse than no
> documentation.  On the other hand, this is all pretty cutting edge,
> and in fact it's danged hard to find a decent tutorial anywhere on the
> web that makes the whole ontology/reasoning space reasonably clear, so
> I'm not picking on Mulgara.

No, I know you're not picking on us. The problem is that we don't have
much documentation, and the pointer to docs.mulgara.org probably does
more harm than good.

The idea of the wiki is for developers to keep things up to date as
they do them, and to make it easy for people to bring stuff forward
from docs.mulgara.org in an incremental manner. So the wiki may not
have a lot on it, but it should ALL be 100% correct. I *really* want
to know if it isn't.

> So here's where I'm at now: in spite of the lack of good
> documentation, Mulgara seems like the right way to go.  The mailing
> list is responsive and friendly, and it's being used by some projects
> I really like (I probably would not even have considered it if Fedora
> had not chosen it), and it has some excellent features.

My Fedora overlords will be pleased.  :-)  I'm a DuraSpace employee
(for those who don't know, DuraSpace is a joining between the Fedora
Commons and DSpace organizations, and I used to work for Fedora
Commons).

> The documentation problem, well, since it looks like I'm going to have
> to just deal with it, I might as well make a virtue out of it.  So I
> have a proposition:  declare a "month of clarity", starting Sept 1, in
> which documentation will be the number one goal for /everybody/.  Or
> "two weeks of non-obscurity", or even "a day of drudgery".  The thing
> is, I'm willing to do a fair amount of technical writing as I explore
> Mulgara, but before I commit to that I'd like to know that the
> developers understand that documentation is not optional.  If I'm
> going to spend a bunch of time (well, ok, let's say 4-5 hours a week
> for the next month or so), (I just know I'm going to regret saying
> that!), I want to know ahead of time that somebody who Knows What's
> Going on is going to at least read the stuff.  Or, if any of the
> developers wants to do an unstructured brain dump, I can work said
> dump into something structured and readable.

I'm up for it! It is easier to manage something like this when it's
limited in scope each time, so it sounds like a good idea. It's also
more motivating when others are prepared to help.  :-)

> Look at the big picture: it's not /that/ much work to get reasonable
> documentation up and running, and (from my honest perspective) almost
> anything would be better than what is now available.  Mulgara is just
> too cool not to have at least decent docs.

Thank you. (about being cool)

> Please don't misunderstand, this is not a "write documentation for me
> or I'll use another product, and then you'll be sorry" threat.  Oh no.
>  That would be too easy.  I'll tell everybody Mulgara is implemented
> in COBOL, then you will indeed be sorry.  Actually, if people are just
> too busy (it happens), I would probably just dump notes on a third
> party wiki page rather than clutter the Mulgara wiki with speculative
> stuff.  As a matter of fact, maybe that would be a Good Thing for the
> Week of Words: start up a wikipage (I like wikidot.com) and just dump
> stuff there, and then use that as source material to produce
> well-organized, edited stuff on the Mulgara site.  Believe it or not,
> I actually enjoy that sort of thing; I've always found technical
> writing at least as challenging and rewarding as coding.

Yes, it is, which is why it tends to get pushed back. If it were
really trivial we'd probably have more of it. I've been concerned
about making things too hard to find in the Wiki, but then again, it's
better to have it there and hard to find than it is for it to not be
there at all.

> But from the response I've received so far, it seems people are
> willing to take a little bit of time, at least in the informal medium
> of a mailing list.  I think the documentation problem could be solved
> in very short order; all it takes is a definite timeframe, the freedom
> for developers to just vent without worrying about
> punctuation/composition/etc, and a few volunteers to shape the stuff.
> Oct 1 rolls around, Mulgara has the best documentation on the web.
> What could possibly go wrong?

LOL.

Regards,
Paul