[Mulgara-general] General questions and documentation

Tue Aug 25 22:03:52 UTC 2009

Hello, list.

By way of introduction: I'm working with a linguist whose research
question involves the reconstruction of Proto-Afro-Asiatic.  The
specific project is a morphological database that involves 70-odd
languages of the Cushitic-Omotic family (geographically in the region
around Somalia).  This involves thousands of words, each of which
might have a dozen or more properties (e.g. gender="masc").  My guess
is we might end up with 100s of thousands of RDF nodes, maybe even a
few million.
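
To make the data shape concrete, here is a minimal sketch of how one word and its properties might flatten into RDF-style triples. The namespace, the word identifier, and the property names are hypothetical placeholders, not the project's actual vocabulary, and plain tuples stand in for a real RDF library.

```python
# Hypothetical namespace, for illustration only (not the project's real vocabulary).
EX = "http://example.org/morph#"


def word_to_triples(word_id, properties):
    """Return one (subject, predicate, object) triple per property of a word."""
    subject = EX + word_id
    # Sort by property name so the output order is stable.
    return [(subject, EX + name, value)
            for name, value in sorted(properties.items())]


# A made-up word with two made-up properties.
triples = word_to_triples("word1", {"gender": "masc", "number": "sg"})

# Print the triples in an N-Triples-like form.
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```

Each property becomes one triple, so a word with a dozen properties contributes a dozen statements to the store.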

RDF is great for this for reasons which I assume are pretty obvious to
this list.  So I looked around at RDF implementations and Mulgara
looks like the best combination of simplicity and power.  Near-zero
admin, simple start-up and shut-down, support for embedding, all very
attractive, since we may end up distributing a stand-alone
application.  The basic application would be a GUI that allows the
linguist to drag-n-drop elements of paradigms drawn from the data,
which is implemented as a web page that sends queries to a SPARQL
service point.
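
As a rough sketch of the kind of request the web page would send, the snippet below builds a SPARQL-protocol GET URL. The endpoint address and the `ex:` vocabulary are assumptions made for illustration; the only thing taken from the SPARQL protocol itself is that the query travels in a `query` parameter. Nothing is actually sent over the network here.

```python
from urllib.parse import urlencode

# Assumed endpoint address; adjust to wherever the SPARQL service is really exposed.
ENDPOINT = "http://localhost:8080/sparql/"

# Hypothetical vocabulary: find every word tagged as masculine.
QUERY = """
PREFIX ex: <http://example.org/morph#>
SELECT ?word WHERE { ?word ex:gender "masc" }
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The browser-side GUI would issue the same kind of request and render the result bindings it gets back.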

So I have a few general questions.  Most obviously, do I have the
right impression?  Namely, that it'll be pretty simple to package up
Mulgara with some data as a domain-specific application, so the user
can just click on some kind of install program, and then fire up the
browser to work with the data.  A web admin interface, that would
allow the user to e.g. ask for statistics about the database, shut it
down, restart it, etc. - I assume that would not be especially
complicated.  Am I deceived?

Regarding rules/inferencing/etc.  I'm pretty familiar with RDF/OWL -
read all the docs, multiple times, worked through the formal
semantics, etc. - but I have little practical experience with anything
beyond the data modeling stuff.  I'm not entirely at sea when it comes
to the inferencing stuff, having read up on Description Logics, etc.,
but there's a pretty big gap (for me, at least) between theory and
application.  To be more specific:  I see Mulgara has a bunch of rule
stuff, some kind of rule engine, etc., but I don't know how to place
that.  Is it fish or fowl?  An RDF database, or a rule engine, or
both, or?  Is the rule engine a separate component?  Is it like FaCT++
or Jess or Pellet, or some other beast altogether?  For my application
we wouldn't need extensive inferencing, at least at first, but from
reading what little doco there is on Krule I can see how we might use
it, if for nothing else than data validation.

The problem of course is documentation.  I completely understand, used
to be a developer myself, so no complaints about developers not
writing documentation here.  But it's pretty close to a show-stopper.
And bad (inaccurate) documentation is even worse than no
documentation.  On the other hand, this is all pretty cutting edge,
and in fact it's danged hard to find a decent tutorial anywhere on the
web that makes the whole ontology/reasoning space reasonably clear, so
I'm not picking on Mulgara.

So here's where I'm at now: in spite of the lack of good
documentation, Mulgara seems like the right way to go.  The mailing
list is responsive and friendly, and it's being used by some projects
I really like (I probably would not even have considered it if Fedora
had not chosen it), and it has some excellent features.

The documentation problem, well, since it looks like I'm going to have
to just deal with it, I might as well make a virtue out of it.  So I
have a proposition:  declare a "month of clarity", starting Sept 1, in
which documentation will be the number one goal for /everybody/.  Or
"two weeks of non-obscurity", or even "a day of drudgery".  The thing
is, I'm willing to do a fair amount of technical writing as I explore
Mulgara, but before I commit to that I'd like to know that the
developers understand that documentation is not optional.  If I'm
going to spend a bunch of time (well, ok, let's say 4-5 hours a week
for the next month or so), (I just know I'm going to regret saying
that!), I want to know ahead of time that somebody who Knows What's
Going On is going to at least read the stuff.  Or, if any of the
developers wants to do an unstructured brain dump, I can work said
dump into something structured and readable.

Look at the big picture: it's not /that/ much work to get reasonable
documentation up and running, and (from my honest perspective) almost
anything would be better than what is now available.  Mulgara is just
too cool not to have at least decent docs.

Please don't misunderstand, this is not a "write documentation for me
or I'll use another product, and then you'll be sorry" threat.  Oh no.
That would be too easy.  I'll tell everybody Mulgara is implemented
in COBOL, then you will indeed be sorry.  Actually, if people are just
too busy (it happens), I would probably just dump notes on a third
party wiki page rather than clutter the Mulgara wiki with speculative
stuff.  As a matter of fact, maybe that would be a Good Thing for the
Week of Words: start up a wikipage (I like wikidot.com) and just dump
stuff there, and then use that as source material to produce
well-organized, edited stuff on the Mulgara site.  Believe it or not,
I actually enjoy that sort of thing; I've always found technical
writing at least as challenging and rewarding as coding.

But from the response I've received so far, it seems people are
willing to take a little bit of time, at least in the informal medium
of a mailing list.  I think the documentation problem could be solved
in very short order; all it takes is a definite timeframe, the freedom
for developers to just vent without worrying about
punctuation/composition/etc, and a few volunteers to shape the stuff.
Oct 1 rolls around, Mulgara has the best documentation on the web.
What could possibly go wrong?

-gregg