[Mulgara-general] [Mulgara-dev] Language discussion

Fri Aug 21 11:30:04 UTC 2009

Various comments inline - skip half-way down to get to the meat of my
comments regarding the core question.

On Thu, Aug 20, 2009 at 08:42:06AM -0500, Paul Gearon wrote:
> I'm interested in adding a new language to Mulgara, and I'd like to
> solicit some feedback please. I don't want to replace anything here. I
> just want to add new option to the system.
> 
> First, the rationale for this.....
> 
> == Rationale ==
[snip]

I fully agree that Java can be overly verbose and cumbersome at times.
There's work going on to reduce that, but as we've seen with closures,
it's going to take quite some time.

> == Comparison ==
[snip]
> The languages I've looked at are:
> - JRuby
> - Groovy
> - Jython
> - Scala

One thing to make clear is: these are all their own languages. Just
because they can run on the JVM doesn't mean that much - yes, it
generally means you can use the Java libraries, which is great, but at
the end of the day you're writing code in a different language with
its own style, behaviour, quirks, etc.

> There are others, but these ones all meet the popularity requirement,
> and there is a limit to the number of new languages I can look at. :-)
> 
> I came close to JRuby, especially given the optimizations and tight
> JVM integration that the JRuby team were able to achieve when they
> started working at Sun. However, the inconsistencies in the language
> (lambda vs. Proc, begin/end vs. {}, etc), along with the lack of type
> information made me uncomfortable with it for this application. Groovy
> is a similar language, but with trivial Java integration. However, I
> never felt it was as mature and efficient as JRuby has become (Ronald
> may disagree here), and also has similar type issues.

I have various issues with Ruby, mainly because they have multiple
confusing ways to do things (and yes, the lambda/Proc mess is the
prime example). It does have some nice things, though; for example,
the fact that classes are instances allows you to do some neat stuff.
So I'm not going to say Ruby is crap, but personally I'm not all that
thrilled about it, especially when comparing to Groovy.

Regarding performance, JRuby is not that great, and certainly slower
than Groovy (JRuby is faster than MRI, but that's not fast...). (As
always when talking about speed, I'm completely ignoring startup times
and talking only about speed once HotSpot has got its teeth into
things a bit.) The JRuby guys have done some fantastic work, but trying
to implement certain of Ruby's quirks on the JVM is really difficult.

I like Groovy reasonably well, mainly because it looks a lot like Java
and has a very natural integration with it. Being a dynamically typed
language means it has the known advantages and disadvantages of those
as compared to statically typed languages. One thing I've noticed when
writing Groovy code is that it takes me about the same length of time
to write some given functionality in Groovy as it does to write it in
Java - I spend more time _writing_ code in Java, but less time
debugging it. However, the resulting Groovy code is usually more
compact than the equivalent Java code. This is especially the case
when you start doing things like creating DSLs.

> I relied more on reviews when it came to Jython. It has been around
> the longest, but also the language I've spent the least time with. The
> main impression I get from the community is that this is a good system
> for doing transitional work, but ultimately you want to commit to
> either Java or Python.

I've written only a little Python, and don't particularly care for it.
Some people love it, some don't.

> But in the end, the language I've decided I want to go with is Scala.
[snip]

Haven't used it (yet); the syntax is annoying. But otherwise it looks
really exciting.

Scala is really a different beast from the other three, because it's
the only one that's statically typed. This means you can
(theoretically at least) run things like findbugs on it. Personally
I'm a huge fan of static typing, even though it's sometimes more
cumbersome and verbose (though Scala goes a long way towards fixing
the latter).

> == Comments? ==
> 
> OK, so now you've seen my rationale, I'm interested in soliciting
> other opinions.
> 
> Do people think it's a good idea to introduce a new language? A bad
> idea? Are you ambivalent? Remember, a new language will be operating
> alongside Java, and not replacing anything. Also note that all the
> systems I've looked at will introduce new jars to the lib directory.
> (Scala would introduce 10MB of jars)

In general I'm extremely wary of using multiple languages in a
project, to the point of usually saying no way. The main issues are
threefold:
 1) double/triple/etc the tooling, i.e. you need a complete set of
    tooling for each language
 2) additional barrier to entry and learning curve for anybody getting
    involved in the project
 3) additional points for errors at the interfaces/interactions
    between the languages

Re 1: this involves not just compiler and build tool, but also
debugging, coverage analyzers (cobertura, clover, etc), code analyzers
(findbugs, pmd, etc), and so forth. While some of these operate at the
JVM level, they still need to tie the results back to the source code,
and hence are really language specific. In Mulgara's case there aren't
many tools being used, but that's a deficiency that Mulgara should fix
rather than using it as an excuse here to say this isn't an issue.

Re 2: this is also really, really important. If non-trivial parts of
the codebase are written in multiple languages, that really means
anybody wanting to make noticeable changes is likely to have to know
all involved languages so they can make the changes everywhere. But
even to just understand the code you're now requiring more from every
developer.

Now, of course I think every developer should know and be reasonably
proficient/comfortable in multiple languages; and also that whenever
you start a project you pick the language that is best suited for that
project. But mixing different languages within a project is dangerous.
There are some notable exceptions to this, however. One is adding
support for your library/server to be used in other projects that are
written in different languages. So for example writing a Ruby library
to make it easier to talk to Mulgara would be a candidate. These
things have the quality that they sit on top of, and can be freely
decoupled and removed from, the rest of the project. You can therefore
afford to both potentially skimp on tooling as well as live with the
fact that somebody may join and end up making significant changes
without needing to understand these additional languages.

Another exception is supporting writing of "modules" or other
extension points in different languages. Good examples here are
scripting capabilities, i.e. where you allow your app to be extended
via "scripts" (and Java's scripting API is great for this). Again,
this has the quality of being easily separable without affecting any
of the main code; but really, this doesn't usually mean you write any
code in another language at all, just that users of your code can do
so. (So I guess this isn't really relevant to the discussion here.)
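
To make the scripting extension point concrete, here is a minimal
sketch using the standard javax.script API. The class and method names
(ScriptExtensionSketch, availableEngines, runExtension), the "groovy"
engine name and the tripleCount variable are all made up for
illustration; nothing here is existing Mulgara code, and whether any
engine is found depends entirely on what is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical example class; not part of Mulgara.
public class ScriptExtensionSketch {

    // Names of every script engine the JVM can discover on the classpath.
    static List<String> availableEngines() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.add(factory.getEngineName());
        }
        return names;
    }

    // Runs a user-supplied script with one host value bound to a variable.
    // Returns an empty Optional if the engine is not installed (or if the
    // script itself evaluates to null).
    static Optional<Object> runExtension(String engineName, String script,
                                         String varName, Object value) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return Optional.empty();
        }
        engine.put(varName, value); // expose a host object to the script
        try {
            return Optional.ofNullable(engine.eval(script));
        } catch (ScriptException e) {
            throw new IllegalStateException("extension script failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Engines: " + availableEngines());
        // Assumes a Groovy engine jar is on the classpath; otherwise empty.
        Optional<Object> result = runExtension("groovy", "tripleCount * 2", "tripleCount", 21);
        System.out.println(result.isPresent() ? "result = " + result.get()
                                              : "no result (engine missing?)");
    }
}
```

The point of the sketch is that the host code stays pure Java: the
script language only shows up in the strings handed to the engine, so
dropping the engine jar removes the feature without touching the core.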

Having said all this, we used Groovy in addition to Java in the Topaz
project, i.e. ended up with two languages. However, it's important to
note that the only things we used Groovy for were A) writing tests
(this is the largest usage), B) some command-line tools ("scripts"),
C) a small library (a Builder) on top of Topaz to support apps written
in Groovy. Coupled with the fact that there is pretty good
integration and cross-compilation between Groovy and Java, this
greatly mitigated point 1 above (since you don't usually run the
various analyzers on your test code or simple scripts). Point 2 is
somewhat mitigated by the fact that Groovy is very similar to Java,
and hence can be picked up more easily by folks familiar with Java;
but this is probably the major flaw in the decision to use Groovy
here.

In summary, while I'm not saying Java is perfect or even better than
any of the proposed languages, I don't think it's a good idea to do
what you're proposing, i.e. use a second language to implement core
parts of Mulgara. If we were talking about starting a new project,
that would be an entirely different issue, and I would quite likely
suggest using Scala or something, but not for an existing project.

> If you agree that a new language should be added (or you don't mind if
> one is), then which language do you want to see? What features of your
> preferred language do you think are compelling?
> 
> My own preference is for Scala (as explained above). Have I missed
> anything about this system that you think I should take into account?
> Do you have criticisms about Scala in general? (or any other language)

If you do end up adding another language, then I'd definitely rule out
both Ruby and Python, not because they're "bad" or inferior languages
or anything, but because they're very different from Java
and they're dynamically typed. Groovy is dynamically typed too, and
hence I'm also wary of using it especially in something as critical as
a database (did I mention I'm a big fan of static typing :-) ); its
main redeeming property here is that it is so very similar to Java and
hence easier for folks to pick up. Scala, being statically typed, I
think would be the best choice from a resulting code quality
perspective. But it's not well known (yet), and hence at this point
you will likely be noticeably impeding anybody wanting to work on
Mulgara (including existing folks - for example, while I'd love to learn
Scala, I don't have the spare time right now, so anything that gets
written in Scala will just raise the barrier that much higher for me
to do anything on Mulgara).

  Cheers,

  Ronald