[Mulgara-dev] [Mulgara-general] Language discussion
Paul Gearon
gearon at ieee.org
Fri Aug 21 17:41:30 UTC 2009
Hi Ronald,
Thanks for this response. The first part of this email is discussing
language, and then I get into your specific response.
On Fri, Aug 21, 2009 at 6:30 AM, Life is hard, and then you
die <ronald at innovation.ch> wrote:
> [snip]
>
> I fully agree that Java can be overly verbose and cumbersome at times.
> There's work going on to reduce that, but as we've seen with closures,
> it's going to take quite some time.
I have considered being patient and just waiting for these features,
and given your comments below, perhaps that is the best course. After
all, the current codebase was started at the same time the Java 1.4
beta was released (we didn't have the budget to travel, but we were
accepted to present at JavaOne to demonstrate one of the first
commercial systems using "New IO") :-)
These days Mulgara needs Java 5 just to run, and it's much better to
run on Java 6.
>> == Comparison ==
> [snip]
>> The languages I've looked at are:
>> - JRuby
>> - Groovy
>> - Jython
>> - Scala
>
> One thing to make clear is: these are all their own languages. Just
> because they can run on the JVM doesn't mean that much -
Actually, I think it does, in that it avoids the overhead of talking
to a non-JVM system. For instance, I happen to like Erlang, but I didn't
even consider it because of the overhead in just talking to the Erlang
runtime.
> yes, it
> generally means you can use the Java libraries, which is great, but at
> the end of the day you're writing code in a different language with
> its own style, behaviour, quirks, etc.
That's kind of the point. :-)
>> There are others, but these ones all meet the popularity requirement,
>> and there is a limit to the number of new languages I can look at. :-)
>>
>> I came close to choosing JRuby, especially given the optimizations and
>> tight JVM integration that the JRuby team were able to achieve when
>> they started working at Sun. However, the inconsistencies in the
>> language (lambda vs. Proc, begin/end vs. {}, etc.), along with the lack
>> of type information, made me uncomfortable with it for this
>> application. Groovy is a similar language, but with trivial Java
>> integration. However, I never felt it was as mature and efficient as
>> JRuby has become (Ronald may disagree here), and it also has similar
>> type issues.
>
> I have various issues with Ruby, mainly because they have multiple
> confusing ways to do things (and yes, the lambda/Proc mess is the
> prime example). It does have some nice things, though; for example,
> the fact that classes are instances allows you to do some neat stuff.
Indeed. It's for this reason that I'm thinking of how one could
provide a JRuby module for Topaz. It would be nice if unknown
predicates could be attached to classes, for instance. But that's a
topic for another mailing list.
> So I'm not
> going to say Ruby is crap, but personally I'm not all that thrilled
> about it, especially when comparing to Groovy.
>
> Regarding performance, it's not that great, and certainly slower than
> Groovy (JRuby is faster than MRI, but that's not fast...). (as always
> when talking about speed, I'm completely ignoring startup times and
> talking only about speed once hotspot has got its teeth into things a
> bit). The JRuby guys have done some fantastic work, but trying to
> implement some of Ruby's quirks on the JVM is really difficult.
>
> I like Groovy reasonably well, mainly because it looks a lot like Java
> and has a very natural integration with it. Being a dynamically typed
> language means it has the known advantages and disadvantages of those
> as compared to statically typed languages. One thing I've noticed when
> writing Groovy code is that it takes me about the same length of time
> to write some given functionality in Groovy as it does to write it in
> Java - I spend more time _writing_ code in Java, but less time
> debugging it. However, the resulting Groovy code is usually more
> compact than the equivalent Java code. This is especially the case
> when you start doing things like creating DSLs.
This is similar to my own experience with dynamic typing. It's faster
to write, but can lead to new bugs. It can also be harder to read.
I've come across a similar situation when I've tried to read Java code
that uses Map without type parameters. I've often found myself in code in
Mulgara where I couldn't figure out exactly what was supposed to be
inside a map (it gets worse when people store more than one type).
When this applies to everything (and not just collections), then it
gets worse.
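To make the Map point concrete, here is a small hypothetical Java example (not taken from the Mulgara codebase; the class name and keys are made up for illustration). The raw Map declaration says nothing about its contents and quietly accepts values of different types, while the parameterized declaration documents what the map holds and lets the compiler reject a mismatched put.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapTypes {
    // Raw type: nothing tells the reader (or the compiler) what the map holds.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Map buildRaw() {
        Map m = new HashMap();
        m.put("subject", "http://example.org/s");
        m.put("count", Integer.valueOf(3)); // a second value type slips in unnoticed
        return m;
    }

    // Parameterized type: the declaration documents the contents, and the
    // compiler refuses to put anything but a List<String> under a String key.
    static Map<String, List<String>> buildTyped() {
        Map<String, List<String>> m = new HashMap<String, List<String>>();
        List<String> objects = new ArrayList<String>();
        objects.add("http://example.org/o");
        m.put("http://example.org/p", objects);
        return m;
    }

    @SuppressWarnings("rawtypes")
    public static void main(String[] args) {
        Map raw = buildRaw();
        // With the raw map, the reader has to discover value types at runtime.
        Object count = raw.get("count");
        System.out.println(count instanceof Integer);

        Map<String, List<String>> typed = buildTyped();
        System.out.println(typed.get("http://example.org/p").get(0));
    }
}
```

Note that the Java 5/6-style declaration also repeats the type arguments on both sides of the assignment, which is the kind of redundancy that comes up again further down.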
>> I relied more on reviews when it came to Jython. It has been around
>> the longest, but it is also the language I've spent the least time with. The
>> main impression I get from the community is that this is a good system
>> for doing transitional work, but ultimately you want to commit to
>> either Java or Python.
>
> I've written only a little Python, and don't particularly care for it.
> Some people love it, some don't.
>
>> But in the end, the language I've decided I want to go with is Scala.
> [snip]
>
> Haven't used it (yet); the syntax is annoying. But otherwise it looks
> really exiting.
I hope you mean "exciting". :-)
The syntax got me at first as well, but as I've read more, it turns
out that everything is carefully thought out. I'm willing to forgive a
lot when I know there was good reason for it.
> Scala is really a different beast from the other three, because it's
> the only one that's statically typed. This means you can
> (theoretically at least) run things like findbugs on it. Personally
> I'm a huge fan of static typing, even though it's sometimes more
> cumbersome and verbose (though Scala goes a long way towards fixing
> the latter).
I couldn't agree more about the benefits of static typing. But you're
right about it usually being verbose. My other complaint is that it
usually requires redundancy (Java in particular). One of the reasons I
like Scala is that it addresses both of these issues.
>> == Comments? ==
>>
>> OK, so now you've seen my rationale, I'm interested in soliciting
>> other opinions.
>>
>> Do people think it's a good idea to introduce a new language? A bad
>> idea? Are you ambivalent? Remember, a new language will be operating
>> alongside Java, and not replacing anything. Also note that all the
>> systems I've looked at will introduce new jars to the lib directory.
>> (Scala would introduce 10MB of jars)
>
> In general I'm extremely wary of using multiple languages in a
> project, to the point of usually saying no way. The main issues are
> threefold:
> 1) double/triple/etc the tooling, i.e. you need a complete set of
> tooling for each language
> 2) additional barrier to entry and learning curve for anybody getting
> involved in the project
> 3) additional points for errors at the interfaces/interactions
> between the languages
Some very good points, and you have me re-evaluating my reasons. I'll
get into that below...
> Re 1: this involves not just compiler and build tool, but also
> debugging, coverage analyzers (cobertura, clover, etc), code analyzers
> (findbugs, pmd, etc), and so forth. While some of these operate at the
> JVM level, they still need to tie the results back to the source code,
> and hence are really language specific. In Mulgara's case there aren't
> many tools being used, but that's a deficiency that Mulgara should fix
> rather than using it as an excuse here to say this isn't an issue.
While I agree with the principles here, it's just not something I'm in
a position to address. There aren't a lot of active developers at the
moment, and for those of us who do spend time on it there is no
mandate or opportunity to work with any of these things. If it were a
project that were being run commercially then that would change
things, but that's not the situation anymore.
In our defense, there aren't many projects that go all the way into
using tooling, except on an ad hoc basis. It might be nice if we were
using it ourselves, but I just don't see it happening. As a result, I
don't want to avoid doing something because it would make something
hard that we'll never do anyway (even if we should).
> Re 2: this is also really, really important. If non-trivial parts of
> the codebase are written in multiple languages, that really means
> anybody wanting to make noticeable changes is likely to have to know
> all involved languages so they can make the changes everywhere. But
> even to just understand the code you're now requiring more from every
> developer.
This is one of the best arguments against, in my opinion. Mulgara
could really use some new developers, and if the core of the system
were to get more complex in this way, then it could really be an
impediment. This point alone has me thinking that perhaps I should
stay my hand (sigh).
> Now, of course I think every developer should know and be reasonably
> proficient/comfortable in multiple languages; and also that whenever
> you start a project you pick the language that is best suited for that
> project. But mixing different languages within a project is dangerous.
Perhaps this would be an easier argument for me to make if Scala had
had more time to become popular. A good portion of the Java developers I
respect have been spending time with this language, and are raving
about how good it is. That's what made me consider it in the first
place. So perhaps it will penetrate the general Java-programming
community in a top-down fashion. But that wouldn't happen for a while,
if it ever does.
Moving to a version of Java with closures, etc., will almost be like
moving to a new language anyway, but since it will still have the name
"Java" and it has an upgrade path, then there will be a lot more
acceptance of it.
> There are some noticeable exceptions to this, however. One is adding
> support for your library/server to be used in other projects that are
> written in different languages. So for example writing a ruby library
> to make it easier to talk to Mulgara would be a candidate. These
> things have the quality that they sit on top of, and can be freely
> decoupled and removed from, the rest of the project. You can therefore
> afford to both potentially skimp on tooling as well as live with the
> fact that somebody may join and end up making significant changes
> without needing to understand these additional languages.
I've thought about this for a few languages, but the HTTP interfaces
are starting to make it unnecessary. So I guess this doesn't apply
here. That reminds me that I need to work on making these interfaces
more complete.
> Another exception is supporting writing of "modules" or other
> extension points in different languages. Good examples here are
> scripting capabilities, i.e. where you allow your app to be extended
> via "scripts" (and Java's scripting API is great for this). Again,
> this has the quality of being easily separable without affecting any
> of the main code; but really, this doesn't usually mean you write any
> code in another language at all, just that users of your code can do
> so. (so I guess this isn't really relevant to the discussion here).
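As a rough illustration of the scripting-API approach, here is a minimal sketch using the standard javax.script (JSR 223) classes. The runScript method and the "greeting" binding are hypothetical, not an existing Mulgara extension point. Whether a given engine name (such as "groovy") resolves depends entirely on which engine jars are on the classpath, so the sketch returns null when no matching engine is found.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptHook {
    // Hypothetical extension point: evaluate a user-supplied script with
    // whichever engine is registered under the given name.
    static Object runScript(String engineName, String source) throws ScriptException {
        ScriptEngineManager manager = new ScriptEngineManager();
        ScriptEngine engine = manager.getEngineByName(engineName);
        if (engine == null) {
            // No engine of that name is available (e.g. the Groovy or JRuby jars are missing).
            return null;
        }
        // Expose a host value to the script under a (hypothetical) variable name.
        engine.put("greeting", "hello");
        return engine.eval(source);
    }

    public static void main(String[] args) throws ScriptException {
        Object result = runScript("groovy", "greeting + ' from a script'");
        System.out.println(result == null ? "no groovy engine available" : result);
    }
}
```

The host application itself stays entirely in Java; only the user's script is in the other language, which is what keeps this kind of extension separable from the core code.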
>
> Having said all this, we used Groovy in addition to Java in the Topaz
> project, i.e. ended up with two languages. However, it's important to
> note that the only things we used Groovy for were A) writing tests
> (this is the largest usage), B) some command-line tools ("scripts"),
> C) a small library (a Builder) on top of Topaz to support apps written
> in Groovy. Coupled with the fact that there is pretty good
> integration and cross-compilation between Groovy and Java, this
> severely mitigated point 1 above (since you don't usually run the
> various analyzers on your test code or simple scripts). Point 2) is
> somewhat mitigated by the fact that Groovy is very similar to Java,
> and hence can be picked up more easily by folks familiar with Java;
> but this is probably the major flaw in the decision to use Groovy
> here.
In fact, this is the capacity in which I was originally thinking of
introducing a new language. But once I got over the psychological hump of
introducing the other language, then I started seeing all the other
places in the main source tree where it would make life easier. :-)
For instance, I need to parse the SPARQL test suite (which is
described as a series of interlinked RDF files), and then run it
against our system. This requires parsing of the XML responses, etc.
While I've written some of it in Java already, I'd rather do it in a
friendlier language.
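For a sense of what the Java side of this looks like, here is a minimal sketch that pulls variable bindings out of a response in the W3C SPARQL Query Results XML Format using only the JDK's DOM parser. It is not the existing Mulgara test code; the class name and the simplified representation of each bound term as a plain string (dropping datatypes, language tags and the uri/literal/bnode distinction) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SparqlXmlResults {
    // Namespace of the SPARQL Query Results XML Format.
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    // Returns one map per <result>, from variable name to the text of its bound term.
    static List<Map<String, String>> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // result elements live in the NS namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        NodeList results = doc.getElementsByTagNameNS(NS, "result");
        for (int i = 0; i < results.getLength(); i++) {
            Element result = (Element) results.item(i);
            Map<String, String> row = new LinkedHashMap<String, String>();
            NodeList bindings = result.getElementsByTagNameNS(NS, "binding");
            for (int j = 0; j < bindings.getLength(); j++) {
                Element binding = (Element) bindings.item(j);
                // The bound term is the first child element: <uri>, <literal> or <bnode>.
                for (Node n = binding.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() == Node.ELEMENT_NODE) {
                        row.put(binding.getAttribute("name"), n.getTextContent());
                        break;
                    }
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sparql xmlns=\"" + NS + "\">"
            + "<head><variable name=\"s\"/><variable name=\"label\"/></head>"
            + "<results><result>"
            + "<binding name=\"s\"><uri>http://example.org/a</uri></binding>"
            + "<binding name=\"label\"><literal>A</literal></binding>"
            + "</result></results></sparql>";
        System.out.println(parse(xml));
    }
}
```

Even this stripped-down version spends most of its lines on casts and node-list loops rather than on the test logic itself.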
> In summary, while I'm not saying Java is perfect or even better than
> any of the proposed languages, I don't think it's a good idea to do
> what you're proposing, i.e. use a second language to implement core
> parts of Mulgara. If we were talking about starting a new project,
> that would be an entirely different issue, and I would quite likely
> suggest using Scala or something, but not for an existing project.
While I'd *love* to start from scratch (for numerous reasons), the
reimplementation of the query engine, the file-level utilities, etc,
are just too daunting for only a few people to work on.
I'll confess that part of my motivation is because I'm sick of Java.
Yeah, that's not a good reason, but this system is a big part of my
professional life at the moment! :-)
I've been working on this source code now for 9 years, and I think
it's grown beyond the language it was originally written in. I've had
some breaks (traffic flow control, encryption, medical PDA software, a
root-kit for Windows [for legitimate purposes.... honest!], UML/OCL
modeling frameworks, etc), but I keep coming back to Mulgara. And
while I'm still interested in what I'm accomplishing, and the new
things I'm creating with it, I'm getting sick of the tedious minutiae
that the Java part of the system imposes on me. Every time I do
something outside of Mulgara I work with another language (usually in
the JVM, but not always), and when I come back to Mulgara I wince at
having to use Java again.
So while I think all the reasons I've given are good ones, I guess I
have to admit that they're justifications.
>> If you agree that a new language should be added (or you don't mind if
>> one is), then which language do you want to see? What features of your
>> preferred language do you think are compelling?
>>
>> My own preference is for Scala (as explained above). Have I missed
>> anything about this system that you think I should take into account?
>> Do you have criticisms about Scala in general? (or any other language)
>
> If you do end up adding another language, then I'd definitely rule out
> both Ruby and Python, not because they're "bad" or inferior
> languages or anything, but because they're very different from Java
> and they're dynamically typed. Groovy is dynamically typed too, and
> hence I'm also wary of using it especially in something as critical as
> a database (did I mention I'm a big fan of static typing :-) ); its
> main redeeming property here is that it is so very similar to Java and
> hence easier for folks to pick up. Scala, being statically typed, I
> think would be the best choice from a resulting code quality
> perspective. But it's not well known (yet), and hence at this point
> you will likely be noticeably impeding anybody wanting to work on
> Mulgara (including existing folks - for example, while I'd love to learn
> Scala, I don't have the spare time right now, so anything that gets
> written in Scala will just raise the barrier that much higher for me
> to do anything on Mulgara).
I think you made good points with everything you said here. I think
you've convinced me to keep the main system 100% Java, at least for
now. If I can hold off long enough, then maybe the language will start
to evolve into something more usable.
On the other hand, I still think that extras like test code, and
possibly even client libs could be written in a new language, much as
Topaz did when it introduced Groovy. So perhaps I can introduce Scala
on the periphery of the project. Even if I wanted to introduce Scala
into a more central part of Mulgara, then it really shouldn't happen
before this step anyway. (I'm not saying I still want to do that, I'm
just thinking out loud about it)
Does this sound better to you?
Paul