December 15, 2006
New Java language features or a new language for multicore needed?
Henrik over at BEA decided to test GC with JRockit on various multi-core chips. The results are very good. Clearly, they are working hard at parallelizing GC as are the other JVM vendors I'm sure. You can read his blog here:
On the flip side, he points out that the single-thread performance of the T1 is quite poor relative to the Intel chips; more cores means less single-thread performance, at least for now. Sun did a post recently saying they were going to start focusing on single-thread performance, which is a good thing also. We need balanced processors: the right number of cores, the right clock, the right amount of cache and bus bandwidth. Basically, what Henrik shows is that a smart programmer can write highly multi-threaded code and it will run very well on a multicore processor. This shouldn't surprise anyone. The problem now is how to make everybody else's code run quickly without having to hire rocket scientists like Henrik to do it. Henrik says Java has good multi-threaded capabilities, and it does, but I don't think they make parallel programming any easier for the masses. Doug Lea's work and his JSR add more library features for various mechanisms useful for multi-threaded coding, but they still require a skilled developer to use them.
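To give a flavor of those library features: below is a minimal sketch of parallelizing a sum across cores with the executor framework from Doug Lea's JSR 166 (`java.util.concurrent`). The class and method names here are my own illustration, not anything from Henrik's benchmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Split the array into one chunk per worker and sum the chunks concurrently.
    static long parallelSum(final int[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            int chunk = (data.length + workers - 1) / workers;
            for (int c = 0; c < workers; c++) {
                final int lo = Math.min(c * chunk, data.length);
                final int hi = Math.min(lo + chunk, data.length);
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long s = 0;
                        for (int i = lo; i < hi; i++) s += data[i];
                        return s;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // blocks until each chunk finishes
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000000];
        Arrays.fill(data, 1);
        System.out.println(parallelSum(data, Runtime.getRuntime().availableProcessors()));
    }
}
```

Even this small example shows the point made above: the thread pool, futures, and chunking logic all work, but getting them right still takes a developer who understands concurrency.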
So, it looks like GC can be made to scale, or already scales, very well on multi-core processors. The main issue remaining now is how applications will perform. Maybe people won't need super-quick response times and so will be happy with the single-threaded programming models available today, such as J2EE, which are easy to program. Single-threaded in this context refers to how many threads handle a single request: 99.9% of Java EE apps use a single thread. Multi-threading requests is possible using APIs like CommonJ, AsynchBeans, or the new JSRs, whenever they make it out the door.
It'll be interesting to see whether developers will be forced to write multi-threaded code using traditional mechanisms (locks, synchronized blocks, etc.) or whether a new scripting language comes along that works with Java and simplifies writing multi-threaded code.
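For concreteness, here is what those traditional mechanisms look like in the simplest case: a shared counter guarded by synchronized methods. The class and names are illustrative only; the point is that correctness depends on every access remembering the lock.

```java
public class SharedCounter {
    private long count = 0;

    // Traditional locking: every caller must go through the same monitor.
    // Forget one synchronized access to the shared state and you have a data race.
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        final SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) counter.increment();
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        // With synchronization this is always 4000; without it, increments can be lost.
        System.out.println(counter.get());
    }
}
```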
There are many functional languages with referential transparency that can be used to write code that is inherently parallel, without the programmer being forced to handle synchronization themselves. You code the algorithm and let the compiler figure out how to multi-thread it. Such languages usually ban any modification of variables: you assign a value to a variable and then you can't change it; every variable is basically const or final. Recursion is usually used to create new variables with derived values. Sounds slow, but you'd be surprised at what's possible, compiler-wise, with this code. I wrote a HOPE interpreter in Prolog in the late 80s when I was in college. It was a good experience and really opened my eyes to the power of functional programming. There were some manic compilers available for other functional languages, such as Chalmers ML, which I remember had 26 passes. There was hope that functional programming would replace Fortran as the language of choice on supercomputers, where parallelism was key to performance. Maybe it's time for those languages to return. They would certainly allow the power of multi-core to be exploited for processing single requests, at a more consumable level than is possible with current Java features.
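To make the "every variable is final, recursion creates derived values" style concrete, here it is transcribed into Java purely as an illustration. Real HOPE or ML looks quite different, and javac does not actually parallelize (or even tail-call-optimize) this; the sketch only shows the programming discipline the paragraph describes.

```java
public class FunctionalStyle {
    // No variable is ever mutated: "updating" means recursing with new bindings.
    static long sumTo(final long n, final long acc) {
        if (n == 0) return acc;
        final long rest = n - 1;      // a fresh binding, not a mutated variable
        final long newAcc = acc + n;  // likewise
        return sumTo(rest, newAcc);   // a functional compiler turns this tail call into a loop
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100, 0)); // 5050
    }
}
```

Because nothing is ever overwritten, a sufficiently smart compiler is free to evaluate independent subexpressions in any order, or in parallel, which is exactly the property the post is pointing at.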
I guess a problem here is that I don't think today's programmers are educated to write code that way. If the idea of constant variables and recursion sounds ridiculous to you, then you're not alone. It takes a shift in thinking to appreciate this as a programming style. The same applies to writing Prolog or logic-based programming. Most people today know only a single style: imperative programming. This may slow the adoption of these kinds of approaches while developers figure out that you can actually write code that way.
Maybe we'll see such language features added to Java in some way, although a script-engine add-on looks more likely. I'm actually a big fan of functional languages, especially ML, Haskell, and HOPE. They allow complex ideas to be expressed very clearly. If the JIT or scripting guys can do the work, then this may be where Java programmers move next. I don't see a need to further complicate Java by adding these features; a clean integration of a functional language with a parallelizing runtime would work fine, so long as it integrated tightly with a Java VM and container. These kinds of programming languages should allow average developers to write highly parallel code without the current headaches.
If you'd like to see a Javaesque language which has very tight integration with Java, but lots of functional features already, have a look at Scala.
It doesn't go quite as far as making things referentially transparent, but it is a good example of how paying more attention to programming language research can help a language.
I'd say it would be a much better starting point than Java itself if one were to try to add functional language extensions for the purposes of automatic parallelisation. It already has algebraic datatypes with pattern matching in the form of case classes, which would be one big step towards being able to construct and deconstruct complex structures without needing mutation.
Posted by: Cale Gibbard | Dec 15, 2006 1:17:03 PM
I would recommend a good look at Erlang (http://www.erlang.org), which is used in high-availability telecom systems, and at the Mozart distributed system (www.mozart-oz.org), which is mainly academic but is exploring how distributed programming can be made simpler to code (in Mozart's case the answer is data-flow programming).
Both offer interesting viewpoints on the construction of distributed systems, and both languages have a solid functional programming background.
Posted by: Rodrigo | Dec 18, 2006 5:49:18 AM
Haskell is horrible in some respects (indentation is used for program logic rather than braces or other syntactic structure, for example) but otherwise it's great. In fact, there's some really neat work being done by a couple of guys in Cambridge for Microsoft on composable memory transactions using it. Lots of research going on in this area, and I think a lot of people are having the same ideas.
I work in Business Intelligence, so the sort of classic queries we run use GROUP BY and lots of complex analytical functions. Doing that sort of thing above the DB layer, in the mid-tier, is not feasible with a single-threaded model. Yet with real-time BI coming on, I see the need for classic ACID transactional semantics in BI soon. Hence my interest a while ago in CommonJ's API and how to pass the same transaction context around.
Lastly, if you look at Google's MapReduce you'll see that it ties in with this installment and your previous one on reducing inlining etc. They invented an interpreted language to run on thousands of PCs in parallel to get these sorts of aggregations on their data. It works QUICKER than a compiled language AND has security built into it far better than the APIs in C++.
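For readers who haven't seen MapReduce, the core idea the comment refers to can be sketched in-process in a few lines of Java. This toy word count is my own illustration and says nothing about how Google's distributed implementation actually works; there, the map and reduce phases run as separate tasks on thousands of machines.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyMapReduce {
    // Map: each input line emits (word, 1) pairs.
    // Reduce: sum the counts per key (here folded together in one process).
    static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = wordCount(new String[]{"a b a", "b a"});
        System.out.println(c.get("a") + " " + c.get("b")); // 3 2
    }
}
```

The appeal is that the programmer writes only the per-record map logic and the per-key reduce logic; the framework owns all the threading and distribution, which is exactly the "no rocket scientists required" property the post is asking for.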
Posted by: Michael Bienstein | Dec 18, 2006 9:26:00 AM
It may not be for everyone, but Java developers out there that are having to build high-performance data processing or data analysis Java apps should look at http://www.pervasivedatarush.com/
It's a Java framework for data-intensive applications (not transactional like J2EE, but rather bulk data management) that handles horizontal, vertical, and pipeline parallelism on multicore platforms. You don't have to use any Java NIO or concurrency APIs -- it does all that for you. You don't need to code deadlock detection and shared-data management routines either. In fact, some data processing apps can be created simply by using its XML scripting language with out-of-the-box Java classes called "operators".
There's a free download of the framework at the website. I look forward to comments on this blog.
Posted by: Emilio Bernabei | Jan 8, 2007 1:30:34 PM