
November 21, 2006

Multi-core may be bad for Java

The trend towards multicore is moving along at a fast pace. Architectures like Sun's Niagara seem to be getting copied by the other CPU vendors. The architecture is basically lots of cores but low clock speed per core.

This is a problem for Java as:

  • Java likely has a longer path length than languages like C, and lower clock speeds per core will make this worse.
  • JavaEE promotes a simple threading model for applications.
  • Garbage collection remains heavily dependent on clock speed for small pauses.

The longer path length and running in a managed environment mean that code takes longer to execute. Yes, there are multiple cores, but from the point of view of an individual request the lower clock speed is going to hurt response times. All those cores will also be pulling from a single shared cache, so that cache had better have a high hit rate or it'll be even worse.

Applications may have to be written to exploit threading if they want very high performance. Developers will be put back five years in terms of clock speeds (1GHz sound good to anyone?), so it may be necessary to heavily multi-thread application code to make up for this. This means applications may be forced to use APIs like commonj for threading to wring performance out of these boxes now if response time is important. This may actually push people away from JavaEE, because once you start down this slope, cutting out 'fluff' (i.e. managed access to resources, etc.) means less path length, and that will make non-JavaEE or lighter-weight containers more attractive.
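For anyone who hasn't seen commonj, here is roughly what the WorkManager pattern looks like. This is a minimal sketch only: the JNDI name is deployment-specific and the PriceAggregator/QuoteFetcher/lookupQuote pieces are made up for illustration, so treat it as the shape of the API rather than production code.

    import java.util.Arrays;
    import javax.naming.InitialContext;
    import commonj.work.Work;
    import commonj.work.WorkItem;
    import commonj.work.WorkManager;

    public class PriceAggregator {

        // One unit of work; it runs on a container-managed thread.
        static class QuoteFetcher implements Work {
            private final String symbol;
            volatile double quote;
            QuoteFetcher(String symbol) { this.symbol = symbol; }
            public void run() { quote = lookupQuote(symbol); }    // hypothetical back-end call
            public boolean isDaemon() { return false; }           // short-lived work
            public void release() { }                             // asked to stop early; nothing to clean up
            private double lookupQuote(String s) { return 42.0; } // placeholder
        }

        public double[] fetchInParallel(String[] symbols) throws Exception {
            WorkManager wm = (WorkManager)
                new InitialContext().lookup("java:comp/env/wm/default"); // name depends on your deployment
            QuoteFetcher[] work = new QuoteFetcher[symbols.length];
            WorkItem[] items = new WorkItem[symbols.length];
            for (int i = 0; i < symbols.length; i++) {
                work[i] = new QuoteFetcher(symbols[i]);
                items[i] = wm.schedule(work[i]);                   // fan out across cores
            }
            wm.waitForAll(Arrays.asList(items), WorkManager.INDEFINITE);
            double[] quotes = new double[symbols.length];
            for (int i = 0; i < symbols.length; i++) quotes[i] = work[i].quote;
            return quotes;
        }
    }

The point is that the container still owns and monitors the threads, but the application decides how the work is decomposed.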

The trend then includes some of the following choices:

  • Use containers that can be made lightweight or just include exactly what you need, no more, no less.
  • JavaEE will be forced to evolve towards a scalable implementation, i.e. choose what you want rather than be forced to swallow it all whole and pay the path length price.
  • Simpler containers with fewer services but enough for 90% of applications.
  • JavaEE evolves to support common threading patterns so that it's easier for normal developers to leverage threading on these slower processors.
  • JITs become cleverer in order to compensate for the slower processors.
  • Garbage collection gets augmented with malloc/free type operations.

The garbage collection point interests me because even with Java 5, large heaps containing things like big caches still kill performance with very long pauses. Cache managers really need a malloc/free type API to directly control the life cycle of objects in these caches; it's totally ridiculous to use GC to manage these heaps.
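To make that concrete, here is a rough sketch, not a real or proposed API, of the workaround available today: keep the bulk of the cached data in direct ByteBuffers that the cache allocates and releases itself, so the tenured heap only ever sees tiny wrapper objects instead of millions of long-lived cached ones. The class and method names are invented for illustration.

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    public class OffHeapishCache {
        private final Map<String, ByteBuffer> entries = new HashMap<String, ByteBuffer>();

        // "malloc": copy the value into native memory outside the collected heap
        public void put(String key, byte[] value) {
            ByteBuffer slot = ByteBuffer.allocateDirect(value.length);
            slot.put(value);
            entries.put(key, slot);
        }

        public byte[] get(String key) {
            ByteBuffer slot = entries.get(key);
            if (slot == null) return null;
            byte[] copy = new byte[slot.capacity()];
            slot.rewind();
            slot.get(copy);
            return copy;
        }

        // "free": the cache, not the collector, decides when an entry dies.
        // Only the small ByteBuffer wrapper is left for GC; the bulk of the
        // data never sits in the tenured generation waiting for a big pause.
        public void evict(String key) {
            entries.remove(key);
        }
    }

It's clumsy because everything has to be serialized into bytes, which is exactly why a first-class malloc/free style lifecycle API would be preferable.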

It's hard to know what will happen, but the days of clock speed saving people from path length concerns look to be over. Clock speeds will start to regress so that chip makers can lower power requirements while adding more and more cores to the processors, and this is going to have a negative effect on Java unless some issues are resolved soon, and not just on the latest JVMs. Many customers will be deploying existing JVMs on multi-core systems and will hit these issues.

Other posts: Tuning for multicore, the basics

November 21, 2006 | Permalink

Comments

Hi Billy,
Great blog - I follow your blog regularly and find it informative.
A quick question on the multi-core thing: don't you think it would boost app server performance, which is inherently multi-threaded: app server services plus user threads?

Kindly reply.
thanks,
Sethu

Posted by: Sethu | Nov 21, 2006 10:25:31 PM

No.
My point is that Java is very dependent on fast GC for good performance. GC has elements that are multi-threaded, but the main task remains single threaded, and that thread will be slower on a multi-core CPU and get slower still as clock speeds drop when more cores are added. The other issue is that more cores do help multi-threaded code, but only at the limit. If your JVMs don't run at high load, and let's face it, most don't, then it's not the multi-threading that gives performance, it's the speed of a single pipe, and that's dropping.

Posted by: Billy | Nov 21, 2006 10:44:11 PM

Hey Bill,

Is your experience here related to the IBM or the Sun GC? I ask because there are some not so insignificant differences between the memory models used by Sun and IBM. My experience with the Sun JVM is that GC speed has increased considerably due to multicore technologies.

Also, one other point: my experience is that hyper-threaded cores do better than multi-core machines, though these results are very use case sensitive. That leads me to believe that things are much more memory latency dependent than pure CPU.

nice blog though.

Cheers,
Kirk

Posted by: kirk | Nov 22, 2006 12:53:27 AM

Hi,
I do not fully agree with your statements.

1. There are several options, at least in the Sun VM, that allow the GC to be run on more than one processor.

2. App servers are multithreaded, and therefore should scale on a multicore machine.

3. Typically the CPU time spent by the GC is around 5%. If it's above that, then your application will not scale anyway. So even if this time went up to 10% on a multicore, you would only lose 5% performance.

4. The problems with multicores are problems for all programming languages that have a GC. Almost all app servers are based on programming languages that have a GC.

I agree that for implementing caches the JVM might not have all the features it should have. Being able to give the GC a hint that objects should be GC'ed now could be of value.
Regards,
Markus

Posted by: Markus Kohler | Nov 22, 2006 3:55:55 AM

Billy,

For high performance servers, can't there be a co-processor to help with fast GC?

Second, a lot of collected objects are visible only to single threads for their entire lifetime - in such situations multi-cores should certainly speed up the process.

What do you think?

Thanks
Cal

Posted by: BrianCal | Nov 22, 2006 4:04:32 AM

To mitigate slow multithreaded GC, why not just run multiple JVMs?

I often horizontally scale WAS on large Windows boxes.

Posted by: jonathan | Nov 22, 2006 4:08:51 AM

Hi Billy,

It's the other way round!!

- "JavaEE promotes a simple threading model for applications"

Each EJB method invocation may run in its own thread (or in its own server if clustered). Each servlet invocation may run in its own thread (or in its own server if clustered). What other threading mechanism would you suggest? That's scalable, isn't it?

- The architecture is basically lots of cores but low clock speed per core.

Why low speed per core? If I run an Intel Pentium Dual Core, am I getting lower clock speeds? I don't think so...

For T1 processors all cores run at around 1GHz (in the 4/6/8 core models alike, http://www.sun.com/servers/family-comp.html#coolthreads). That's a pretty good speed for a SPARC processor...

- "Applications may have to be written to exploit threading if they want very high performance."

Well, of course. That's what everybody has been saying for a long time (including C++ "Guru" Herb Sutter http://www.gotw.ca/publications/concurrency-ddj.htm).


- "JavaEE evolves to support common threading patterns so that it's easier for normal developers to leverage threading on these slower processors."

JavaEE containers are responsible for the lifecycle of managed objects. The Spring Framework is also responsible for bean lifecycle. How are you expecting people to leverage threading? It's frameworks that should handle threading, right?

- "Garbage collection remains heavily dependant on clock speed for small pauses."

Why? Garbage collection may be run without pauses (using a concurrent GC). Sun's VM uses parallel GC (so as to reduce pause times on multiprocessors). I'd say that space sizes and correct tuning are more important for small pauses than CPU speed.

Well, Billy, I think you should consider rewriting your entry.

Cheers,
Antonio

Posted by: Antonio | Nov 22, 2006 7:04:22 AM

Antonio.
Concurrent GC is a great thing. The short term garbage is collected very efficiently. The problem is the long term garbage, such as data kept in a cache in the JVM. If that data doesn't change then you could size the long term heap to hold it and not see a big GC event, which is obviously cool. But if the cache holds data that changes (a write-through cache, for example) then that large heap will fill, and when it does there will be a lot of data in it and you'll take a big pause, even with the latest JVMs. Caches are now adding features such as indexing and this, of course, means the cache takes more memory too.

GC algorithms all have an element of multi-threadedness about them but there are still portions that are single threaded.

Running multiple JVMs per box is the only way to scale GC right now as was suggested by a previous poster. Rather than run a single JVM with a 1.5GB heap, run 3 with 500MB heaps. Now, GC is fully multi-threaded but it takes multiple JVMs to do this.
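As a rough illustration (the heap sizes, ports, and main class here are placeholders, not recommendations), the multi-JVM approach looks like this:

    # One big heap, one JVM:
    java -Xms1536m -Xmx1536m ... com.example.Server --port 9080

    # becomes three smaller heaps, three JVMs behind the same load balancer:
    java -Xms512m -Xmx512m ... com.example.Server --port 9081
    java -Xms512m -Xmx512m ... com.example.Server --port 9082
    java -Xms512m -Xmx512m ... com.example.Server --port 9083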

The problem with that is that the second level caches get polluted by having 3 active large processes running on the box. This splits the CPU cache 3 ways. There are projects to 'share' the compiled bytecode on the machine which would help with this.

The other issue is you have more threads running now than before. Each JVM probably has M + N threads in use, where N is the number of container threads and M is the constant set used by the JVM implementation and the application server. 3 JVMs means 3M + 3N, whereas a single JVM is likely M + 3N.

Framework-provided threading models so far have been simplistic. If that's what your application needs then cool, and for most applications it is fine. But there is a reason we added commonj, and it's why this API gets a lot of use: people couldn't write efficient applications with that simplistic threading model. The thread model choice can be critical for the performance of some applications.

Markus
Thanks for the comments. Intel right now indeed have high clock speeds, but I think that's temporary and we'll soon see the clock dropping as the number of cores goes up. I always push the high clock as an advantage over the Sun CPUs, but it may be just a temporary advantage.

Posted by: Billy | Nov 22, 2006 7:21:35 AM

I may be misinterpreting your opening premise, "The architecture is basically lots of cores but low clock speed per core." Do Intel's recent Core 2 Duo or Core 2 Quad chips qualify? If so, then I disagree with your premise.

I just recently read some benchmarking data for Intel's Core 2 Duo. It only has two cores so it may not qualify as "lots of cores", but the performance per core is very good. See the benchmarks at http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2795.

Posted by: Sam | Nov 22, 2006 11:57:16 AM

The dual cores had a clock speed reduction. The quad cores are now at 2.66GHz. I expect this trend to continue in order to manage heat and power.

Posted by: Billy | Nov 22, 2006 2:21:27 PM

Saying things like "the dual cores had a clock speed reduction" without qualification is misleading. What matters is single-core/single-thread performance, and clock speed is only one variable in that.

Apologies for the long comment/explanation that follows...

The clock speed reduction was not a matter of going to dual core (*) but a matter of going to a new processor architecture... and as anyone in this industry ought to know, clock speeds are NOT terribly useful measurements for comparing processors of different architectures.

Intel's Core 2 architecture (**) made its focus increasing instructions per cycle, rather than raw clock speed, which was the focus of Intel's earlier Netburst architecture (used in the P4 and its relatives).

IGNORING the fact that they are dual core, either core of the top of the line 2.93ghz Core 2 Duo (or 3.0ghz "Woodcrest" Xeon) qualifies as the fastest single-core Intel x86(***) processor ever made. There is no sacrifice in the lower clock speed; you can argue exactly what multiple of megahertz is comparable for Core 2 vs. Pentium 4 comparisons, but a VERY conservative low end of the argument would be about 40%, which puts the top Core 2 chips well beyond the fastest Pentium 4 chips (the 3.93ghz Extreme Editions)(****).

As for the quad-core chips, like the first-generation Intel dual core chips (Pentium D 8xx series), they are actually two dual-core dies on a single package, not a unique quad-core design. The clock speed reduction is notable, but more a matter of cost and power consumption than of design capability. Given what overclockers are achieving with the Core 2 Duo chips today, it's fair to assume that the architecture has quite a ways to go before hitting its actual clock speed limits.

(* except for the very peak 3.8ghz and 3.93ghz chips, clock speeds between the Pentium 4 series and the P4/Netburst based dual core Pentium-D series very rapidly achieved parity. A similar pattern applies for AMD's chips.)

(** and to an extent the Pentium-M/Core mobile architectures that preceded it, even if Intel claims that they're not that closely related.)

(*** in the sense of x86 made by Intel; I leave the Intel vs. AMD arguments for others.)

(**** most of us have been using a 1.5:1 ratio for estimating K8 vs Netburst performance, and most benchmarks are showing that the Core 2 Duos beat out AMD K8 at the same clock speed, so "40%" is VERY conservative.)

Posted by: Nate Edel | Nov 22, 2006 5:58:31 PM

I recall my colleague telling me about a research paper by GC guru David Bacon which stated that under heavy load on a system with 8 or more CPUs (or cores) your enterprise Java app is going to wait for I/O completion often enough for the JVM to allocate one CPU exclusively for memory management tasks and use, guess what, reference counting for GC.

I could not find the paper, but you may find some slides if you google for "Java without the Coffee Breaks".

Posted by: Dmitry Leskov | Nov 23, 2006 5:34:47 AM

Hi Billy,

"But, if the cache holds data that changes (i.e. a write through cache for example) then that large heap will fill and when it does there will be a lot of data in it and you'll take a big pause when that happens even with the latest JVMs."

GC is a run to failure process. Failure for GC means the GC failed to collect the object in question. Consequently, caching will cause long GC pauses in old generation GC, whereas releasing objects will actually cause GC to run faster. I have some very nice GC pause time vs. bytes collected data that demonstrate this point very nicely.

Kind regards,
Kirk

Posted by: Kirk | Nov 23, 2006 7:54:58 AM

It's an interesting hypothesis, but there are several existence proofs that negate it: several companies, including IBM using WebSphere, have produced excellent results on SPECjAppServer -- a Java EE benchmark -- on multi-core/multi-thread systems, including Sun's T2000 system, which runs 32 threads on a single chip at only 1.2GHz. Those results tend to have better cost/operation (both in purchase price as well as power consumption, etc.) than similar results that use lots of little boxes horizontally scaled.

Multi-core/multi-threaded systems may not be appropriate for all applications (and certainly not for apps where single-threaded performance is important). But Java isn't necessarily in that category (and Java EE specifically isn't): GC is highly multi-threaded, and (in most cases) GC simply isn't the performance bottleneck it once was.

Posted by: Scott | Nov 27, 2006 10:38:03 AM

Scott,
We have published good numbers on them, and throughput wise the new chips are an improvement over the old ones. But, as you point out, single thread performance is compromised. GC wise, if the applications do not have large caches or similar things in the heap then you're right also, but if apps do have large heaps of cached objects then, as multi-threaded as current GC is, there are still elements that are single threaded and those apps will suffer.

As an aside, I think the time has come when we need to bring back ways to allow applications to manage the lifecycle of certain object graphs. This would solve the GC issue for almost all apps. Single thread throughput wise, nothing changes though.

Posted by: Billy | Nov 27, 2006 10:54:36 AM

Billy, I have thought that commonj is a very nice idea for a while now. It is really too bad it is not used more widely.

Otherwise, I think that the need for explicit deletion of objects (malloc and free) is a very good idea if kept out of the hands of the vast majority of Java programmers, who would not call free(). A perfect example is the way Derby tries to avoid running out of heap and to know when to fall back to disk: it has to test the memory left after allocating different objects to discover their sizes. What I'd really like to see is a modifier on local variables and parameters to say that the object will never be assigned to a field, will always be stack-bound, and can be cleaned up when it goes out of scope. The other thing I'd like to see is a way to attach a hook to a ClassLoader so that an application can decide how much relative space to assign to different Classes of objects. Finer control is always better.

Posted by: Michael | Nov 27, 2006 4:07:22 PM

Michael, a number of modern JVMs do simple 'escape-analysis' on bytecode to determine whether an object is stack-bound without the need for modifiers.

IBM have also recently released a realtime version of WebSphere which uses a special JVM with a new GC technology called 'metronome'. This allows programmers to allocate from different sub-heaps depending on their time/space requirements.

The benefit of this approach is that you can cope with large malloc/free-style object caches without disturbing the pause time for realtime threads.

See the Metronome site at http://domino.research.ibm.com/comm/research_projects.nsf/pages/metronome.index.html for more details.

Posted by: Stuart | Nov 27, 2006 6:10:42 PM

There are a number of problems with your thinking, and the best way to empirically demonstrate that your conclusion is wrong is to note that J2EE apps seem to get a more significant boost than most other types of apps with Niagara server.

So let's go down the list:

1) As others have pointed out, multi-core designs don't necessarily mean measurably slower execution inside a core. You'd be hard pressed to find a top of the line multicore CPU that was more than about 10% slower than its top of the line single core brethren when running single threaded apps. If you can speed up cross core memory access in exchange for the 10% CPU core penalty, you'll probably make JVM optimizers, and GC implementers in particular, very happy.

2) Your notion that Java likely has a longer path length than languages like C doesn't make much sense at all. Java's execution model actually reduces the need for sequential execution as compared to C, and the nature of HotSpot's dynamic runtime actually allows Java programs to take shorter optimistic paths and then fall back to longer paths only when a pessimistic case comes out (hey, maybe you could execute both concurrently on different cores ;-). Of course you can do this in C with a lot of hard work, but in Java the runtime can do it for you more easily.

3) Your notion that a simple thread model (specifically J2EE) somehow executes poorly as you have more and more parallelism in hardware is actually backwards. Complex thread models tend to have significant problems scaling. Simple is a huge advantage.

4) Garbage collection does *not* remain heavily dependent on clock speed for small pauses. Concurrent GC actually relies heavily on having excess CPU cores available and extraordinarily low latency inter-core MESI type manipulations in order to have small pauses. Assuming you are doing incremental GC with a write barrier, your biggest concerns are a) being able to find a CPU to remove the write barrier before someone needs to write to an object and b) being able to update all the other cores about changes to the write barrier.

4a) Multi-core doesn't mean shared cache or memory bandwidth. The Pentium D's had this feature, but the trend tends to be for each core to have independent cache and memory bandwidth.

5) Finally: JIT's, GC's, etc. actually benefit from more overall CPU throughput than from improvements in the single threaded execution. If you've got extra cores that an app otherwise isn't using, no harm in having the JIT do some extra profiling analysis or code generation, the GC prematurely reorganizing memory, etc. This is an advantage that you effectively don't need or take advantage of with more traditional execution models.

So, in fact even on a theory, Java is likely to benefit from future multicore designs more than languages with more traditional execution models. Of course, functional languages will probably receive the biggest boost of all.

Posted by: Christopher Smith | Nov 28, 2006 1:55:13 PM

I have a few objections to points you make

1) You say it is a J2EE problem that core speeds will be held down. But compared to other applications written in C, J2EE applications are much more multithreaded, and multiple cores should help them much more than, say, single threaded first person shooters.

2) A number of times you have mentioned that the second level cache gets polluted because it is shared across cores. While this is true of Intel chips, it's not true for AMD Opteron processors, where each core has its own L1 and L2.

Posted by: Prabuddha | Nov 28, 2006 2:25:16 PM

Prabuddha,
Every system is a balance. You are not getting those extra cores without a cost. There is only so much room on the die. AMD currently have cache per core but trendwise, I don't think that will continue. What may happen is that chips will have groups of cores that share a cache per group.

Posted by: Billy | Nov 28, 2006 2:50:43 PM

Chris,
1) Clock speeds are coming down. The onus now is on performance per watt, not performance per thread as was the case until AMD kicked Intel around. We'll start to see chips with more and more cores, and the clocks will drop to control heat. You can believe me or not, but I don't think this is in dispute industry wide.

2) Very few people write very fast Java code. C people don't have the high level libraries Java has and their code tends to be tighter as a result. I agree that JITs have advantages over statically compiled code and can do things that are impossible for a conventional compiler, given profiling info etc.

3) Simple threading, if it works, is best, but not all apps work like that. We didn't make commonj for nothing; we did it because some applications needed very high performance and needed a thread model suited to the application to get that level of performance. One size does not fit all.

4) Concurrent GC is indeed a beautiful thing, but if applications have large caches or utilize those 64 bit address spaces now with variable data then it too will run into issues.

4a) Trust me, you are not going to see tight coupling between #cores and #caches. That's not the direction. It's transistors for cores. If someone is building a 64 core system it won't have 64x the cache of a single core.

5) You're assuming only a single process on that set of cores then right? This isn't normal. This will cause issues with the cache and negate some of the gains you speak of.

The point of these articles isn't to present a dogmatic point of view. It's to air issues and promote debate and awareness and as such I welcome all of the comments :)

Posted by: Billy | Nov 28, 2006 2:59:10 PM

Very interesting blog.

I believe few languages are as ready for multi-core/multi-threading as Java, due to its native support for multi-threading.

I agree we now have a new scene, and we must all be prepared to face it. I guess we'll need:

* Better profiling tools to debug and optimize multithreaded applications;
* Enhanced JVMs and GC algorithms to use CPU power spread across several cores and make better use of L2 and L3 cache;
* And, perhaps, enhancements to the Thread API and Concurrency Utilities package, as well as good docs and examples (see the sketch after this list).
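For example, the Java 5 concurrency utilities already cover the simple fan-out case. A minimal sketch follows; the per-chunk arithmetic is just a stand-in for real work:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class Fanout {
        public static void main(String[] args) throws Exception {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores); // one worker per core
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (int i = 0; i < 8; i++) {
                final int chunk = i;
                results.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;                            // stand-in for real per-chunk work
                        for (int j = 0; j < 1000000; j++) sum += (long) chunk * j;
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();     // blocks until each chunk is done
            System.out.println("total = " + total);
            pool.shutdown();
        }
    }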

A final point: I also agree that clock speed per core will slow down. But this can be a good thing; it reminds me, for example, of how the Pentium M at nearly half the clock speed could do more than the old Pentium 4. Doing more with fewer clock cycles.

Posted by: Alessandro | Nov 29, 2006 7:04:11 AM

Nice reflection to begin with. What you said seems logical, since clock speeds will eventually slow down to keep power consumption in check.

But it seems this isn't only a Java problem; it applies to all languages that rely on a GC to do the cleanup, so C# is in the same boat!! However, I think the JVM will scale up if used in cluster mode.

Posted by: Tarik Guelzim | Nov 29, 2006 10:20:48 AM

Hi Billy,

Nice blog!
Do you have any data to substantiate your claim, or is it just your perception? As Scott said earlier, the T2000 server from Sun has outpaced all the 3.x (and higher) GHz processors in terms of performance for most multithreaded Java apps. Hey, we are heading to 64-bit JVMs, where GC will be taking a lot of cycles and time, but there are changes happening to the JVMs as well, like the introduction of the ParallelOldGC flag. As said earlier, there must be slowness in single threaded activity, but we don't do a lot of that with J2EE based servers...

Posted by: Dileep Kumar | Nov 30, 2006 5:13:15 PM

Hi Dileep,
The whole 'what's better' story is really interesting. The new quad-core Intel in 2 socket form is expected to beat the pants off Niagara, but it illustrates the differences: 'only' 4 cores and a high clock, many cores and a lower clock, or AMD with networked dual core high clock CPUs?

It will be interesting moving forward to see if highly multi-core CPUs like Niagara can run at full speed in typical scenarios.

Posted by: Billy | Nov 30, 2006 5:24:54 PM

I agree with your comments, and Niagara is not the end of the road; of course Intel and the like can make better/worse processors than what exists today. All I wanted to point out is that so far there don't seem to be any cases to support your initial claim of "multi-core being bad for Java".

Posted by: Dileep Kumar | Nov 30, 2006 6:14:09 PM

It would be great if you could post a part two followup to this blog. My thought is that a slower clock speed is OK if there are hundreds of cores, as the time difference would be made up by avoiding preemption.

Architectures such as Azul where a J2EE application can gain access to 384 cores shouldn't slow down response times as you state...

Posted by: James | Dec 2, 2006 3:54:03 PM

This seems to fall in line with some of your reasoning Billy.

http://www.dailytech.com/article.aspx?newsid=5201

Essentially the 'next' processors are lower clocked Woodcrests. Granted, these are specifically low voltage, but since voltage also equates to heat, sooner or later we're going to see this type of thing as a necessity for chips with more cores.

Posted by: Rob | Dec 4, 2006 2:38:02 PM

"Java without the Coffee Breaks": http://www.research.ibm.com/people/d/dfb/papers/Bacon01Java.pdf

Posted by: | Dec 5, 2006 6:45:45 AM

I had practical experience with a dual processor IBM Intellistation PC with 700MHz Pentiums (Windows 2000) in 2001-2002. It ran a heavy J2EE application (JDK 1.3, EJB/Servlets/JSP), WebSphere, Oracle, CVS, many developers connecting to the VisualAge repository, some performance monitoring tools, anti-virus, often ERWin, and it just kept going and going and going. I don't remember it showing CPU utilization of more than 50%, and it ran for weeks without a crash or reboot, even with load tests; that was just amazing...
A normal single processor PC would have choked on half of that in 2001.
So from that I think that multi-core PCs should greatly benefit Java apps with their normal multi-threading.
Or do you think that a multi-core PC is so different from a dual processor one?

Posted by: Oleg Konovalov | Dec 9, 2006 9:06:13 PM

Interesting reply here:
http://dev2dev.bea.com/blog/hstahl/archive/2006/12/multicore_is_go.html

H. Stahl seems to think Multi-Core Java is good for you, and has some results to support it...

Posted by: Jack Rogers | Dec 17, 2006 4:41:06 PM

We've actually been emailing about it, and while we both agree that highly parallel code can be written for GC and systems code, the problem remains that clock speeds look set to fall, which means Java needs a way to bring application multi-threading to the masses, and it currently doesn't do that. See my response:

http://devwebsphere.com/devwebsphere/2006/12/new_java_langua.html

Posted by: Billy | Dec 18, 2006 8:32:43 AM

I'm happy to say there's another way to stress test Java's ability to scale on multicore.

There are plenty of frameworks for OLTP and SOA-like patterns, but have you ever tried to do a bulk data processing app with Java? Say, 10 million rows of data that are 100 fields wide, with a batch window of 20 seconds?

Perhaps some of the folks on this thread have time to help build more benchmarks?? I've got 1 on the site and many more in review about to be published.

http://www.pervasivedatarush.com/beta

Posted by: Emilio | Dec 22, 2006 7:19:45 AM

Hello,

I cannot agree with your statements. Enterprise applications benefit more from multi-core architectures than desktop applications do, because the threading model there is very simple: each request is handled by a different thread. If the threads don't share too much data, such applications scale pretty well. This is more a case of better/worse application architecture than of Java itself. And Java 5's new multithreading classes are great at simplifying things.

You can switch the GC into parallel mode. This is not the default, but you can, and it works fine on our servers (we use J2EE to process lots of SMS messages from all over Poland).

Lower clock speeds don't make our applications unresponsive. No one notices if his request is handled in 50 ms or 100 ms. And if the machine weren't multicore, we would have to handle more requests on the same core, so some requests would simply wait for others to finish. Then it could be noticeable.

--
Our company website: http://www.dinf.pl/

Posted by: Piotr Kołaczkowski | Feb 16, 2007 6:32:08 AM

Piotr,
Thanks for your comment. Here's a thought: imagine right now you are running on a POWER5 or a 3.8GHz Intel CPU and are happy. Now I tell you that you are deploying on a PIII/800MHz CPU. That is going to hit your response time pretty significantly. The problem is that path lengths are getting way longer due to richer programming models, frameworks, convenience frameworks, etc. Programmers are being saved by Moore's law. That is coming to an end. When massively multi-core chips hit the market you'll see a big drop in per core performance. Code path lengths that were acceptable before will not be now.

Posted by: Billy | Feb 16, 2007 9:04:38 AM

I agree with the original post; these multicore servers take time to do initial mark and remark, which are stop-the-world operations in the CMS garbage collector. It's not uncommon to spend a week trying to tune GC on a big enterprise application, only to see you still get 6 second pauses whenever a 4GB old generation gets full or fragmented.
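For reference, this is roughly the kind of knob-turning involved; the flags are real HotSpot options of that era, but the heap sizes, thresholds, and main class are placeholders, and none of it helps once a full collection of a 4GB old generation becomes unavoidable:

    java -Xms4g -Xmx4g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:+CMSParallelRemarkEnabled \
         -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         com.example.Server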

Posted by: Rob | Jul 8, 2007 10:17:42 AM

I agree that Java's path length is long but you gotta know that new multicore CPUs are designed to execute more instructions in one cycle. So even a C2D 6600 2.4GHz is running faster than a P4 HT 4.0GHz.

Posted by: Andy | Oct 17, 2007 1:09:43 PM
