
September 06, 2005

Opinion on APIless caching.

The more I think about this, the more I don't think it's going to work for the mass market. I say this because most people have difficulty with multi-threaded programming. It's hard; it's always been hard. We offer multi-threading to WebSphere customers with the WorkManager APIs, and so long as it stays simple, customers are happy, but it's easy to get tied up in deadlocks, as well as scaling and bottleneck issues. It's hard to write multi-threaded code that scales vertically. Microsoft is only just getting Windows to do this now. It is absolutely non-trivial. Operating system vendors and database vendors have large investments in this kind of code. So why do we think customers can now write perfect multi-threaded code that can be transparently distributed using a non-invasive approach?
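To illustrate how easy it is to get this wrong, here's a minimal sketch (the names are mine, not from any WebSphere API) of the classic lock-ordering trap: two threads taking the same pair of locks in opposite order will eventually deadlock, and the standard fix is to agree on a single global acquisition order.

```java
public class LockOrdering {
    static final Object lockA = new Object();
    static final Object lockB = new Object();
    static int counter = 0;

    // The deadlock-prone version has thread 1 take A then B while thread 2
    // takes B then A: if each grabs its first lock before the other releases,
    // both block forever.
    //
    // Safe version: every thread acquires lockA before lockB, so a cycle in
    // the wait-for graph is impossible.
    static void transfer() {
        synchronized (lockA) {
            synchronized (lockB) {
                counter++;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter); // 2000: both threads completed, no deadlock
    }
}
```

The point isn't that the fix is complicated; it's that nothing in the language stops you from writing the broken version, and a transparent distribution layer inherits whatever the application got wrong.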

The non-API caching stuff works by intercepting existing code and then propagating changes or acquiring distributed locks on sync blocks. This works, of course, but as a mass-market approach to caching I don't like it, precisely because of the skill level it requires of customers. I'd prefer a higher-level API with an appropriate abstraction to simplify it: an API that is simple to use for vanilla tasks but also lets advanced users get the benefit from it. The ObjectGrid API and its competitors offer this kind of blend, and we're currently looking at approaches for radically simplifying the API further.
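For contrast, here's the kind of higher-level abstraction I mean. This is a hypothetical sketch, not the actual ObjectGrid API: the application deals only with get/put on a cache, and locking, change propagation, and distribution stay behind the interface where the provider can handle them.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache interface: the vanilla case is just get/put, while a
// provider is free to add replication or distributed locking underneath
// without the application ever writing synchronized blocks itself.
interface Cache<K, V> {
    V get(K key);
    void put(K key, V value);
}

// A trivial single-JVM implementation; a distributed provider would swap in
// its own implementation behind the same interface.
class LocalCache<K, V> implements Cache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
}

public class CacheDemo {
    public static void main(String[] args) {
        Cache<String, String> cache = new LocalCache<>();
        cache.put("customer:42", "Smith");
        System.out.println(cache.get("customer:42")); // Smith
    }
}
```

The design point is that the abstraction boundary, not bytecode interception, is what hides the distribution problem from the application programmer.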

The other thing is that the idea that you can write a multi-threaded piece of code for a single JVM and magically make it scale up across a distributed environment stretches belief for me. Distributed algorithms are different from single-process algorithms. You usually need to build something designed to scale horizontally, and many times it's significantly different from something designed for the in-process case only.

I believe these technologies can make a single-JVM algorithm work in a distributed environment; I just don't believe it'll work well or scale optimally. Just as most multi-threaded programs will not scale vertically without a lot of work, the same applies to scaling horizontally, except more so, given the extra network latencies that will likely require a different algorithm optimized around those costs.
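A rough back-of-envelope calculation shows why. The latency figures below are my own illustrative assumptions, not measurements: if an uncontended in-process lock costs on the order of 100 nanoseconds, but a transparently distributed lock costs a network round trip of, say, 1 millisecond, then an algorithm that takes a lock per operation slows by roughly four orders of magnitude when distributed as-is.

```java
public class LatencyMath {
    // Ratio of per-operation lock costs: how much slower the same
    // lock-per-operation algorithm runs when each lock acquisition
    // becomes a network round trip.
    static double slowdown(double localNs, double remoteNs) {
        return remoteNs / localNs;
    }

    public static void main(String[] args) {
        double inProcessLockNs = 100;          // assumed uncontended JVM monitor cost
        double distributedLockNs = 1_000_000;  // assumed 1 ms network round trip
        System.out.printf("slowdown: %.0fx%n",
                slowdown(inProcessLockNs, distributedLockNs)); // 10000x
    }
}
```

That gap is exactly why a distributed version of the algorithm usually needs to batch, partition, or otherwise restructure its work around the network cost rather than lock per operation.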

So I think API-based caching, or distributed caching, has a role and will continue to have one moving forward. Efforts such as JavaSpaces, JCache, and our own APIs are trying to simplify the task for the application programmer who requires these services. Customers don't want to write middleware; they want to write applications. Could I come up with a test that demonstrates linear horizontal scalability with these non-API products? Absolutely, but I can manufacture a benchmark to prove pretty much anything if asked to.


Comments

I think you are 100% correct.

Posted by: | Sep 6, 2005 12:15:41 PM

I think I'm familiar with caching and with threading concepts, but I don't see what "APIless caching" is, or the relation between caching and threading.

Can you give an example of the APIless caching you're referring to?

Posted by: | Sep 7, 2005 6:21:07 AM

Terracotta would be one example.

Posted by: Billy | Sep 7, 2005 4:13:04 PM

Billy, what do you think about their arguments?

http://blog.terracottatech.com/archive/2005/09/object_identity_1.html#comments

Posted by: | Sep 16, 2005 1:56:25 PM

See http://devwebsphere.com/devwebsphere/2005/09/final_word_on_c.html

Posted by: Billy | Sep 16, 2005 4:06:05 PM

The knock on threading has always been that some problems are linear or near-linear and consequently will not parallelize to take advantage of threading. Amdahl's law states this in very precise terms. However, to stretch this to say that API-less distributed computing will not scale is (IMHO) a stretch in the other direction. There are certainly many cases where problems will scale out with threading. J2EE is a fine example of this.
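For reference, Amdahl's law puts the maximum speedup at 1 / ((1 - p) + p/n) for a workload whose fraction p is parallelizable, run on n processors; the numbers below are just illustrative.

```java
public class Amdahl {
    // Maximum speedup when fraction p of the work is parallelizable and
    // n processors are available: the serial fraction (1 - p) bounds the gain.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Even at 95% parallel, 100 processors yield well under 100x,
        // and the limit as n grows is 1 / (1 - p) = 20x.
        System.out.println(speedup(0.95, 100));
        System.out.println(speedup(0.95, 1_000_000));
    }
}
```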

Right now I'm completely bewildered by developers' acceptance of the boundaries set by the current VM technology. What Terracotta is proving is that there is value in solving the distributed computing problem below the application and out of sight of the developer. Instead of everyone creating a new framework that doesn't quite work because it doesn't have access to the information it needs to make it work, why not push the solution to the distribution problem back into the JVM where it belongs?

Posted by: Kirk | Jan 28, 2006 4:57:23 AM
