September 16, 2005
Final word on caching with APIs and no APIs
This is really a response to a comment on the Terracotta tech blog. First, I'd like to thank him for the compliment at the beginning :) I don't want to get into a never-ending thread about this, so this is my last blog on the topic. There are two philosophies here; mine is just different from theirs, and I think that's as close as we will get. Their technology is cool, but most technology in this space is cool: ours, Tangosol, Gigaspaces, etc. It's how consumable the technology is for average customers that counts, and that's where there are clearly two different approaches to the problem.
The example of file caching is not a great one, as all file systems offer the capability to lock files in memory, etc., because the vanilla algorithms break down under high-performance conditions. Look at databases: don't they just magically cache the data and make it fast with no user interaction? Anybody who has worked with a database knows that if you just kick the tires, it looks good, but the reality is that databases are full of knobs, special SQL clauses and flags to let applications work at peak efficiency. It takes an expert to do that kind of tuning, and that's with a VERY high-level API for the developer!
Another way to look at this: why even have file systems themselves? Just give developers block I/O primitives and there you go, right? Simple matter of coding... What's all the fuss about? APIs are bad, right?
Comparing a file cache with the complexity of a distributed set of servers working together is an apples and pears comparison. When these performance problems are discovered, it's likely that with their approach an application rewrite will be needed, because their abstraction is too low to really do something about it in the engine. A lot can be done to speed up a database without changing the application because the API is quite high level, i.e. SQL. This is not true of synchronized blocks and threading. I think higher-level abstractions, and hence APIs, have an advantage here, and there are many real-world examples of this approach being successful. As an example, a database admin can simply add an index to improve performance. How do you do that when you don't even know what data structures the developer invented to hold his data? You could tell them to use a Map and swap the implementation underneath, but now it's kind of invasive, isn't it? Plus it starts to smell of an API, no? It's also not too obvious to someone reading the code what's going on. With the database, the select statement didn't change; is that the case here? The underlying engine's tuning options are more limited because of the low-level abstractions.
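To make the "use a Map and we'll swap the implementation underneath" point concrete, here is a minimal sketch. The `createCache` factory and its `clustered` flag are invented for illustration (no vendor's actual API); a real product might hand back a distributed Map where this sketch returns a `ConcurrentHashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapSwapExample {
    // The application codes to the Map interface only, so the vendor
    // can change which implementation is handed back without any of
    // the application's call sites changing.
    static Map<String, String> createCache(boolean clustered) {
        // Hypothetical switch: a distributed Map could be returned
        // here instead of ConcurrentHashMap.
        return clustered ? new ConcurrentHashMap<>() : new HashMap<>();
    }

    public static void main(String[] args) {
        Map<String, String> cache = createCache(true);
        cache.put("policy:42", "insurance data");
        System.out.println(cache.get("policy:42"));
    }
}
```

The catch, as noted above, is that this only works if the developer agreed to code to the interface in the first place, which already smells of an API.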
The main thrust of my blog was that most people will find it difficult to build anything but very simple systems with that approach, purely because of their experience and skill limitations. They may be insurance experts or Swing experts, but why do they need to be experts in distributed algorithms now also? I didn't say it was impossible; I said it would be beyond the capabilities of most customers, and the author basically concedes that a lot of care is needed to get the right sync blocks for batching etc., or it's not going to work. This doesn't sound very transparent to me. It's more an abuse of the existing APIs in the language, in an 'unnatural' way, to express something better expressed in a higher-level API, and the higher-level API would certainly be easier for a developer to understand, as well as to develop against, than unweaving a bunch of critical sections. To say that an API approach which provides a higher-level abstraction pushes the heavy lifting onto the customer is almost an oxymoron. Why did Doug Lea provide a higher-level abstraction for concurrent programming? We had nice primitives already, right? Because these abstractions make the customer's job EASIER, lower the skill level required and move the heavy lifting into middleware, which is why you buy it.
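The Doug Lea point can be illustrated with one of his abstractions that made it into the JDK. A hand-rolled producer/consumer handoff needs carefully woven `synchronized`/`wait`/`notify` code; with `BlockingQueue` all of that lives inside the library. A small sketch:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HandoffExample {
    public static void main(String[] args) throws InterruptedException {
        // A higher-level abstraction: the locking, signalling and
        // critical sections are inside the queue, not hand-woven
        // through application code.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                queue.put("work item");   // blocks if the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        String item = queue.take();       // blocks until an item arrives
        producer.join();
        System.out.println(item);
    }
}
```

An average developer can read and tune this; the equivalent raw wait/notify version is exactly the kind of code most customers get wrong.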
My claim about the difficulty of writing vertically scalable code also stands. This IS hard; that is utterly indisputable. Their approach assumes that code written like this will scale to 10 or 20 machines as well as it scales vertically. I'm sorry, I just don't believe that normal people can do this for anything but the most simple of problems. Could a skilled distributed-systems person do it? Probably, but that's not the normal situation.
As an example, look at something as simple as designing a Map object that allows highly concurrent access. Easy, right? Nope. It isn't just a single sync block around all calls to the Map. It takes quite a bit of ingenuity to code one that works well, which is probably why Doug provided one in the new concurrent APIs. I'd prefer to provide a good Map object to a customer than to claim "you have everything you need in Java and critical sections, off you go..." That doesn't strike me as high value to the majority of the marketplace.
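Here's a sketch of why the packaged Map matters. A single sync block would serialize every caller; `ConcurrentHashMap` lets many threads update the map in parallel and still gives atomic read-modify-write, which is the part that is genuinely hard to hand-roll. (The `merge` call is a later-JDK convenience used here for brevity; the concurrency point is the same.)

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentMapExample {
    public static void main(String[] args) throws InterruptedException {
        // Doug Lea's ConcurrentHashMap: no external lock, yet updates
        // from many threads are not lost.
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

        Runnable worker = () -> {
            for (int i = 0; i < 10_000; i++) {
                // Atomic read-modify-write, no synchronized block needed.
                hits.merge("page", 1, Integer::sum);
            }
        };

        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // With a plain HashMap and no locking, updates would be lost.
        System.out.println(hits.get("page"));
    }
}
```

Getting that same guarantee out of raw critical sections, without also serializing every reader, is exactly the ingenuity I'm talking about.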
It's like saying to a customer who needs a fishing rod: "Here's some wood, some plastic pellets, an oven, an extruder and a lathe, go build your fishing rod!" Now you build a bad fishing rod. The vendor responds by tweaking the oven, but does that give you a better rod? No. I'd say most of the success is in the hands of the guy making the rod, and I don't want that to be the customer! I'd prefer them to just buy the fishing rod, thank you; it's a whole lot cheaper, and it's also what they wanted to buy in the first place. Abstractions WORK and are very useful to most customers. Now, the experts will take the lathe etc. (i.e. a toolkit), but again, that's why they are experts. It's a question of the appropriateness of the abstraction to the customer set, and we want to make it relevant to the majority of the thousands of customers we have.
Yes, whether it's APIs or no APIs, it still takes tuning, like my database example, but I feel a higher-level abstraction can provide better out-of-the-box performance than a lower-level one, as it's less influenced by the way the application was coded. At least it's within the reach of most people, and their applications will be using a high enough abstraction that pretty big performance improvements should be possible before an application redesign is required. I'm not saying that application redesign is never required with an API-based product; it can still be abused. I'm just saying it's less likely, that's all. I still maintain that for anything other than the simplest of applications (a whiteboard, for example) or session replication (where they provided the expertise), it won't work that well or really scale. I could see them releasing utility jars that provide precanned patterns applications can use (like session replication or Doug's excellent work), but then, that's an API, isn't it?
To finish: yep, very skilled people will be able to build distributed systems with their approach. But arguably, they could do it anyway; that's why they are skilled. What about everyone else? They just want to get something working and buy middleware that helps them out of the box. Do you want a fishing rod, or a recipe, parts and manufacturing equipment along with a skill gap? I'll take the rod.
"As an example, look at something as simple as designing a Map object that allows highly concurrent access."
Or you can go back to old-school lock-everywhere Hashtables and get optimistic concurrency out of hardware.
Are you also proposing, Billy, that we introduce the dispose() API to Java? Or work on better garbage collectors?
Posted by: bob pasker | Sep 30, 2005 6:09:08 PM
To tell the truth, GC is getting better all the time, and the GC guys say pooling, i.e. basically an explicit dispose, is now starting to get in the way. No matter: for customers who can't wait for the next great GC/JVM, a dispose-like mechanism isn't a bad idea. When the newer JDKs come along, the code may not perform as well on them, but today, on the JDKs they run on, it's the more efficient solution. So they just need to make sure they stay flexible, so that when the new JDK/GC arrives they can easily modify the app to stop using the dispose-like mechanism.
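For what a dispose-like mechanism looks like, here's a minimal sketch of a pooled buffer with an explicit `dispose` (the class and method names are invented for illustration, not any real API). Callers hand objects back for reuse instead of letting GC reclaim them, which is exactly the pattern modern collectors are starting to make unnecessary.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A hypothetical dispose-style pool: callers explicitly return
// (dispose) buffers instead of relying on the garbage collector.
public class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int size;

    public BufferPool(int bufferSize) {
        this.size = bufferSize;
    }

    public synchronized byte[] acquire() {
        byte[] b = free.poll();
        // Reuse a pooled buffer if one is free, else allocate.
        return (b != null) ? b : new byte[size];
    }

    // The explicit "dispose": hand the buffer back for reuse.
    public synchronized void dispose(byte[] buffer) {
        free.push(buffer);
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(1024);
        byte[] a = pool.acquire();
        pool.dispose(a);
        byte[] b = pool.acquire();
        System.out.println(a == b);   // the disposed buffer was reused
    }
}
```

Keeping this behind a small interface like the above is what makes it easy to rip out later when the JVM's GC catches up.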
Posted by: Billy | Sep 30, 2005 8:28:55 PM
Stu Charlton's comments --> http://www.stucharlton.com/blog/archives/000085.html
Posted by: Billy | Oct 13, 2005 10:13:53 AM