September 27, 2006
Is Oracle RAC a product of the past?
We (some friends and I, nothing official) were sitting down for coffee the other day, wondering whether Oracle RAC would have been feasible as a product in today's environment. When Oracle RAC was conceived, SMP boxes were expensive and there weren't any powerful low-end boxes. With powerful boxes costing so much back then, a database with a shared disk architecture that could scale horizontally across a bunch of low-cost, slow boxes made perfect sense.
But today, with Intel just announcing quad core CPUs, you just know that pretty soon you will be able to buy dual CPU, quad core blades (8 cores total) with 64GB of RAM. How many customers really need to scale these blades horizontally to get the performance they want? That's a ton of power. Plus, with RAC you need a SAN and switches to share the storage. You need a really fast network, ideally something like InfiniBand, because of the very low-level communication going on between nodes. This isn't cheap, it's a pretty complicated setup, and it's a tightly coupled cluster: if something fails, it can easily impact the rest of the cluster.
There are simpler approaches available these days. Low-cost servers like blades have an incredible amount of power for the money, unlike even 5 years ago. A four-core Intel blade from IBM (an HS21) costs about 6k, which isn't a lot of money. When the quad core CPUs arrive, I don't think blades will cost much more. This, for me, means that a general purpose database that scales horizontally is like a dinosaur: it's an architecture from a past generation with a different set of circumstances. Solutions such as IBM UDB HADR seem just as good, more reliable, and cheaper.
IBM UDB HADR allows two such blades to use their own local storage and be linked across a fast network. One server replicates to the other. If one fails, clients fail over to the second one. It's cheaper (no SAN), more reliable (simpler, less inter-node communication, communication at a higher level, less tightly coupled, less sharing), fault tolerant (replicated) and packs enough power for almost all customers (each blade has at least 4 cores, and this is just the beginning: 8 by year end, 16 next year).
So, right or wrong, what we concluded today was that something like Oracle RAC simply wouldn't make sense to even build. The shared nothing approach with replication makes more sense from both a commercial and a technical point of view. Would making a RAC clone be fun and entertaining as a developer? Sure. But from a commercial point of view? I don't think so.
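To make the shape of that setup concrete, here is a toy sketch in Python of a primary/standby pair where each node keeps its own local storage and only a stream of write records crosses the link. It is not how DB2 HADR is implemented and it uses no DB2 API; `Node`, `ReplicatedPair` and the blade names are all made up for illustration, and the "log shipping" is just a synchronous function call.

```python
class Node:
    """One blade with its own local storage (hypothetical model, not a DB2 API)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}   # local storage: nothing here is shared with the peer
        self.log = []    # write records applied on this node, in order

    def apply(self, record):
        key, value = record
        self.data[key] = value
        self.log.append(record)


class ReplicatedPair:
    """A primary and a standby linked only by a stream of write records."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def failover(self):
        # Promote the standby; the failed node becomes the (dead) standby.
        if not self.standby.alive:
            raise RuntimeError("both nodes are down")
        self.primary, self.standby = self.standby, self.primary

    def write(self, key, value):
        if not self.primary.alive:
            self.failover()
        record = (key, value)
        self.primary.apply(record)
        if self.standby.alive:
            # Assumption: the record is shipped synchronously to the standby.
            self.standby.apply(record)

    def read(self, key):
        if not self.primary.alive:
            self.failover()
        return self.primary.data[key]


blade_a = Node("blade-a")
blade_b = Node("blade-b")
pair = ReplicatedPair(blade_a, blade_b)

pair.write("order-1", "shipped")
blade_a.alive = False                 # the primary blade dies
status = pair.read("order-1")         # the client is served by the old standby
pair.write("order-2", "pending")      # writes keep going on the survivor
```

The only thing the two nodes exchange is the record passed to `apply`, which is the loose coupling being argued for here: neither blade ever touches the other's storage.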
September 27, 2006 | Permalink
You are talking about DB2 HADR. And 6k blades. Right.
Please factor in the license costs for DB2 EE (15K x 2 primary + 15K standby = 45K), or maybe less if bought over WE.
And please, please talk about the replication modes HADR supports: SYNC, which is quite unusable for OLTP, or ASYNC, for which you have to pray, if you replicate through remote switches, that the primary database will not freeze because of a sudden drop in link bandwidth.
Please tell people that the replicated DB is really offline and can't be used in query-only mode, so in practice you double the costs without being able to use the standby's resources for other purposes. (You know, the standby replicates the bufferpools as well.)
IMHO, HADR is a very expensive solution.
Have you ever actually used HADR? I am using it in production and am not very happy that it takes up all the resources of the standby.
Just a tease: http://www.oracle.com/technology/deploy/availability/htdocs/DataGuardHADR.html
I know IBM has a response to this but ...
Posted by: Horia Muntean | Sep 27, 2006 3:31:42 PM
Uh, I forgot, you can't do multi point replication with HADR. Why would you? Just to triple the costs of the setup?
Posted by: Horia Muntean | Sep 27, 2006 3:38:47 PM
If you were using RAC, you'd be using the same number of licenses and the same number of machines. RAC isn't cheap either, last time I checked. Also, try using RAC across multiple locations: it's non-trivial to make it work without killing performance due to latencies.
Now, that said, with RAC you'd get some use out of the second box, but depending on the type of work, that could slow the system down if enough disk pinging was taking place between the two boxes.
I'm not attacking RAC in the post. I'm just saying that, given today's hardware resources, it wouldn't be developed at all because there is little need.
Posted by: Billy | Sep 27, 2006 10:22:45 PM
I'm not defending RAC either (never used it) or for that matter attacking HADR.
All I am saying is that HADR is not so cheap (I did not compare it to RAC on price, and I am not even sure RAC and HADR tackle the same high availability needs) and in some scenarios not even applicable.
Posted by: Horia Muntean | Sep 28, 2006 6:40:08 AM
Intel ships quad core server chips in Nov.
Posted by: Billy | Sep 28, 2006 10:36:39 AM
Yeah. You're right. Why the heck does Google bother having 30,000 to 100,000 nodes when a dual 4-core Intel chip has all the processing power one could ever need?
Posted by: cesium62 | Sep 28, 2006 1:29:58 PM
Thanks for missing the point completely. Let's build a product that only Google would buy or even need. You'll make a lot of money on that one. I'm talking about the general database customer. Plus, Google's 100k+ machines are not tightly coupled.
FYI, if Google can increase their server power/density using quad core chips, you'd better believe they will be taking them.
Posted by: Billy | Sep 28, 2006 2:05:15 PM
No, you missed the point completely. There exist many customers who like to use more than one node's worth of processing power regardless of how big that node is. If you make that one node really darned cheap, then you make the multicomputer really darn cheap.
The case for RAC is stronger now than it was 20 years ago. People still want to buy a room full of cheap nodes and have those nodes attack a single problem. But now you can fit a lot more nodes in the room.
[And, yes, Google is not tightly coupled. What makes you think RAC is tightly coupled? You know what the first thing a RAC customer does? They partition their application so that it isn't tightly coupled. Tightly coupled problems don't work well on parallel computers. They don't work well on 4-core processors either.]
Posted by: cesium62 | Sep 29, 2006 1:58:27 PM
This article (the SMP argument) also misses one of the greatest advantages of RAC: redundancy. If I can buy 64 cheap Intel boxes, I can lose 6 (about 10%) of them and not impact the overall application very much. If I need to scale up, I order a new box or two. They are so cheap that it doesn't matter. I can also add them without an outage if the system has been architected correctly. Try that with a single SMP machine. Try removing and adding CPUs while the machine and database are running. Scalability is only one facet of RAC. Reliability is another vital component.
Posted by: Bob | Oct 29, 2006 4:44:18 PM
You're preaching to the choir on the RAIC concept. My point was: would a shared everything architecture be a commercial success for a database in today's environment? The reliability argument is a good one, but is RAC more available than a DB2 HADR setup? I'd argue no, because RAC is more tightly coupled, so the odds are higher that a problem on one node will trash the other. They are sharing data. DB2 HADR doesn't suffer as much from this problem because the nodes are relatively loosely coupled and don't share each other's data except for the log stream. I don't want to sound like I'm pushing HADR either. That wasn't the point. The point was whether, in today's world, there would be a big enough commercial market for it, i.e. would enough customers need large numbers of quad core machines to make it viable? Yes, there are a few who would need it, but would there be enough to pay for the development costs, or would a different architecture be used instead?
Posted by: Billy | Oct 30, 2006 1:33:54 AM
Slightly beyond the point but loosely related //
Your initial comments about the requirement for an expensive SAN puzzle me. What makes you think local HDD storage is across the board a cheaper option in terms of TCO? Unless, that is, you operate in an area where you have a clever way of distributing your business and mission critical data over local HDD capacity. What about data retention and liability? Again, it'll likely depend on your core business, but so far I have come to terms with the idea that a SAN (especially the newer generation) renders better TCO values.
Posted by: rik | Jul 9, 2007 1:55:58 PM