
May 18, 2005

Single box high availability is possible

Most people who see the usual "this is how you make a system highly available" diagram picture a bunch of boxes arranged in pairs, and then everybody relaxes. But this is a very complex setup and, as most math majors know, two of anything is roughly twice as likely to suffer a failure, so what's 14 of something?

Disk virtualization

We've had this for a while now. My disk system in the lab is a Network Appliance F940 with 28 disks on it. I use iSCSI over Gb Ethernet to hook it up to the Linux and Windows boxes that need high speed disks. I basically don't worry about how many disks I need anymore. I made one large 500GB RAID volume on the NetApp and parcel it up into logical disks (LUNs) as I need them. For example, if a database box needs 20GB of disk for data and 5GB for logs, I create two more LUNs from the 500GB and assign them over iSCSI to the DB box. The database doesn't care how many physical disks back the 20GB/5GB volumes; the NetApp virtualizes that, and it basically looks like a kick-ass disk subsystem. We can now start to do the same thing with operating system images, as we'll see. Imagine I had a box with a collection of processors, memory, network cards, and disk controllers. Why couldn't I do the same thing with those? You now can.
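The carve-up described above is really just bookkeeping: one big RAID volume, with LUNs allocated out of it on demand. A minimal sketch of that accounting (the class and names are purely illustrative, not a NetApp API — on the real filer this is done with the vendor's management tools):

```python
class Volume:
    """Tracks LUNs carved out of one large RAID volume."""

    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.luns = {}  # LUN name -> size in GB

    def free_gb(self):
        return self.size_gb - sum(self.luns.values())

    def create_lun(self, name, size_gb):
        if size_gb > self.free_gb():
            raise ValueError("not enough free space on volume")
        self.luns[name] = size_gb

# The database example from the post: 20GB data + 5GB logs from 500GB.
vol = Volume(500)
vol.create_lun("db_data", 20)
vol.create_lun("db_logs", 5)
print(vol.free_gb())  # 475
```

The point is that the consumer of a LUN never sees the physical spindles, only a size; the filer owns the mapping.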

Virtualization to the rescue

High end SMP boxes (and soon Intel boxes) have virtualized hardware now. The machine comes with what's called a hypervisor, whose job is to manage the hardware on behalf of the operating systems running on the box. Many people are familiar with VMware, which virtualizes a PC and lets you install Linux or Windows into that virtual PC; you can then have four virtual PCs, each running Linux, on a single real PC. Hypervisors take this to a new level.

IBM Power 5 and virtualization

Let's look at an IBM Power 5 box; I have one in my lab. It's got 4 CECs. A CEC is a building block: each CEC has 4 CPUs (mine are 1.9GHz), 16GB of memory, disk controllers, network cards, etc. The CECs are connected together to form a single 16-way SMP with 64GB of RAM; more CECs build bigger boxes. There is another box (a small PC) on the side called an HMC, which controls the hardware. The HMC lets you create a virtual PC or, in IBM speak, an LPAR (Logical Partition). You can then assign it memory, CPUs, network cards, and disk controllers, boot that LPAR, and install AIX or Linux in it. An assigned resource can only be used by a single LPAR at a time. But, and here is the cool bit, you can change the number of CPUs assigned to an LPAR whenever you want.
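The two rules worth holding onto are: CPUs are assigned exclusively to one LPAR at a time, and the counts can be changed on the fly. A toy model of that invariant (pure illustration; real LPARs are managed from the HMC, not from code like this):

```python
class Frame:
    """Toy model of a box whose CPUs are parceled out to LPARs."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.lpars = {}  # LPAR name -> CPUs assigned

    def unassigned(self):
        return self.total_cpus - sum(self.lpars.values())

    def set_cpus(self, lpar, cpus):
        # CPUs are exclusive: the new assignment plus everyone
        # else's must still fit in the box.
        others = sum(c for n, c in self.lpars.items() if n != lpar)
        if others + cpus > self.total_cpus:
            raise ValueError("CPUs are exclusive; not enough left")
        self.lpars[lpar] = cpus

frame = Frame(16)          # the 4-CEC, 16-way box from the post
frame.set_cpus("P", 3)
frame.set_cpus("S", 1)
frame.set_cpus("P", 12)    # dynamic change: grow P while S keeps its CPU
print(frame.unassigned())  # 3
```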

HA with LPARs

So, let's say we wanted a highly available database, IBM DB2 UDB for example. Normally, you'd have SAN-based disk and two boxes with HA software like HACMP installed. This is wasteful because the second box does nothing until the first box fails. We can do better with the LPARed system.

Let's use a single 4-way Power box. Define two LPARs and give each half the memory. Call one LPAR P and the other S. Give P three CPUs and S one CPU. Set up HACMP just as if they were two real boxes.

Once you're done, you've got an HA setup that's probably better than two boxes. The single p5 box has redundant power, redundant LAN, and redundant fiber cards. The memory has fault tolerant features, and a failing CPU can be detected and masked out as bad. One processor is being used to run the backup system/LPAR, so we're still 'wasting' 1/4 of the box on the backup. But it's possible to assign fractions of a processor to an LPAR. For example, we could assign 3 3/4 CPUs to the primary and 1/4 of a processor to the backup; now we're only wasting 1/16 of the box. The minimum usable LPAR size depends on how much CPU the monitoring takes. This fractional CPU deal is called micro-partitioning. But be careful with micro-partitioning: it can be a performance killer, because sharing a processor increases CPU cache thrashing, which hurts performance on that processor.
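The waste arithmetic above is worth spelling out: the standby's reserved capacity as a fraction of the whole box is what shrinks when you go from whole CPUs to quarter CPUs.

```python
# Fraction of a 4-way box held in reserve for the standby LPAR.
box_cpus = 4

whole_cpu_standby = 1.0 / box_cpus   # S gets one whole CPU
micro_standby = 0.25 / box_cpus      # S gets a quarter of a CPU

print(whole_cpu_standby)  # 0.25   -> 1/4 of the box idle
print(micro_standby)      # 0.0625 -> 1/16 of the box idle
```

Same availability story, a quarter of the standing cost.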

Now scale it up

This pattern can be expanded to more complex configurations. Let's say we have a 16-way box. I'd make three AIX 5.3 LPARs (A, B and C) of 4 processors each to run WAS 6 ND. I'd make another two AIX 5.3 LPARs (X and Y) of 3 and 1 processors to run DB2 and NFS v4. I'd use HACMP to make X and Y fault tolerant, and use HACMP scripts to tell the HMC that when X fails over to Y, it should dynamically change the CPUs for X to 1 and for Y to 3. So the backup LPARs use minimal resources until they become the primary, and then we give them the primary's resources (at least the still-functioning CPUs).
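The reshuffle in that failover can be sketched as a handler that swaps the CPU counts between X and Y (names and function are hypothetical; in a real setup an HACMP event script would make this request to the HMC):

```python
def fail_over(assignments, primary, standby):
    """Swap the CPU allocations of a failed primary and its standby.

    assignments maps LPAR name -> CPU count. The standby inherits the
    primary's CPUs; the (now idle) primary drops to the standby's old
    minimal allocation.
    """
    assignments[primary], assignments[standby] = (
        assignments[standby], assignments[primary])
    return assignments

# The 16-way example: X (DB2/NFS primary) has 3 CPUs, Y (standby) has 1.
lpars = {"A": 4, "B": 4, "C": 4, "X": 3, "Y": 1}
fail_over(lpars, "X", "Y")
print(lpars["Y"])  # 3 -- the standby now runs with the primary's CPUs
print(lpars["X"])  # 1
```

Note that the A, B and C partitions are untouched; only the HA pair trades resources.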

This kind of flexibility, being able to dynamically change the number of processors assigned to an LPAR time slice by time slice, makes these larger boxes much more economical than before. The price of a 4-way versus two 2-ways has collapsed to be basically linear; 8-ways versus 4-ways are almost there, and 16-ways will likely follow.


So, LPARs bring new options to the high availability table. These options existed in the mainframe world before, and in limited forms on older generations of Sun, HP and IBM machines. The latest virtualization technologies are much more flexible and will increasingly close in on the S/390 full-blown virtualize-everything model. Probably worth a Google search now to learn more about it.

May 18, 2005 in High Availability | Permalink


Seems powerful and exciting to set up, but it sounds expensive. The nice thing about clustering is that you can use cheaper hardware. If anything fails on a box, just remove it, fix it, and put it back into the cluster.

Also, I would think that you'd want to replace this box whenever CPU technology improved. That would be expensive and downtime would be required. With a cluster of less expensive boxes you can replace the boxes piecemeal over time with no downtime.

I think your solution is great for a large company with deep pockets. My cash conscious boss is a bit more frugal and looks for the highest possible ROI.

Posted by: | May 19, 2005 2:16:07 PM

There are box-cost advantages on the commodity side, but it's possibly less reliable than a single box. I've personally heard customers tell me that a bunch of cheap small boxes isn't what they want, and I've heard customers say the opposite. The former are more comfortable with a more reliable single box using LPARs for availability; the latter don't mind the higher failure rate of more, smaller, cheaper boxes.

I can see both points of view. I'm just highlighting the big-SMP point of view in this post; I'll highlight the other in a future post.

Posted by: Billy | May 19, 2005 2:24:55 PM

I see where you're coming from, but:
a) In your example config, you have several single points of failure: the NetApp box (so you need another one) and the server (likewise). I'm not saying the server is unreliable, but a single (human) error on the host system can bring down all the virtual environments with it.
b) Quote: "2 of anything is twice as likely to fail, so what's 14 of something?"
14 times as likely, but with spread-out clustering (Web, Oracle RAC, NFS and such), each failure in a 13+1 setup amounts to a small temporary loss of capacity (1/13th) and a small impact on the users (1/13th of them).

Every approach has its advantages and disadvantages, the right choice depends on circumstances.

Posted by: angryman | Oct 11, 2005 11:32:58 AM
