
January 17, 2006

Cheaper HA with Linux DRBD and WebSphere

An IBMer who works on http://www.linux-ha.org called me this morning. The project has some cool technology that lets you build clusters out of commodity parts.

WebSphere 6.0 ND was built to lower the cost and improve the availability of WebSphere deployments on commodity hardware. For example, you can keep the transaction logs on an NFS v4 file system, and WebSphere will then automatically recover in-doubt transactions after a failure.

The Linux-HA project gives some new choices. It has a technology called DRBD, which is basically synchronous disk replication over Ethernet. Let's take a simple scenario with 2 boxes. According to my understanding, each would need 3 logical disks. These could be 3 partitions on a single physical disk.

On box 1, put Linux on partition 1 and WAS plus its logs on partition 2. Set up partition 3 as a sync replica of partition 3 on the second box. Now mirror this on the second box: partition 1 is box 2's Linux, partition 3 is its WAS + logs installation, and partition 2 is the sync replica of partition 2 on box 1.

So, basically, the disks in each box are now mirroring a partition on the other box. DRBD does block-level replication of these partitions. DRBD can run a script when a partition becomes the primary partition. Such a script could cold start WAS on the box.
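Here is a minimal sketch of what such a script might do, written in Python. It only illustrates the idea. The device name `/dev/drbd0`, the mount point, the `AppServer/bin` directory under it, and the server name `server1` are all assumptions for illustration, and how the script actually gets triggered depends on how the cluster is set up.

```python
import subprocess


def cold_start_commands(device, mount_point, server="server1"):
    """Commands to run once `device` has become primary on this box.

    Assumes WAS is installed on the replicated partition, so the
    launcher lives under the mount point (hypothetical layout).
    """
    return [
        ["mount", device, mount_point],
        [f"{mount_point}/AppServer/bin/startServer.sh", server],
    ]


def on_become_primary(device, mount_point, run=subprocess.check_call):
    """Mount the newly promoted partition, then cold start WAS from it."""
    for cmd in cold_start_commands(device, mount_point):
        run(cmd)
```

Calling `on_become_primary("/dev/drbd0", "/mnt/was")` on the surviving box would mount the replica and start the server from it. The `run` parameter is only there so the command sequence can be checked without actually executing anything.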

A slightly more complex but 'better' solution would be to use 4 partitions per box. Box 1 has the following partitions:

  1. Operating system (not replicated)
  2. WebSphere installation for node 1 (not replicated)
  3. Transaction logs partition for node 1 (replicated)
  4. Replica of partition 4 on box 2

Box 2 has the following partitions:

  1. Operating system (not replicated)
  2. WebSphere installation for node 2 (not replicated)
  3. Replica of partition 3 on box 1
  4. Transaction logs for node 2 (replicated)
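To make the cross-mirroring easier to see, the two partition tables above can be written down as data and checked for consistency. This is only a sketch in Python. The role names and the `check_layout` helper are made up for illustration and are not part of any tool.

```python
# partition number -> (role, peer box for replicated partitions, else None)
LAYOUT = {
    "box1": {
        1: ("os", None),
        2: ("was-install", None),
        3: ("tranlog-primary", "box2"),
        4: ("tranlog-replica", "box2"),
    },
    "box2": {
        1: ("os", None),
        2: ("was-install", None),
        3: ("tranlog-replica", "box1"),
        4: ("tranlog-primary", "box1"),
    },
}


def check_layout(layout):
    """Return a list of problems; an empty list means every
    transaction log partition has a replica on its peer box."""
    problems = []
    for box, parts in layout.items():
        for num, (role, peer) in parts.items():
            if role != "tranlog-primary":
                continue
            peer_role = layout.get(peer, {}).get(num, (None, None))[0]
            if peer_role != "tranlog-replica":
                problems.append(f"{box} partition {num} has no replica on {peer}")
    return problems
```

The check assumes a replica uses the same partition number as the partition it mirrors, which is how the two lists above are laid out.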

If boxes 1 and 2 are part of a WebSphere cluster, then we can also do hot recovery. When box 1 fails, DRBD promotes partition 3 on box 2 to primary and runs a script. The script pokes the JVM on node 2 to recover box 1's transaction logs. This removes the JVM start-up time from the recovery time.
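The difference between the two recovery paths can be sketched as a small dispatch function in Python. Everything here is hypothetical: `hot_recover` and `cold_start` are placeholders supplied by the caller, and the sketch deliberately does not say how a running JVM is asked to recover a peer's logs.

```python
def handle_promotion(log_dir, surviving_jvm_running, hot_recover, cold_start):
    """Pick the recovery path after a replica partition is promoted.

    hot_recover: asks the already running JVM to recover the failed
                 box's logs in log_dir (no JVM start-up cost).
    cold_start:  starts a server that recovers the logs in log_dir.
    Both are hypothetical hooks passed in by the caller.
    """
    if surviving_jvm_running:
        hot_recover(log_dir)
        return "hot"
    cold_start(log_dir)
    return "cold"
```

In the four-partition setup the surviving node's JVM is normally already up, so the script would take the hot path and fall back to a cold start only if it is not.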


Customers are getting more and more storage choices, and these choices are steadily lowering the cost of clustered solutions. WebSphere 6.0 lets customers leverage them and achieve recovery times that are very competitive with much more expensive solutions.
