February 12, 2005
WebSphere 6.0 and NFS v4/SAN FS, a match made in heaven
When I started figuring out where I wanted WebSphere 6.0 high availability to go at the end of 2002, I decided I wanted high availability to be a lot easier and cheaper to set up than was possible at the time. I also wanted to avoid the need for SANs, shared disk arrays etc. All the applications required external software like Veritas and HACMP as well as switched disks attached with fiber or SCSI to do proper failover, and it was slow. It could take minutes to restart the failed server.
It was very complex to set up: lots of steps, lots of scope for mistakes. It was too hard. This wasn't unique to WebSphere. Everybody had this problem. Everybody still has this problem today except us. WebSphere 6.0 is kind of revolutionary in this respect. For the first time, customers will be able to get best of class availability (on ANY platform) out of the box on commodity hardware.
WebSphere 6.0 is designed to take advantage of shared file systems like IBM SAN FS, Windows CIFS, and NFS v4. A cluster that uses a shared file system is almost child's play to make highly available now and will recover in-doubt transactions for the failing cluster member in around 11 seconds. This is the current limit using a shared file system, although faster times are possible using other configurations.
The first thing is to mount a shared file system at the same mount point on all cluster members. Say /mnt/cluster. Next, create a directory for each cluster member. On each cluster member change the transaction log directory to be /mnt/cluster/serverX, where serverX is the directory for that server. Now, bring up the cluster screen and enable "High Availability" services.
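The directory layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name make_tranlog_dirs and the server names are made up, and a temporary directory stands in for the shared mount at /mnt/cluster. Pointing each server's transaction log at its directory and enabling the High Availability services is still done in the WebSphere admin screens, which this sketch does not touch.

```python
import os
import tempfile


def make_tranlog_dirs(shared_root, servers):
    """Create one transaction log directory per cluster member under the shared mount.

    Returns a dict mapping each server name to its directory path.
    """
    paths = {}
    for name in servers:
        path = os.path.join(shared_root, name)
        os.makedirs(path, exist_ok=True)  # safe to run again on an existing layout
        paths[name] = path
    return paths


if __name__ == "__main__":
    # A temp directory stands in for the shared mount (/mnt/cluster in the text).
    with tempfile.TemporaryDirectory() as root:
        for server, path in make_tranlog_dirs(root, ["server1", "server2"]).items():
            print(server, "->", path)
```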
That's it, you're highly available. If server A crashes then WebSphere's HAManager component will detect the failure and tell one of the other servers in the same cluster as server A to recover the server A transaction logs. The other servers can see server A's transaction logs because of the shared file system.
WebSphere uses file locking to ensure only a single server at a time can modify a set of transaction logs. NFS v4 and SAN FS provide fast lock recovery when a shared file system client crashes. AIX 5.3 can have its NFS v4 tuned to do this after only 10 seconds. SAN FS also. This means that any file locks held by the dead server are released automatically after this interval. This allows the peer cluster member to safely lock the files and do recovery.
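To make the "only one server at a time" idea concrete, here is a minimal sketch of exclusive file locking. It is not WebSphere's code. It assumes a Unix system, uses Python's fcntl.flock on a local temp file as a stand-in for a lock on the shared file system, and the helper name try_lock and the file name are made up. Closing the file plays the role of the owner going away and its lock being released.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on path.

    Returns the open file object (keep it open to hold the lock),
    or None if somebody else already holds the lock.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "tranlog.lock")  # hypothetical lock file name
        owner = try_lock(log)                  # "server A" holds the lock
        peer = try_lock(log)                   # the peer is refused while A holds it
        print("peer got lock while owner holds it:", peer is not None)
        owner.close()                          # closing the file releases the lock
        peer = try_lock(log)                   # now the peer can lock and recover
        print("peer got lock after release:", peer is not None)
        peer.close()
```

On a real cluster the release in the middle is what the file system's lock recovery does for you when the owner crashes.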
If the failed server restarts then the HAManager will do an orderly failback a second or two after the server starts.
NFS v4 is currently only available on AIX 5.3 and Solaris 10. I read the other day that Linus is looking at adding NFS v4 to the 2.6 Linux kernels later this year. This means that this new level of availability will be available on x86 Linux hardware.
Failure Detection in WebSphere 6.0
WebSphere 6.0 detects failures in one of two ways out of the box. One way uses TCP sockets to detect dead servers. This depends on KEEP_ALIVE tuning to be effective when hard failures occur. A hard failure is a power failure, motherboard failure or network problem. The other uses active heart beating. The default heart beat rate is a heart beat every 10 seconds and if twenty heart beats are missed then we mark the server as suspect. That is 200 seconds, or a little over 3 minutes. This was done so that WebSphere worked well on smaller development machines etc. It is, of course, tunable and we've had it down as low as 6 seconds (2 second HBs with 3 indicating failure).
The recovery time for in-doubt transaction recovery is the greater of the failure detection time of WebSphere and the lock recovery time of the shared file system. Even if WebSphere can fail over in 4 seconds, if the file system takes 10 seconds to release the locks then recovery will happen when the locks are released, i.e. 10 seconds. So, for production systems the heart beat settings should be tuned so that failure detection takes about as long as the file system lock lease time. Again, this is the current best recovery time using a shared file system. Faster recovery is possible using configurations that don't use a shared file system, and that's another blog entry.
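The arithmetic above is simple enough to write down. This is just a worked example using the numbers from this post; the function names are made up for illustration and are not WebSphere settings.

```python
def detection_time(heartbeat_interval_s, missed_beats):
    """Seconds until a silent server is marked suspect by heart beating."""
    return heartbeat_interval_s * missed_beats


def recovery_time(detection_s, lock_lease_s):
    """Recovery can't start until the failure is detected AND the locks are released."""
    return max(detection_s, lock_lease_s)


if __name__ == "__main__":
    print(detection_time(10, 20))  # default settings: 200 seconds
    print(detection_time(2, 3))    # tuned settings: 6 seconds
    print(recovery_time(4, 10))    # 4 s detection, 10 s lock lease: 10 seconds
```

Detecting the failure faster than the file system releases the locks buys nothing, which is why the two numbers should be tuned together.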
WebSphere 6.0 doesn't need IP failover for TM recovery when it's configured for hot standby. This is a big improvement over previous versions of WebSphere and everyone else. WebSphere now uses logical names for the TM and these are resolved dynamically at runtime to wherever the TM was placed by the HAManager.
2 minute Configuration
I hope customers will start to use this new feature. It's a very simple way to make a cluster highly available at very low cost and anybody can do it. It takes under 2 minutes to set up.
- Export a file system
- Mount the file system on all cluster members
- Make directories for each server
- Change the tran log directory to the shared file system
- Check the HA box on the cluster panel.
Some of you may also realise that even without a shared file system, this can help with other scenarios. When you use vertical clustering, i.e. multiple JVMs per server, the local file system is in effect a shared file system for those JVMs on the same server and WebSphere can be configured to do this kind of recovery here also. This won't handle box failure but will handle JVM panics etc.
I tried to get SAMBA to work. It's supposed to do file locking etc and the solution also works on native Windows clients and Windows file servers. But making SAMBA do file locking like this was beyond me. I still think it's possible but I couldn't get any help from the community here. If SAMBA could be configured to work then this technology would work on any Linux with that SAMBA. If someone from the SAMBA community wants to work with me on this then email me. It's a shame that it appears not to work.
The problem with NFS v3
NFS v3 doesn't work here. The problem is locking. When a server fails, the locks only get released when that server restarts. That takes AGES. Minutes. Or an administrator can manually release the locks. Again, not really what I want. The solution really needs lock leasing to work well. Only SAN FS, NFS v4 or CIFS provide this fast lock recovery with no dependency on the failed server.
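Lock leasing is easy to picture with a toy model. The sketch below is not how NFS v4 or SAN FS implement it; the class name LeasedLock and the explicit "now" argument (a timestamp in seconds) are assumptions made to keep the example small and deterministic. The point is only that a lock whose holder stops renewing becomes available again after the lease interval, with no help needed from the dead holder.

```python
class LeasedLock:
    """Toy model of a lock that expires unless the holder keeps renewing it."""

    def __init__(self, lease_s):
        self.lease_s = lease_s
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client, now):
        """Grant (or renew) the lock if it is free, expired, or already ours."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == client:
            self.holder = client
            self.expires_at = now + self.lease_s
            return True
        return False


if __name__ == "__main__":
    lock = LeasedLock(lease_s=10)
    print(lock.acquire("serverA", now=0))   # A takes the lock
    print(lock.acquire("serverB", now=5))   # B is refused, A's lease is still live
    # serverA crashes here and never renews its lease
    print(lock.acquire("serverB", now=10))  # the lease has lapsed, B gets the lock
```

With NFS v3 style locking there is no expiry, so the second peer would stay refused until the failed server came back or an administrator stepped in.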
SAN FS offers an ultra high performance shared file system that doesn't require large file server boxes. It uses a unique architecture that means it's as fast as a local file system but is fully shared. You can have a couple of small 2 way boxes as SAN FS servers (for HA) and run a lot of WebSphere or database boxes on the shared file system. It also offers a shared file system that uses lock leasing on older operating systems. NFS v4 is available on the latest and greatest operating systems like AIX 5.3, Solaris 10 and, later this year, 2.6 Linux kernels. But if you're on an older operating system then you don't have this support built in. SAN FS is available on these older operating systems and will enable you to take advantage of this new WebSphere support on those platforms.
That's it. No extra software besides WebSphere needs to be bought and no switched disks per server when using CIFS or NFS v4 (SAN FS requires a SAN and the SAN FS client to be installed). I think this sets a whole new level of ease of use in clustering on the market and puts WebSphere in front of the pack even after WebLogic 9 ships later this year. The HAManager is exploited all over the place in WebSphere 6.0. The new messaging engine uses it for similar failover times. All the critical singletons in the product that used to run in the deployment manager now 'float' and run anywhere in the cell. Customers can, of course, specify policies to tell WebSphere where they would like these services to run and can specify backup servers etc.
So, there it is. WebSphere 6.0 can achieve < 12 second recovery times on commodity hardware when it's used with a shared file system with lock leasing.
February 12, 2005 in WebSphere
Is it possible to create a custom singleton service/EJB in WAS6? The base WAS6 infra that singleton services (like HA Manager) are built on seems suitably generic.
For example, suppose you wanted exactly one instance of a given MDB to be running (reading) at a time in a cluster. (because you want sequential in-order processing from a queue).
Posted by: Andrew Ward | Feb 13, 2005 12:50:59 PM
WebSphere XD has singleton support. We call a singleton a partition. XD lets you programmatically create named singletons at runtime. We already have at least a couple of customers building electronic trading systems with XD. You can do anything you want with these singletons and combined with our threading with async beans, it's a great real time transaction grid platform.
Strict in-order message processing isn't available on WAS 6.0. It works in the non-exception cases. You can place a single floating messaging engine in a cluster and then only the cluster member where the ME is running will be able to process messages. But the edge cases like transaction recovery may cause out of order messages to be delivered under those circumstances. There are a bunch of gotchas around in-order messaging like in-doubts, what to do with the undeliverable messages etc. We have requirements from customers for this and if you email me then more customer pressure is only a good thing.
Posted by: Billy | Feb 13, 2005 12:58:50 PM
A possible low-cost(?) Linux solution could be RedHat's GFS or openGFS(Open Source distr.)?
This solution allows for distributed locking and filesystem sharing on linux boxes, even without the need for a SAN (if you use GNBD).
take a look @ http://gfs.wikidev.net/Main_Page
Posted by: Tom Vleminckx | Feb 14, 2005 11:21:13 AM
Looks like it may work, but we haven't tested it so officially it's not supported. That doesn't mean it doesn't work. If it releases the locks quickly when a node fails then it looks good.
Posted by: Billy | Feb 14, 2005 11:36:53 AM
Is it a good solution to use this mechanism for a single OS instance and 2 WebSphere instances, to recover from WS crashes or hangups (this sometimes happens :-( due to the application, the JVM or even WS; nobody's perfect)?
Or is there a more straightforward way?
Posted by: Jacques Talbot | Feb 15, 2005 4:37:35 AM
Hi Billy. Interesting to read your blog and how you've accomplished this. However, the HP Transaction Service, formerly the Arjuna transaction service, has had high availability support since 1992. You can find lots of papers on it here http://arjuna.ncl.ac.uk/
All the best,
Posted by: Mark Little | Feb 15, 2005 4:43:53 AM
Sorry about my previous question, since the answer is already in your post.
And it is YES, if I understand well.
Posted by: Jacques Talbot | Feb 15, 2005 5:12:38 AM
Which WS edition do you need to have this feature?
Express, Standard, ND, XD?
Posted by: Jacques Talbot | Feb 15, 2005 5:17:42 AM
RedHat Enterprise Linux 4 supports NFSv4
IBM GPFS filesystem used to ensure HA for Oracle database can probably be used as well. It supports AIX and Linux.
Posted by: Jean-Yves Girard | Apr 4, 2005 6:22:39 AM
Any idea if with WebSEAL and WebSphere Portal the session timeout is set to 8 hours? It is a customer requirement, even though I have had bad experiences in the past setting the session timeout to such a high value.
Posted by: Robert | Aug 2, 2005 10:02:24 AM