September 08, 2003

Presumed Abort Transactions and availability

It's important to understand how transactions work using XA today. There is the part we all know and love, i.e. the XA verbs like start/prepare/commit/rollback. But, what about before we call commit?

When the application server uses an XA connection, the connection gets enlisted in the transaction when the XA start command is issued.

If the application server gets a commit command from the application, then the app server calls prepare and then commit on each resource manager. The application server is responsible for making sure that all resource managers in the transaction either commit or roll back.
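To make the prepare-then-commit flow concrete, here is a minimal sketch of what the app server does at commit time. It is a toy model, not WebSphere code: `ToyResource` and `commit_transaction` are hypothetical names invented for illustration, and a real transaction manager also writes a recovery log.

```python
class ToyResource:
    """Hypothetical stand-in for an XA resource manager (a database, a JMS provider)."""

    def __init__(self, name, vote_ok=True):
        self.name = name
        self.vote_ok = vote_ok
        self.state = "active"  # active -> prepared -> committed, or rolled_back

    def prepare(self):
        # Phase 1: the resource votes on whether it is able to commit.
        if self.vote_ok:
            self.state = "prepared"
        return self.vote_ok

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def commit_transaction(resources):
    """Prepare every resource, then commit all of them or roll back all of them."""
    if all(r.prepare() for r in resources):
        for r in resources:
            r.commit()
        return True
    # At least one resource voted no, so nobody is allowed to commit.
    for r in resources:
        r.rollback()
    return False


db = ToyResource("db")
jms = ToyResource("jms", vote_ok=False)  # simulate a resource that cannot commit
outcome = commit_transaction([db, jms])
```

Here the JMS resource votes no during prepare, so the database is rolled back as well even though it prepared successfully.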

But what about before commit, i.e. between XA start/begin and when the application invokes commit? What happens here?

Well, we're holding locks on database rows, and we're making messages on a JMS system invisible because we consumed them. Suppose the application server fails right now.

If the process fails then the OS closes the socket between the app server and the resource manager. The resource manager sees this and knows we didn't reach the prepare stage yet. What does it do? It presumes abort. It rolls back the transaction. This is the 'protocol' called presumed abort: if the transaction doesn't get to prepare/commit, then roll it back.
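The rule above can be sketched from the resource manager's side. This is a toy model under stated assumptions: `ToyResourceManager` and its method names are made up for illustration, and it tracks one transaction per connection. The comment about prepared transactions reflects standard XA behaviour rather than anything specific to one database.

```python
class ToyResourceManager:
    """Hypothetical resource manager tracking one transaction per connection."""

    def __init__(self):
        # connection id -> "active" | "prepared" | "rolled_back"
        self.tx_state = {}

    def start(self, conn_id):
        self.tx_state[conn_id] = "active"

    def prepare(self, conn_id):
        self.tx_state[conn_id] = "prepared"

    def on_connection_closed(self, conn_id):
        # Presumed abort: the client vanished before prepare, so roll back
        # and release whatever locks the transaction was holding.
        if self.tx_state.get(conn_id) == "active":
            self.tx_state[conn_id] = "rolled_back"
        # A prepared transaction is left alone; it stays in doubt until the
        # transaction manager recovers and resolves it.


rm = ToyResourceManager()
rm.start("conn-1")
rm.on_connection_closed("conn-1")  # app server died before prepare

rm.start("conn-2")
rm.prepare("conn-2")
rm.on_connection_closed("conn-2")  # app server died after prepare
```

The first transaction is rolled back straight away. The second one is not, which is why the window before prepare is the one where a closed socket is all the resource manager needs.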

Now, there is some stuff to know here. How did the database (let's use a DB in the example) know to roll back the tran? The socket to the client was closed. How did the socket get closed? The client OS did it as part of process cleanup when the client process (WAS) died.

Seems cool, no? Well, what if the power failed? What then? There is no OS left to clean up the socket. Surely TCP has a way to detect sockets with one end dead? Yep, it does: TCP keepalive. You can tell the OS on the DB side, using the KEEP_ALIVE parameter (the exact name varies by OS), how long to wait on an inactive socket before probing the TCP stack on the other side. The problem is that the default KEEP_ALIVE is normally something like 2 hours.
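To put a number on how long a dead peer can go unnoticed, here is a rough calculation. It assumes the usual keepalive scheme of an idle timer followed by a fixed number of probes, and it uses the stock Linux values (7200 second idle time, 75 second probe interval, 9 probes) as the default case. Other operating systems use different names and values, and the tuned numbers are purely illustrative.

```python
def worst_case_detection_seconds(idle_s, interval_s, probes):
    """Rough time from the last traffic until the OS gives up on a dead peer."""
    # The OS waits idle_s with no traffic, then sends `probes` unanswered
    # probes spaced interval_s apart before dropping the connection.
    return idle_s + interval_s * probes


# Assumed stock Linux settings: about 2 hours idle, then 9 probes 75 s apart.
linux_default = worst_case_detection_seconds(7200, 75, 9)

# Illustrative tuned settings: 30 s idle, then 3 probes 10 s apart.
tuned = worst_case_detection_seconds(30, 10, 3)
```

With these inputs the default works out to 7875 seconds, a bit over two hours, while the tuned settings bring it down to 60 seconds.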

That's pretty bad. Your database locks could be held for 2 hours when the app server box fails. This clearly isn't acceptable. So, it's wise to set KEEP_ALIVE to 30 or 60 seconds, or lower. This doesn't really slow down the DB box. It only results in an extra packet being sent when the socket hasn't been used for that long, i.e. 30 or 60 seconds. If the socket is busy then there is no overhead.
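The setting discussed above is normally an OS-wide tunable on the DB box, but the same timers can often be set per socket by the process that owns it. The sketch below shows that per-socket variant, assuming a server process you control. `enable_keepalive` is a hypothetical helper, the timer values are illustrative, and the `TCP_KEEP*` constants are only present on some platforms, hence the `hasattr` guards.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive and, where the platform allows, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform specific, so set only what exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# Non-zero means keepalive is switched on for this socket.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

Whether your database lets you do this per connection or only honours the OS-wide value depends on the product, so check its documentation before relying on either.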

This tip applies to any kind of resource manager that uses TCP to attach to clients. A client is an application server here.

So, when setting up highly available systems, don't let a dead app server hold locks any longer than necessary. Set KEEP_ALIVE to an acceptable value, and that's about the longest that connections to the dead client will remain open if the box fails.

September 8, 2003 in Web/Tech | Permalink

Comments

i have great interest on what u discussed in ur articles.i will keep attention on it .
And i wanna know the exactly meaning of "transaction".
wait for ur answer ,thank u

greeting from China

Posted by: Dolly | Sep 15, 2003 1:29:02 AM

A transaction here is a WebSphere transaction, which lets resources such as a database or a JMS provider take part in a single unit of work. It guarantees that either all changes are made or none.

Posted by: Billy | Sep 15, 2003 7:41:09 AM
