
September 07, 2003

2PC = atomicity? Maybe

I recently did a question-and-answer session with a customer about WebSphere and 2PC. A lot of it was the usual stuff: what is it, what are the performance costs, how do you make it available, etc.

The customer was using 2PC to update a DB and send a message. The message triggers a second application, which reads the database. Oops, warning bells went off: this will cause problems.

The problem is that 2PC is exactly that: two-phase commit over multiple resource managers. A resource manager is a JMS provider, a database or a JCA connector; in this scenario, for example, it's MQSeries and DB2. So what happens? The app sends a message, updates the database and commits. App B then receives the message, reads the database and does something. Easy, right? What can go wrong...

Well, this. The first transaction is committed. What does this mean? It means that WAS asks JMS and DB2 "can I commit?" (the prepare phase). They both answer yes, and then WAS tells both to commit, JMS first and then the DB.
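To make the ordering concrete, here is a minimal Java sketch of the two phases. `Participant`, `TracingParticipant` and `Coordinator` are hypothetical names, not the WAS or `javax.transaction.xa` API, and the coordinator leaves out the transaction log, recovery and timeouts a real TM needs. It only shows that phase 2 is a sequence of separate commit calls, one per resource manager.

```java
import java.util.List;

// Hypothetical participant interface standing in for a resource manager
// (JMS provider, database, JCA connector). Not the real XAResource API.
interface Participant {
    String name();
    boolean prepare();   // phase 1: "can I commit?"
    void commit();       // phase 2: make the changes durable and visible
    void rollback();
}

// Toy participant that records every call in a shared trace.
class TracingParticipant implements Participant {
    private final String name;
    private final boolean vote;
    private final List<String> trace;

    TracingParticipant(String name, boolean vote, List<String> trace) {
        this.name = name;
        this.vote = vote;
        this.trace = trace;
    }

    public String name() { return name; }
    public boolean prepare() { trace.add("prepare " + name); return vote; }
    public void commit() { trace.add("commit " + name); }
    public void rollback() { trace.add("rollback " + name); }
}

// Minimal two-phase commit coordinator: no logging, no recovery, no timeouts.
class Coordinator {
    /** Returns true if the transaction committed, false if it rolled back. */
    static boolean complete(List<? extends Participant> participants) {
        // Phase 1: ask each participant to prepare; stop at the first "no".
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
                break;
            }
        }
        if (!allYes) {
            // Any "no" vote: roll everything back.
            for (Participant p : participants) p.rollback();
            return false;
        }
        // Phase 2: one commit call per participant, in list order. In a real
        // system the first participant's changes can become visible to other
        // readers before the later commit calls have happened.
        for (Participant p : participants) p.commit();
        return true;
    }
}
```

With two participants named "JMS" and "DB" that both vote yes, the trace reads prepare JMS, prepare DB, commit JMS, commit DB. The gap between the last two entries is where the trouble starts.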

So JMS is told to commit first, and the sent message is immediately visible to App B. Then WAS tells the DB to commit, and only now are the records associated with the message visible. Spotted the problem yet?

Imagine App B gets the message before WAS has told the DB to commit. The JMS message is visible, but the DB changes aren't yet, although they will be very shortly. Regardless, the MDB receiving the message looks up the record in the DB and gets a record-not-found because we were unlucky with the timing.
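The race can be replayed deterministically with toy in-memory resource managers. In this sketch `ToyQueue`, `ToyDb`, `WindowDemo` and the key `order-42` are all made up for illustration. Staged writes stay invisible until `commit()` is called, and App B is simply run between the two phase-2 commits rather than on another thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy "resource managers": staged changes are invisible to readers until commit().
class ToyQueue {
    private final Deque<String> visible = new ArrayDeque<>();
    private String staged;

    void send(String msg) { staged = msg; }
    void commit() { if (staged != null) { visible.add(staged); staged = null; } }
    String receive() { return visible.poll(); }   // null if nothing is visible
}

class ToyDb {
    private final Map<String, String> visible = new HashMap<>();
    private final Map<String, String> staged = new HashMap<>();

    void insert(String key, String value) { staged.put(key, value); }
    void commit() { visible.putAll(staged); staged.clear(); }
    Optional<String> read(String key) { return Optional.ofNullable(visible.get(key)); }
}

class WindowDemo {
    /** Returns what App B saw when it looked up the record between the two commits. */
    static Optional<String> run(ToyQueue queue, ToyDb db) {
        // App A: send a message and update the DB in one (simulated) global transaction.
        queue.send("order-42");
        db.insert("order-42", "PENDING");
        // Assume phase 1 succeeded for both; phase 2 commits JMS first.
        queue.commit();
        // App B runs here, in the window between the two phase-2 commits.
        String msg = queue.receive();
        Optional<String> seen = db.read(msg);
        // Only now does the coordinator get to the database.
        db.commit();
        return seen;
    }
}
```

`run` returns an empty `Optional` even though the row is readable as soon as it returns: App B was simply too early.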

So, while 2PC guarantees that all the changes will eventually happen, there is a window when the changes to the multiple resource managers are out of sync. This is a problem with ALL application servers. WAS isn't special; this is just the way it is. So, when building applications like the one above, expect this and code for it. If the MDB discovers the data isn't there, then roll back. The message will get redelivered, and the data will probably be there the next time. You can run into funny situations, such as the DB failing after WAS has told JMS to commit. WAS will then keep trying to tell the DB to commit, but this fails until the DB is recovered, and then it works.
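A minimal sketch of the "roll back and let it be redelivered" approach, in plain Java so it runs without a container. `OrderListener` and `RedeliveringContainer` are hypothetical; in a real container-managed MDB the missing-record branch would call `setRollbackOnly()` on the `MessageDrivenContext`, and the JMS provider would handle redelivery. The `beforeAttempt` hook exists only so a test can simulate the DB commit landing between deliveries.

```java
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical stand-in for an MDB's onMessage(): returns true to commit
// (message consumed) or false to roll back (message goes back on the queue).
class OrderListener {
    private final Map<String, String> db;   // stand-in for the database table

    OrderListener(Map<String, String> db) { this.db = db; }

    boolean onMessage(String orderId) {
        String row = db.get(orderId);
        if (row == null) {
            // The 2PC window: the message is visible but the row is not (yet).
            // A real CMT MDB would call setRollbackOnly() here.
            return false;
        }
        // ... process the row ...
        return true;
    }
}

// Simulated container: redeliver on rollback, up to a maximum number of deliveries.
class RedeliveringContainer {
    /** Returns the attempt that succeeded, or -1 if every delivery rolled back. */
    static int deliver(String msg, OrderListener listener, int maxDeliveries,
                       IntConsumer beforeAttempt) {
        for (int attempt = 1; attempt <= maxDeliveries; attempt++) {
            beforeAttempt.accept(attempt);          // test hook only
            if (listener.onMessage(msg)) return attempt;
        }
        return -1;
    }
}
```

Capping the deliveries matters: if the DB is down for a long time, an unbounded rollback loop just spins on the same message, which is why providers can usually be configured to move it aside to a backout or dead-letter queue after some number of attempts.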

But given that a DB can take anywhere from two or three minutes to a day to fail over, depending on how automated your setup is, you need application logic to handle this. Maybe, if the record isn't present, use the WAS scheduler or the EJB 2.1 timer service to retry in five minutes or so. You get the idea. The real world is a difficult place where even what we're taught is bulletproof can go wrong. Understand the technical environment, and your applications will be much more robust.
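The deferred-retry idea might look something like this. The sketch uses `java.util.concurrent.ScheduledExecutorService` as a stand-in for the WAS scheduler or the EJB 2.1 timer service, and a `Map` as a stand-in for the table; `DeferredRetry` and its parameters are hypothetical. One important difference: an in-memory executor loses its pending retries if the JVM restarts, whereas EJB timers are persistent, so treat this as an illustration of the control flow only.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical deferred retry: if the record isn't there yet, check again
// later instead of failing (or rolling back) immediately.
class DeferredRetry {
    private final ScheduledExecutorService scheduler;  // stand-in for WAS scheduler / EJB timer
    private final Map<String, String> db;              // stand-in for the database table
    private final long delay;
    private final TimeUnit unit;
    private final int maxAttempts;

    DeferredRetry(ScheduledExecutorService scheduler, Map<String, String> db,
                  long delay, TimeUnit unit, int maxAttempts) {
        this.scheduler = scheduler;
        this.db = db;
        this.delay = delay;
        this.unit = unit;
        this.maxAttempts = maxAttempts;
    }

    /** Completes with the record, or exceptionally after maxAttempts misses. */
    CompletableFuture<String> process(String key) {
        CompletableFuture<String> result = new CompletableFuture<>();
        attempt(key, 1, result);   // first check happens immediately
        return result;
    }

    private void attempt(String key, int n, CompletableFuture<String> result) {
        String row = db.get(key);
        if (row != null) {
            result.complete(row);  // the DB commit has landed
        } else if (n >= maxAttempts) {
            result.completeExceptionally(new IllegalStateException(
                    "record " + key + " still missing after " + n + " attempts"));
        } else {
            // Not there yet: schedule another check after the delay.
            scheduler.schedule(() -> attempt(key, n + 1, result), delay, unit);
        }
    }
}
```

In production the delay would be minutes rather than milliseconds, and whoever owns the executor should shut it down when finished.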

The problems here are not specific to JMS and DBs. The same thing applies to 2PC across two databases: changes will be visible in one database before the other. An application that reads data from DB.A and then tries to find the corresponding data in DB.B may see old data, depending on whether the TM has told DB.B to commit yet.

The only way to avoid these issues is to not use 2PC and to use a single resource manager instead. Clearly there are times when that just isn't possible, so you need to be aware of the window and design your applications with it in mind to stay out of trouble.

September 7, 2003 in Web/Tech

Comments

This is interesting. On the other hand, if you use TPC with Oracle, the row in question will actually be locked against reading (yes, despite MVCC) because the state of the row is still in doubt for this SCN. So the JMS-driven process will query the database and block until the commit finishes.

Posted by: JJ Furman | Apr 27, 2004 1:30:02 PM
