Even a chimp can write code

Thursday, June 24, 2004

A look at two-phase commits

The two-phase commit model has been used for over two decades to govern transactions in distributed systems. New enterprise developers often have a difficult time understanding the significance of transaction boundaries and their role in distributed systems. I willingly admit I was in that position once, not too long ago. This entry aims to introduce some of these concepts to the uninitiated or the newly initiated.

It is common for enterprises to have applications that simultaneously update data in multiple databases. One can easily visualize the number of moving parts in this setup. In order to deliver on their promise of atomicity, consistency, isolation, and durability (or ACID), transactional systems must ensure data integrity and accuracy by providing some sort of synchronous locking mechanism. Transactions therefore provide the glue that ties one or more of these operations together so they occur in an atomic unit or not at all. To understand two-phase commits, one must have a clear understanding of transactions.

The term resource manager signifies an external data store or entity accessed by an application. An RDBMS, a JMS provider and an EIS are all examples of resource managers. Typically, resource adapters expose the resource manager’s APIs to the J2EE server and to the applications running on that server. The start and the end of a transaction are the milestones that determine its boundary, or demarcation. When an application begins a transaction, it creates a transaction object. It then involves the resource managers in carrying out the transaction while abiding by the ACID principles. The first call to a resource manager identifies the transaction. All subsequent calls happen under the umbrella of that transaction, until the transaction is signaled as ended. The success or failure of the transaction determines whether the updates are committed or rolled back.
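To make the idea of a transaction boundary a little more concrete, here is a minimal sketch in plain Java. Everything in it is made up for illustration: ToyStore stands in for a resource manager and ToyTransaction for the transaction object, and neither is a real J2EE or JTA class. It only shows begin, work and commit-or-rollback; the commit is deliberately naive and does nothing about a participant failing halfway through.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "resource manager": writes are buffered until the
// transaction ends. Not a real J2EE class.
class ToyStore {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> pending = new HashMap<>();

    void write(String key, String value) { pending.put(key, value); }
    void commit() { committed.putAll(pending); pending.clear(); }
    void rollback() { pending.clear(); }
    String read(String key) { return committed.get(key); }
}

// Hypothetical transaction object: it remembers every store it has touched,
// so that ending the transaction applies to all of them.
class ToyTransaction {
    private final List<ToyStore> enlisted = new ArrayList<>();
    private boolean active = true;

    void write(ToyStore store, String key, String value) {
        if (!active) throw new IllegalStateException("transaction has ended");
        // The first call to a store brings it under this transaction.
        if (!enlisted.contains(store)) enlisted.add(store);
        store.write(key, value);
    }

    void commit() { end(true); }
    void rollback() { end(false); }

    private void end(boolean success) {
        if (!active) throw new IllegalStateException("transaction has ended");
        active = false;
        for (ToyStore store : enlisted) {
            if (success) store.commit(); else store.rollback();
        }
    }
}

class DemarcationDemo {
    // Updates two stores inside one transaction boundary.
    static boolean placeOrder(ToyStore orders, ToyStore billing, boolean failMidway) {
        ToyTransaction tx = new ToyTransaction();          // boundary starts here
        try {
            tx.write(orders, "order-1", "placed");
            if (failMidway) throw new RuntimeException("simulated failure");
            tx.write(billing, "invoice-1", "issued");
            tx.commit();                                   // boundary ends: success
            return true;
        } catch (RuntimeException e) {
            tx.rollback();                                 // boundary ends: failure
            return false;
        }
    }

    public static void main(String[] args) {
        ToyStore orders = new ToyStore();
        ToyStore billing = new ToyStore();
        System.out.println("failed attempt committed: " + placeOrder(orders, billing, true));
        System.out.println("order after failure: " + orders.read("order-1"));
        System.out.println("clean attempt committed: " + placeOrder(orders, billing, false));
        System.out.println("order after success: " + orders.read("order-1"));
    }
}
```

In a real J2EE application you would not write these classes yourself; the container and the resource adapters play these roles.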

In a system with multiple databases (or resource managers) participating in a transaction, each participant must commit or roll back based on the outcome, in unison with the others. And this rule holds even if there is a network failure encountered while connecting to one or more of the participants. Databases and other transactional systems rarely ever get a ‘works well with others’ note in their report cards. Hence there is an inherent problem of who will co-ordinate this transaction, ensuring that each participant does its part. You could go ahead and write all this logic [as has been done in the past, and I suspect continues to this day] or you could use a transaction manager. Thankfully the coding habits of chimps are well known and widely documented. J2EE servers provide a transaction manager to co-ordinate transactions [surprise, surprise!], specified by the Java Transaction Service (JTS) and exposed to applications through the Java Transaction API (JTA). Most transaction managers use the X/Open XA protocol to communicate with multiple resource managers in a distributed system. Transaction managers are capable of transparently forwarding transaction context from one component to another, as well as to resource managers.

And this brings us to two-phase commits. There are two distinct stages or processes [or phases, you might add] that constitute a two-phase commit:

  1. the Prepare phase, where the transaction manager asks every participating resource manager to prepare the transaction. Each resource manager in turn returns a success or failure indication. At the completion of this stage, every resource manager that reported success must be ready to either commit or roll back its changes.

  2. the Commit phase, where the transaction manager instructs all participants to commit the transaction, provided every one of them prepared successfully. If any resource manager cannot prepare, or if there is some other kind of failure, the transaction manager asks all resource managers to roll back instead.
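The two phases can be sketched in a few lines of plain Java. This is a toy under stated assumptions, not how a real transaction manager is built: the Participant interface, RecordingParticipant and TwoPhaseCoordinator are hypothetical names of my own, everything runs in one process, and there is no logging to disk, no timeout handling and no recovery. A real XA resource exposes a richer set of operations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical participant contract, loosely modelled on what a resource
// manager has to offer a coordinator.
interface Participant {
    boolean prepare();   // vote: true means "ready to commit"
    void commit();
    void rollback();
}

// Test double that records every call it receives.
class RecordingParticipant implements Participant {
    final String name;
    final List<String> log = new ArrayList<>();
    private final boolean canPrepare;

    RecordingParticipant(String name, boolean canPrepare) {
        this.name = name;
        this.canPrepare = canPrepare;
    }

    public boolean prepare() { log.add("prepare"); return canPrepare; }
    public void commit() { log.add("commit"); }
    public void rollback() { log.add("rollback"); }
}

class TwoPhaseCoordinator {
    // Returns true if the transaction committed, false if it rolled back.
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: collect votes. One "no" (or an exception) decides the outcome.
        boolean allPrepared = true;
        for (Participant p : participants) {
            boolean vote;
            try {
                vote = p.prepare();
            } catch (RuntimeException e) {
                vote = false;   // a participant that blows up counts as a "no"
            }
            if (!vote) {
                allPrepared = false;
                break;          // no point asking the rest
            }
        }

        // Phase 2: tell every participant the same decision.
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        RecordingParticipant db = new RecordingParticipant("db", true);
        RecordingParticipant jms = new RecordingParticipant("jms", false);
        boolean committed = run(Arrays.asList(db, jms));
        System.out.println("committed=" + committed + " db=" + db.log + " jms=" + jms.log);
    }
}
```

The point to notice is that nobody commits until everybody has voted, and everybody receives the same decision. What the sketch leaves out is the hard part: a coordinator or participant that crashes between the two phases.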

Recovery after most failures is dealt with automatically and transparently by the transaction manager. This wonderful two-phase commit process is however not without its shortcomings. Say an EIS in this mix is unavailable for an extended period. This would result in all transactions being rolled back unless some means are introduced to force a commit on the participants that are still active. Usually this is done by human intervention, but things can quickly get out of hand. Another problem I have had to deal with recently was an increase in the number of participants in the transaction. Every extra participant adds a round trip in each phase and is one more party that can fail to prepare, so unless planned for in advance, this can throw your calculations off. The team learnt the benefits of asynchronous and loosely coupled systems the hard way [that’s a topic for another day perhaps].

When I started writing this blog, I had imagined it to be my own op-ed page. I have since learnt that opinions by themselves can quickly turn boring. I selfishly hope that entries such as this will help newbie developers better understand frequently thrown-around acronyms and terminology. That’s if the gods at Google are on my side.

PS: A shout out to my buddies from Resource Adapters (RAi). The time I spent working at RAi in the Valley was the most educational contiguous 7 months ever. We did some pioneering J2EE CA stuff and had a penchant for the dramatic!
