[ACCEPTED]-How do two-phase commits prevent last-second failure?-distributed-transactions

Accepted answer
Score: 49

No, they are not instructed to roll back because, in the original poster's scenario, some of the nodes have already committed. What happens is that when the crashed node becomes available again, the transaction coordinator tells it to commit again.

Because the node responded positively in the "prepare" phase, it is required to be able to "commit", even when it comes back from a crash.
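To make the retry concrete, here is a minimal in-memory sketch. The `Node` and `Coordinator` classes and the `up` flag are hypothetical names invented for this illustration, not any real database's API, and the node's "prepared" state is assumed to survive the crash (a real node would keep it in durable storage).

```python
class Node:
    """Hypothetical participant; 'state' stands in for durable storage."""

    def __init__(self, name):
        self.name = name
        self.state = "idle"  # idle -> prepared -> committed
        self.up = True

    def prepare(self):
        # Voting yes is a promise: the node must be able to commit later.
        self.state = "prepared"
        return True

    def commit(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.state = "committed"


class Coordinator:
    """Hypothetical coordinator that re-sends commit until every node acks."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = []  # nodes that have not yet acknowledged the commit

    def prepare_all(self):
        ok = all(node.prepare() for node in self.nodes)
        if ok:
            self.pending = list(self.nodes)  # the decision is now "commit"
        return ok

    def commit_pending(self):
        still_pending = []
        for node in self.pending:
            try:
                node.commit()
            except ConnectionError:
                still_pending.append(node)  # try again later, never roll back
        self.pending = still_pending


def run_crash_demo():
    a, b = Node("a"), Node("b")
    coord = Coordinator([a, b])
    coord.prepare_all()
    b.up = False            # b crashes after voting yes
    coord.commit_pending()  # a commits, b is unreachable
    after_crash = (a.state, b.state)
    b.up = True             # b recovers, still in the prepared state
    coord.commit_pending()  # coordinator tells it to commit again
    return after_crash, (a.state, b.state)
```

Calling `run_crash_demo()` shows node `a` committed while `b` is down, and both committed once `b` is back. The coordinator never rolls anything back after the decision is made.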

Score: 29

Summarizing everyone's answers:

  1. One cannot use normal databases with distributed transactions. The database must explicitly support a transaction coordinator.

  2. The nodes are not instructed to roll back because some of the nodes have already committed. What happens is that when the crashed node comes back, the transaction coordinator tells it to finish the commit.

Score: 24

No. Point 4 is incorrect. Each node records in stable storage that it was able to commit or roll back the transaction, so that it will be able to do as commanded even across crashes. When the crashed node comes back up, it must realize that it has a transaction in pre-commit state, reinstate any relevant locks or other controls, and then attempt to contact the coordinator site to collect the status of the transaction.
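The recovery steps in the paragraph above can be sketched roughly as follows. This is only an illustration: the JSON-lines log format, the function names, and the `ask_coordinator` callback are assumptions made for the example, and lock reinstatement is reduced to a comment.

```python
import json
import os
import tempfile


def append_record(log_path, txid, state):
    """Append one state change and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"tx": txid, "state": state}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def last_states(log_path):
    """Replay the log; the last record for each transaction wins."""
    states = {}
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                states[rec["tx"]] = rec["state"]
    return states


def recover(log_path, ask_coordinator):
    """Resolve every transaction that a crash left in the prepared state."""
    resolved = {}
    for txid, state in last_states(log_path).items():
        if state == "prepared":
            # A real node would reinstate the transaction's locks here.
            outcome = ask_coordinator(txid)  # "committed" or "aborted"
            append_record(log_path, txid, outcome)
            resolved[txid] = outcome
    return resolved


def run_recovery_demo():
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "node.log")
        append_record(log_path, "tx1", "prepared")
        append_record(log_path, "tx1", "committed")
        append_record(log_path, "tx2", "prepared")  # node crashes here
        decisions = {"tx2": "committed"}  # what the coordinator decided
        resolved = recover(log_path, decisions.get)
        return resolved, last_states(log_path)
```

In `run_recovery_demo()` only `tx2` is in doubt after the restart, so it is the only transaction the node asks the coordinator about.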

The problems only occur if the crashed node never comes back up (then everything else thinks the transaction was OK, or will be when the crashed node comes back).

Score: 15

Two-phase commit isn't foolproof; it is just designed to work 99% of the time.

"The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other."


Score: 7

There are many ways to attack the problems with two-phase commit. Almost all of them wind up as some variant of the Paxos three-phase commit algorithm. Mike Burrows, who designed the Chubby lock service at Google which is based on Paxos, said that there are two types of distributed commit algorithms - "Paxos, and incorrect ones" - in a lecture I saw.

One thing the crashed node could do, when it reawakens, is say "I never heard about this transaction, should it have been committed?" to the coordinator, which will tell it what the vote was.

Bear in mind that this is an example of a more general problem: the crashed node could miss many transactions before it recovers. Therefore it's terribly important that upon recovery it should talk either to the coordinator or another replica before making itself available. If the node itself can't tell whether or not it has crashed, then things get more involved but are still tractable.

If you use a quorum system for database reads, the inconsistency will be masked (and made known to the database itself).
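As a rough illustration of how a read quorum masks a stale replica, here is a small sketch. It assumes each replica tags its value with a version number and that writes reached 2 of 3 replicas, so any 2 replicas read together overlap with the write. The `quorum_read` function and the replica layout are made up for this example.

```python
from itertools import combinations


def quorum_read(sample):
    """Return the newest value in the sample and the names of stale replicas."""
    newest = max(sample, key=lambda rep: rep["version"])
    stale = [rep["name"] for rep in sample if rep["version"] < newest["version"]]
    return newest["value"], stale


def run_quorum_demo():
    # N = 3 replicas, W = 2: the write reached a and b, but c was down.
    replicas = [
        {"name": "a", "version": 2, "value": "new"},
        {"name": "b", "version": 2, "value": "new"},
        {"name": "c", "version": 1, "value": "old"},
    ]
    # R = 2: every possible pair contains at least one up-to-date replica.
    return [quorum_read(list(pair)) for pair in combinations(replicas, 2)]
```

Every pair returns `"new"`, and whenever `c` is part of the read it is reported as stale, which is the point at which the database can repair it.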
