Seminar 12: Consensus in faulty systems with arbitrary failures
Goal:
- Reaching consensus in a fault-tolerant process group in which k members can fail assuming arbitrary failures.
- We need at least 3k + 1 members to reach consensus under these failure assumptions.
- Byzantine agreement
- Requirements for Byzantine Agreement.
Description
- In this seminar, we take a look at how to reach a consensus in a fault-tolerant process group in which k members can fail assuming arbitrary failures.
- We will also show that we need at least 3k + 1 members to reach consensus under these failure assumptions.
Definitions: Consider a process group consisting of n members, of which one has been designated to be the primary, P, and the remaining n - 1 to be the backups B1, . . . ,Bn -1.
- Assumptions:
- A client sends a value v {$ \epsilon $} {T, F} to the primary, where v stands for either True or False.
- Messages may be lost, but this can be detected.
- Messages cannot be corrupted without that being detected (and thus subsequently ignored).
- A receiver of a message can reliably detect its sender.
- Requirements for Byzantine Agreement:
- BA1: Every nonfaulty backup process stores the same value.
- BA2: If the primary is non faulty then every nonfaulty backup process stores exactly what the primary had sent.
- Note: If the primary is faulty, BA1 tells us that the backups may store the same, but different (and thus wrong) value than the one initially sent by the client. Furthermore, it should be clear that if the primary is not faulty, satisfying BA2 implies that BA1 is also satisfied.
- Why having 3k processes is not enough: Consider the following example where k=1:
- In above Figure:
- In first figure, we see that the faulty primary P is sending two different values to the backups B1 and B2, respectively. In order to reach consensus, both backup processes forward the received value to the other, leading to a second round of message exchanges. At that point, B1 and B2 each have received the set of values {T, F}, from which it is impossible to draw a conclusion.
- Likewise, we cannot reach consensus when wrong values are forwarded. In second Figure, the primary P and backup B2 operate correctly, but B1 is not. Instead of forwarding the value T to process B2, it sends the incorrect value F. The result is that B2 will now have seen the set of values {T, F} from which it cannot draw any conclusions. In other words, P and B2 cannot reach consensus. More specifically, B2 is not able to decide what to store, so that we cannot satisfy requirement BA2.
- In above Figure:
- Why having 3k + 1 processes is enough: Consider the figure where k=1:
- In above Figure:
- In first Figure, the primary P is faulty and is providing inconsistent information to its backups:
- Solution: The processes will forward what they receive to the others. During the first round, P sends T to B1, F to B2, and T to B3, respectively. Each of the backups then sends what they have to the others. With only the primary failing, this means that after two rounds, each of the backups will have received the set of values {T,T, F}, meaning that they can reach consensus on the value T.
- In Second part of the figure, we consider the case that one of the backups fails.
- Solution: Assume that the (nonfaulty) primary sends T to all the backups, yet B2 is faulty. Where B1 and B3 will send out T to the other backups in a second round, the worst that B2 may do is send out F, as shown in the figure. Despite this failure, B1 and B3 will come to the same conclusion, namely that P had sent out T, thereby meeting our requirement BA2 as stated before.
- In first Figure, the primary P is faulty and is providing inconsistent information to its backups:
- In above Figure:
EXERCISE Solve for consensus in 3k + 1 Processes under following case scenarios by meeting the above mentioned assumptions and Byzantine Requirements:
- K = 1 , B3 faulty, Primary, P and other Backups non faulty, values, v {$ \epsilon $} {T, F} and Primary, P sends T to all backups
- K = 2 , B4 faulty, Primary, P and other Backups non faulty, values, v {$ \epsilon $} {0, 1} and Primary, P sends 1 to all backups
- K = 2 , B5 faulty, Primary, P and other Backups non faulty, values, v {$ \epsilon $} {ATTACK, RETREAT} and Primary, P sends RETREAT to all backups
- K = 2 , B6 faulty, Primary, P and other Backups non faulty, values, v {$ \epsilon $} {PEACE, WAR} and Primary, P sends PEACE to all backups
Deliverables of practical session: A PDF file containing the answers to above exercise
Referrences:
- Distributed Systems, Third edition, Version 3.02 (2018) by Maarten van Steen and Andrew S. Tanenbaum