Seminar 8: Raft Consensus Algorithm in Python
Goal: To study and practice the Raft consensus algorithm in Python for distributed applications.
Background: Consensus is used in a wide range of applications in distributed systems. For instance, some nodes/processes may fail or behave maliciously, so the remaining processes must reach consensus to ensure reliability and fault tolerance in the system. The consensus problem requires agreement among a number of processes (or agents) on a single data value. Several consensus algorithms have been designed to achieve this distributed decision, e.g., Raft and Paxos. At the fundamental level, any problem that requires consensus can be solved with state machine replication. A consensus algorithm must guarantee the following properties:
- Termination - Every non-faulty process eventually decides on a value
- Agreement - The final decision of every non-faulty process is the same everywhere
- Integrity - Every correct process decides at most one value, and the decided value must have been proposed by some process.
State machine replication:
- The main idea is that a single process (the leader) broadcasts the operations that change its state to the other processes, the followers (replicas)
- Total order broadcast: every node delivers the same messages in the same order
- The followers execute the same sequence of operations as the leader, so the state of each follower matches the leader's
- A replica is a state machine: it starts in a fixed initial state and goes through the same sequence of state transitions in the same order → all replicas end up in the same state.
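The idea can be illustrated with a toy sketch in plain Python (this is not PySyncObj code; the Replica class and the operation tuples are made up for illustration): two replicas that start in the same initial state and apply the same operation log in the same order end up in identical states.

```python
# Toy sketch of state machine replication (not PySyncObj).
class Replica:
    def __init__(self):
        self.state = {}  # fixed initial state

    def apply(self, op):
        kind, key, value = op
        if kind == "set":
            self.state[key] = value
        elif kind == "pop":
            self.state.pop(key, None)

# The leader's log, delivered to every replica in total order.
log = [("set", "x", 1), ("set", "y", 2), ("pop", "x", None)]

a, b = Replica(), Replica()
for op in log:  # same sequence of transitions, same order
    a.apply(op)
    b.apply(op)
```

After the loop, `a.state` and `b.state` are identical, which is exactly why replaying the leader's log suffices for replication.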
Raft
- Raft is based on state machine replication and guarantees strong consistency (linearizability)
- In Raft, time is divided into terms
- Each term consists of an election followed by normal operation under a single leader
- A term is identified by a logical clock value that only increases, never goes backward
- Raft guarantees that for any term there is at most one leader
- Leader election
- Every process starts as follower.
- A follower expects to receive a periodic heartbeat from the leader containing the election term.
- If the follower does not receive a heartbeat within a certain period of time, a timeout fires and the leader is presumed dead
- The follower starts a new election by incrementing the current election term and transitioning to the candidate state
- It then votes for itself and sends a request to all processes in the system to vote for it, stamping the request with the current election term
- The candidate wins the election: the candidate becomes the leader and starts sending out heartbeats to the other processes
- Another process wins the election: if another process claims to be the leader with a term greater than or equal to the candidate's term, the candidate accepts the new leader and returns to the follower state
- A period of time goes by with no winner (a split vote): the candidate eventually times out and starts a new election with a higher term; randomized election timeouts make repeated split votes unlikely
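The follower → candidate → leader transitions above can be sketched as a small state machine. This is an illustrative sketch only; the class and method names are hypothetical and are not PySyncObj's API.

```python
# Illustrative sketch of Raft leader-election state transitions.
FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"

class Node:
    def __init__(self, node_id, cluster_size):
        self.id = node_id
        self.cluster_size = cluster_size
        self.state = FOLLOWER   # every process starts as a follower
        self.term = 0
        self.voted_for = None

    def on_heartbeat_timeout(self):
        # No heartbeat arrived in time: presume the leader dead and
        # start a new election with an incremented term.
        self.term += 1
        self.state = CANDIDATE
        self.voted_for = self.id  # vote for itself

    def on_votes_counted(self, votes):
        # A majority of votes wins the election for this term.
        if self.state == CANDIDATE and votes > self.cluster_size // 2:
            self.state = LEADER

    def on_leader_message(self, leader_term):
        # Another process claims leadership with a term >= ours:
        # accept the new leader and return to the follower state.
        if leader_term >= self.term:
            self.term = leader_term
            self.state = FOLLOWER
```

For example, in a 3-node cluster a node that times out becomes a candidate in term 1, wins with 2 votes, and steps down again if it later hears from a leader with term >= 1.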
- Normal operation
- Client sends command to leader
- Leader appends command to its log
- Log entry = index, term, command
- Leader sends AppendEntries to followers
- Each AppendEntries request contains the index and term of the entry immediately preceding the new ones
- The follower must hold a matching entry at that position, otherwise it rejects the request
- Once a new entry is committed (stored on a majority of servers/nodes/replicas):
- Leader passes command to its state machine, returns results to client
- Leader notifies followers of committed entries in subsequent AppendEntries
- Followers pass committed commands to their state machines
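The AppendEntries consistency check described above can be sketched as follows. This is a simplified, 0-indexed sketch for illustration; real Raft logs use 1-based indices and the RPC carries additional fields.

```python
# Simplified sketch of the AppendEntries consistency check.
def append_entries(log, prev_index, prev_term, new_entries):
    """log: list of (term, command). Returns True if the request is accepted."""
    if prev_index >= 0:
        # The follower must already hold a matching entry at prev_index.
        if prev_index >= len(log) or log[prev_index][0] != prev_term:
            return False  # no matching entry: reject the request
    # Drop any conflicting suffix, then append the new entries.
    del log[prev_index + 1:]
    log.extend(new_entries)
    return True

follower_log = [(1, "set x 1"), (1, "set y 2")]
ok = append_entries(follower_log, 1, 1, [(2, "pop x")])     # preceding entry matches
bad = append_entries(follower_log, 5, 2, [(2, "set z 3")])  # missing entry: rejected
```

The rejected request is how the leader discovers a follower's log has fallen behind: it retries with an earlier prev_index until the logs match.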
Exercise: We will get acquainted with the Raft algorithm by using the Python library PySyncObj. PySyncObj is a Python library for building fault-tolerant distributed systems: it replicates your application data between multiple servers and implements the Raft protocol for leader election and log replication.
- The code for the above implementation is given here for you. (code)
$ pip install pysyncobj
$ python kvstorage.py selfHost:port partner1Host:port partner2Host:port ...
- The normal (non-replicated) class that stores the key-value pairs:
class KVStorage(object):
    def __init__(self):
        self.__data = {}

    def set(self, key, value):
        self.__data[key] = value

    def pop(self, key):
        self.__data.pop(key, None)

    def get(self, key):
        return self.__data.get(key, None)
- Transform the normal class into a replicated class
- Inherit it from SyncObj
- Initialize SyncObj with a self address and a list of partner addresses
- Mark all methods that modify your class fields with the @replicated decorator to achieve replication across the different servers.
from pysyncobj import SyncObj, SyncObjConf, replicated

class KVStorage(SyncObj):
    def __init__(self, selfAddress, partnerAddrs):
        cfg = SyncObjConf(dynamicMembershipChange=True)
        super(KVStorage, self).__init__(selfAddress, partnerAddrs, cfg)
        self.__data = {}

    @replicated
    def set(self, key, value):
        self.__data[key] = value

    @replicated
    def pop(self, key):
        self.__data.pop(key, None)

    def get(self, key):
        return self.__data.get(key, None)
Task: Based on kvstorage.py, extend the Raft implementation so that a log index is incremented whenever you set new key-value pairs or pop key-value pairs.
Deliverables: please include the source Python file and screenshots of the results in your terminals. Then put all the files in a single zip archive.
Link:
- https://people.cs.rutgers.edu/~pxk/417/notes/content/consensus.html
- https://en.wikipedia.org/wiki/Consensus_
- https://www.geeksforgeeks.org/distributed-consensus-in-distributed-systems/
- https://www.youtube.com/watch?v=YbZ3zDzDnrw
- https://github.com/bakwc/PySyncObj
NOTE: Please watch the video or ask us through Slack if something is not clear to you. We usually don't change the points after the deadline.