Seminar 3: Exponential back-off with gRPC using Python
Goal: To implement exponential back-off with multiple threads in gRPC using Python.
Background: A system model encodes expectations about the behavior of (node) processes, communication links, and timing, that is, a set of assumptions about failures. In practice, multiple failures can occur at the same time. System models allow us to capture these potential failures in generic models, which can then be used to design distributed algorithms. A system model covers:
- Node behavior, e.g., hard disk failure
- Network behavior, e.g., packet loss
- Timing behavior, e.g., execution/completion order of threads/processes
Several things can go wrong when a client sends a request to a server. When a server crashes, a large number of clients may keep sending requests to it. To handle this, a reliability mechanism can be implemented at the node level: clients are configured so that the interval between request attempts increases after each failed attempt. That way, once the server is back online, it does not crash again immediately under the heavy load of client requests.
Failure detection:
- Nodes implement time-outs (connectivity unreachable or destination node is down). The tricky part is deciding how long to wait before the time-out triggers.
- If the wait is too short, the client will wrongly consider the server dead even though it is alive.
- If the wait is too long, the client will be blocked waiting.
- Ping: A periodic request that a node sends to another to check whether it’s still available. A response is expected within a time frame; otherwise, a time-out is triggered.
- Heartbeat: A message that a node periodically sends to another node to inform it that it’s still up and running. The destination node expects the next heartbeat within a time frame; otherwise, a time-out is triggered.
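The time-out decision in the list above can be sketched in a few lines of Python. This is only an illustration, not part of the seminar code: the class name HeartbeatMonitor, its methods, and the node id "server-1" are hypothetical. A node is suspected to be down once nothing has been heard from it for longer than the chosen time-out.

```python
import time


class HeartbeatMonitor:
    """Tracks when each node was last heard from (hypothetical helper)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node id -> time of the last heartbeat

    def record_heartbeat(self, node_id, now=None):
        # Called whenever a heartbeat message arrives from node_id.
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def is_suspected(self, node_id, now=None):
        # A node is suspected if no heartbeat arrived within the time-out.
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(node_id)
        if last is None:
            return True  # never heard from this node at all
        return (now - last) > self.timeout_seconds


monitor = HeartbeatMonitor(timeout_seconds=2.0)
monitor.record_heartbeat("server-1")
print("server-1 suspected:", monitor.is_suspected("server-1"))
```

A short time-out makes is_suspected flag a slow but living server, while a long one delays detection; this is the trade-off described above.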
Definitions: Exponential back-off is a common default way to handle network failures and unreliability. It consists of increasing the average delay after every successive failure.
- After the initial failure, the sender re-transmits at random, one or two turns later.
- If there is a second failure, it tries again anywhere from one to four turns later.
- After a third failure in a row, it waits somewhere between one and eight turns, and so on.
- The maximum delay lengths form an exponential progression (2, 4, 8, 16, …)
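The progression above can be written down as a small Python sketch. The function names max_backoff_turns and backoff_turns are chosen here for illustration only; a "turn" is whatever time unit the client uses.

```python
import random


def max_backoff_turns(failures):
    """Upper bound on the delay after `failures` failures in a row: 2, 4, 8, ..."""
    return 2 ** failures


def backoff_turns(failures, rng=random):
    """Pick a random delay between 1 and the current maximum (both inclusive)."""
    return rng.randint(1, max_backoff_turns(failures))


for failures in range(1, 5):
    print(f"failure {failures}: wait {backoff_turns(failures)} "
          f"of at most {max_backoff_turns(failures)} turns")
```

The random choice matters as much as the growing bound: if every client waited exactly the maximum, they would all retry at the same moment.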
Once exponential back-off is implemented, the (concurrent) requests from multiple clients are spread across a time interval (whose length depends on how long the server is down) instead of all being sent at once within a short interval (see Figure below).
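To get a feeling for this spreading effect without a running server, the following simulation counts how many retries land on the busiest turn. It is a rough sketch under simplifying assumptions: 1000 clients that all failed at the same moment, and a hypothetical helper named retry_times.

```python
import random
from collections import Counter


def retry_times(num_clients, num_failures, rng):
    """Turn at which each client retries after `num_failures` failures in a row."""
    return [rng.randint(1, 2 ** num_failures) for _ in range(num_clients)]


rng = random.Random(0)  # fixed seed so that repeated runs give the same numbers
for failures in (1, 3, 5):
    per_turn = Counter(retry_times(1000, failures, rng))
    print(f"after {failures} failures: busiest turn receives "
          f"{max(per_turn.values())} of 1000 retries")
```

The more failures the clients have seen, the wider the window they draw from, so fewer retries hit the server in any single turn.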
Prerequisites: Python (Python Download, or installation guide: https://realpython.com/installing-python/) and gRPC.
Exercise: We will get acquainted with the exponential back-off function in gRPC by using the datetime module, which supplies classes for manipulating dates and times. The function returns a log of the dates and times at which each attempt to call the server is made with exponential back-off. The code for the implementation is found here (Code).
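As a starting point, here is a minimal sketch of such a retry loop. It is not the seminar's provided code: call_with_backoff and make_flaky_request are hypothetical names, and the server is replaced by a stand-in function that raises ConnectionError a few times before answering. In a real gRPC client, the request would be the stub call and the exception to catch would typically be grpc.RpcError.

```python
import random
import time
from datetime import datetime


def call_with_backoff(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds, waiting longer after each failure.

    Returns (result, attempt_log); attempt_log holds the datetime of each attempt.
    """
    attempt_log = []
    for attempt in range(1, max_attempts + 1):
        attempt_log.append(datetime.now())
        try:
            return request(), attempt_log
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Random delay between 1 and 2**attempt units of base_delay.
            delay = random.uniform(base_delay, base_delay * 2 ** attempt)
            print(f"{attempt_log[-1]} attempt {attempt} failed, "
                  f"retrying in {delay:.2f}s")
            sleep(delay)


def make_flaky_request(failures_before_success):
    """Stand-in for an RPC: fails a few times, then answers (hypothetical)."""
    state = {"calls": 0}

    def request():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("server unavailable")
        return "pong"

    return request


result, log = call_with_backoff(make_flaky_request(2), base_delay=0.01)
print("result:", result)
for stamp in log:
    print("attempt at", stamp.isoformat())
```

The sleep parameter exists only so that the waiting can be swapped out, for example when trying the function without real delays.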
Screenshots of server implementation after running
Screenshots of client implementation without exponential back-off and with exponential back-off after running
Task: In Seminar 2, we already implemented RPC with multiple threads using gRPC in Python. For this task, implement two clients: one without exponential back-off, which won't handle any network failure, and one that integrates the exponential back-off function to show how to handle failed requests. (Use the same functions as in the last tasks or define another function.)
Deliverables: A zip file containing the server and client source files, as well as screenshots of the server and client terminals.
Useful links