Seminar 3: Exponential back-off (Client/Server) in Python
Goal: To implement exponential back-off with multiple threads in RPC using Python.
Background: A system model encodes expectations about the behavior of node processes, communication links, and timing (a set of assumptions about failures). In practice, multiple failures can occur at the same time. System models allow us to capture those potential failures in generic models, which can then be used to design distributed algorithms.
- Node behavior, e.g., hard disk failure
- Network behavior, e.g., packet loss
- Timing behavior, e.g., execution/completion order of threads/processes
In the case of communication, a network failure rarely leads to a dead end; instead, re-transmission mechanisms are implemented to ensure reliable data transmission. Similar mechanisms are also available at the node level. For instance, consider a server that has crashed while a large number of clients keep sending requests to it. In this case, the clients are configured so that request attempts are scheduled at increasing time intervals (after each attempt), so that once the server is back online it does not crash again immediately under the heavy load of client requests.
Failure detection:
- Nodes implement time-outs (connectivity unreachable or the destination node is down) - the tricky part is deciding how long to wait before the time-out triggers.
- If the wait is too short, the client will wrongly consider the server dead even though it is alive.
- If the wait is too long, the client will be blocked waiting.
- Ping: A periodic request that a node sends to another to check whether it’s still available. A response is expected within a time frame; otherwise, a time-out is triggered.
- Heartbeat: A message that a node periodically sends to another node to inform it that it is still up and running. The destination node expects these messages to keep arriving; if they stop, it suspects the sender has failed.
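The time-out trade-off above can be sketched with a minimal TCP-level ping. This is only an illustration; the function name and parameters are not part of the seminar code:

```python
import socket

def ping(host, port, timeout=2.0):
    """Return True if host:port accepts a TCP connection within
    `timeout` seconds, False on time-out or refusal.

    Note: a False result is only a *suspicion* of failure -- a slow
    network can make a live server look dead (the time-out trade-off).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers socket.timeout and ConnectionRefusedError
        return False
```

Choosing `timeout` here is exactly the dilemma described above: too small and live servers are suspected dead, too large and the caller blocks.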
Definitions: Exponential back-off is a common default strategy for handling network failures and unreliability. It consists of increasing the average delay after every successive failure.
- After the initial failure, the sender randomly re-transmits one or two turns later.
- If there is a second failure, it tries again anywhere from one to four turns later.
- If a third failure in a row occurs, it waits somewhere between one and eight turns, and so on.
- The maximum delay lengths form an exponential progression (2, 4, 8, 16, …)
Once exponential back-off is implemented, for a set of concurrent requests from multiple clients it is possible to observe (see Figure below) that the requests are spread over a time interval (depending on how long the server is down) instead of all being scheduled at once within a short interval.
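The turn-based progression above can be sketched as follows (binary exponential back-off, as used e.g. by Ethernet; the function name is illustrative):

```python
import random

def backoff_delays(failures):
    """Pick one randomized delay (in 'turns') per successive failure.

    After the n-th failure the sender waits between 1 and 2**n turns,
    so the maximum possible delays grow as 2, 4, 8, 16, ...
    """
    return [random.randint(1, 2 ** n) for n in range(1, failures + 1)]
```

Randomizing within the window (rather than always waiting the maximum) also spreads competing clients apart, so they do not all retry at the same instant.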
Prerequisites: Python (Python Download), "RPyC" (RPyC Installation).
- Python installation: https://realpython.com/installing-python/
Exercise: We will get acquainted with the exponential back-off function, and we will use the function retry to implement it. retry calls a function that returns True/False to indicate success or failure. On failure, it waits and tries the function again. On repeated failures, it waits longer between successive attempts. If the decorated function runs out of attempts, it gives up and returns False, but you could just as easily raise an exception.
- The code for the above implementation with different tests is given here for you. (code)
In the Retry module
import time

# Retry decorator with exponential backoff
def retry(tries, delay=3, backoff=2):
    '''Retries a function or method until it returns True.

    delay sets the initial delay in seconds, and backoff sets the factor
    by which the delay should lengthen after each failure. backoff must
    be greater than 1, or else it isn't really a backoff. tries must be
    at least 0, and delay greater than 0.'''
    def deco_retry(f):
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay  # make mutable
            rv = f(*args, **kwargs)        # first attempt
            while mtries > 0:
                if rv is True:             # done on success
                    return True
                mtries -= 1                # consume an attempt
                time.sleep(mdelay)         # wait...
                mdelay *= backoff          # make future wait longer
                rv = f(*args, **kwargs)    # try again
            return False                   # ran out of tries :-(
        return f_retry  # true decorator -> decorated function
    return deco_retry   # @retry(arg[, ...]) -> true decorator
WORKING PIPELINE OF THE ABOVE IMPLEMENTATION:
$ python retry.py
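To see the True/False-style decorator in action without waiting for whole seconds, here is a self-contained sketch with a shortened delay. The flaky function succeed_on_third_call is made up for the demo and is not part of the seminar code:

```python
import time

def retry(tries, delay=0.01, backoff=2):
    """Same structure as the decorator above, with a tiny delay for the demo."""
    def deco_retry(f):
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay
            rv = f(*args, **kwargs)      # first attempt
            while mtries > 0:
                if rv is True:
                    return True
                mtries -= 1
                time.sleep(mdelay)       # back off before retrying
                mdelay *= backoff
                rv = f(*args, **kwargs)  # try again
            return False
        return f_retry
    return deco_retry

calls = []

@retry(tries=5, delay=0.01)
def succeed_on_third_call():
    calls.append(1)
    return len(calls) >= 3  # fails twice, then succeeds

result = succeed_on_third_call()
print(result, "after", len(calls), "attempts")  # True after 3 attempts
```

Note that the delay between the second and third attempt is twice the initial delay: each failure doubles the wait, which is the exponential part.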
In retry2.py
from functools import wraps
import time

def retry(ExceptionToCheck, tries=4, delay=3, backoff=2, logger=None):
    '''Retry decorator: on ExceptionToCheck, wait and retry with an
    exponentially growing delay; the last attempt lets exceptions propagate.'''
    def deco_retry(f):
        @wraps(f)
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay
            while mtries > 1:
                try:
                    return f(*args, **kwargs)
                except ExceptionToCheck as e:
                    msg = "%s, retrying in %d seconds..." % (str(e), mdelay)
                    if logger:
                        logger.warning(msg)
                    else:
                        print(msg)
                    time.sleep(mdelay)
                    mtries -= 1
                    mdelay *= backoff
            return f(*args, **kwargs)  # final attempt: exceptions propagate
        return f_retry
    return deco_retry
import random

@retry(Exception, tries=4)
def test_random(text):  # test function: fails randomly about half the time
    x = random.random()
    if x < 0.5:
        raise Exception("Fail")
    else:
        print("Success:", text)

test_random("it works!")
Task: In Seminar 2, we implemented RPC with multiple threads using RPyC in Python. For this task, you should integrate the exponential back-off function above into the multi-threaded RPC procedure, still using the ThreadedServer module of RPyC. (Other approaches are also acceptable.)
- You should integrate the source code from retry2.py into the server part. Note that the remote server executes the exponential back-off function in its service class.
- You should also implement another function that reports the date and time of the connection when the server works (no exception).
- You should pass an argument such as "it works!" from the client to the server for the failure tests. When the server works, it should send a statement including the argument, such as "Success: it works!", back to the client, and the client then prints it.
- Server works (your results should be similar to this)
- Server breaks (your results should be similar to this)
Possible issues: if you run into problems when running the code, try a different port number.
Deliverables: A zip file containing the server and client source files, as well as screenshots of the server and client terminals.
Link:
NOTE: Watch the video if something is not clear.