Seminar 4: Berkeley algorithm in python
Goal: To study the importance of clock synchronization between threads by using the Berkeley algorithm.
Background: Why is clock synchronization between computer nodes needed? The need for synchronization originates when node/processes need to execute concurrent events following a partial order that is bounded to time (lower and upper time bounds). Time allows processes/nodes to measure/track the lifespan of an specific action. For instance, consider two threads competing for locking the access to a resource, e.g., database. Assuming that there is not deadlock, once a thread has locked the resource (completely), it can use the resource indefinitely (if there is not time-outs). Thus, time can allow the system to configure limited access time (time-out) to a resource, such that the system does not starve. Time is also of great importance in many other areas of distributed systems. Multiple nodes/processes need to:
- Cooperate and schedule operations
- Agree on the ordering of events
- Determine validity of data
- Record the lifespan of an event
- Analyze the system performance
Computer clock synchronization in distributed systems is a difficult task as individual clocks in different computers need to be synchronized to a global time. Typically, GPS and NTP are available time source mechanisms to synchronize a distributed system (when connectivity to Internet is available). Other mechanisms are also available when there is not Internet connectivity (We will explore these later ones in this seminar).
Berkeley algorithm: Berkeley’s Algorithm is a clock synchronization technique used in distributed systems. The algorithm assumes that each machine node in the network either doesn’t have an accurate time source or doesn’t possess an UTC server. It is mainly for systems with no access to external accurate timing signal.
- A leader is (typically) chosen via an election process. (Election will be covered in Lecture 6, we will assume now that the leader is pre-assigned and fixed)
- The leader polls the followers who reply with their time in a similar way.
- The leader observes the round-trip time (RTT) of the messages and estimates the time of each follower and its own.
- The leader then averages the clock times, ignoring any values it receives far outside the values of the others.
- Instead of sending the updated current time back to the other process, the leader then sends out the amount (positive or negative) that each follower must adjust its clock. This avoids further uncertainty due to RTT at the follower processes.
- Gusella and Zatti released results involving 15 computers whose clocks were synchronised to within about 20-25 milliseconds using their protocol [1] .
Prerequisites: Python (Python Download), "RPyC" (RPyC Installation).
- Python installation: https://realpython.com/installing-python/
Exercise: We will get acquainted with the Berkeley Algorithm by using multithreads in python, and understand how to play with threads and interconnect threads. In the python file, we already create n threads (1,2,3,4,5), each thread is set with a time then all the threads synchronize its clock based on a leader (e.g., 3 is the leader). The leader (or the coordinator) is selected based on "C" from the input.txt among a list of processes/threads.
- The code for the above implementation is given here for you. (code)
- clock-sync.py
- input.txt is the path to the file where threads are defined.
- 1, A_K, 11:00, S/C --> thread_ID, thread_NAME, thread_TIME, S(Slave), C(Coordinator)
WORKING PIPELINE OF THE ABOVE IMPLEMENTATION:
$ python clock-sync.py input.txt $ list # shows the list of threads running $ clock # shows the clock of threads
In the clock-sync.py module:
# for each Processes object, containing id, name, time and label from the input.txt $ def __init__(self, id, name, time, label) # update the coordinator time $ def tick(running, processes)
Task-1: You should implement "clock-sync.py" to achieve the Berkeley algorithm.
- Still show the list of processes running while also show the process acting as coordinator.
- Define a clock-update function to show the current clock time of each process. (Clocks are synchronized based on coordinator clock; we won’t take the time average of each process as stated by Berkeley)
- Define a kill function to remove a process except coordinator.
- Results should be similar to this.
Task-2: In Seminar 3, we already learn to implement Exponential back-off function with multi-threads by using RPyC ThreadedServer in python. For this task:
- integrate the function of Task-1 into RPyC server side.
- pass arguments ("input.txt" and different commands) from client to server.
- The form of result should be similar to this. (Note that the screenshot below is just a sample, the result must be based on Task-1)
Deliverables: For each task, please contain the source python file, and also the screenshot of results in terminals. Then put all the file in a single zip.
Link:
NOTE: Please watch the video or ask us through Slack if something is not clear. Please submit complete homework files, otherwise we deduct more marks.