Seminar 6: Leader Election
Goal: Delve into the dynamics of leader election, a pivotal concept for decision-making and coordination within distributed systems.
Introduction
In Session 6 of our Distributed Systems course, we delve into the foundational concepts of leader election and mutual exclusion. Leader election is a fundamental mechanism for ensuring coordination and organization within distributed systems, allowing nodes to elect a single leader responsible for guiding collective actions. Mutual exclusion mechanisms enable us to regulate access to shared resources, ensuring that only one process can access a critical section at any given time. By leveraging leader election techniques to achieve mutual exclusion, we aim to foster a deeper understanding of how these concepts interplay, contributing to the robustness of distributed systems' architecture, and their significance in maintaining system integrity and consistency.
Now that you have established the infrastructure to verify if an order is valid, the rest of the system can proceed to deal with the order (e.g. by updating the stocks of available books, committing the order to the database, executing the payment, triggering the notification systems, etc…). The execution of a given order is considered a critical section or resource, which should only be dealt with, at most, once. Therefore, having a group of executors, these should reach an agreement regarding which executor has access to a given order, at a given point in time, proceeding with its execution. You should apply the knowledge you gathered about Leader Election and Mutual Exclusion mechanisms, in order to solve this problem.
Task
Extend your system with the following functionality and new services:
- Order Queue: Create a new gRPC service - the order queue - which will perform basic queuing functionalities: enqueue and dequeue. This service should queue the orders, to be later consumed by the executors.
- Enqueue: In the orchestrator, after the order verification is done (fraud detection, transaction verification, and book suggestions were executed), if valid, insert the order in the order queue, wait for the confirmation that the order is enqueued (meaning, it was inserted in the queue), and return, as usual, the order approval/rejection to the user frontend.
- Bonus points: Instead of a simple queue, implement a priority queue, to enqueue orders based on some priority property or mechanism (e.g. higher value, number of books, premium user, location, shipping method, a heuristic scheduling mechanism based on AI etc…). Feel free to decide the requirements - the mechanism does not need to be based on real requirements, the goal is to implement priority functionality. Bonus points will be awarded according to the complexity of the functionality.
- Order Executor: With the order queue in place, the order executor service should be implemented. Create a new gRPC service, according to the following:
- Replicas: The order executor service should be replicated N >= 2 times. This means that there should be at least 2 instances of the order executor service running at every moment. They should implement the same internal functionality (the same code should run in all instances). Search for a way to replicate services within your docker-compose setup.
- Dequeue: The order executor service (replicated N times) should have functionality to dequeue an order from the order queue and proceed with the execution (for now, after getting an order from the queue, just log “Order is being executed…”). Since the service is replicated, concurrent access to the queue and, therefore, to every order, should be controlled, achieving mutual exclusion. In order to control it, we will leverage Leader Election mechanisms to establish which replica has access to a given order, at a given moment.
- Leader Election: Your task is to devise a leader election mechanism to establish which replica is responsible for each order execution. The chosen replica should be the one to dequeue the next order in the queue and execute it. You should review the different algorithms you learned in the lectures and, based on their advantages and disadvantages, choose one to implement this functionality. Your architectural choice will be later discussed in the next Checkpoint.
- Bonus points: If you can demonstrate that your mechanism dynamically works for 3, 4, … (N > 2) replicas, while being resilient to failures. Be aware of the trade-offs and points of failure of the different algorithms: centralized, decentralized, etc…