Seminar 2: Remote Procedure Calls (RPC) with multiple threads
Goal: To use Remote Procedure Calls (RPC) together with multiple threads in Python.
- Process: This is a program in execution, with its own memory space, system resources, and execution environment. Each process runs independently of other processes.
- Threads: A thread is the unit of execution within a process. When you run a program on your computer, the operating system starts a process. That process is made up of one or more threads, which execute different computing tasks.
- All the threads in one process share memory.
- All the threads have access to global variables.
- Each thread has its own stack, program counter and registers.
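The properties above can be illustrated with a minimal sketch using Python's standard threading module. The names shared_log and worker are made up for this illustration: every thread appends to the same global list, while each thread's local_count lives on its own stack.

```python
import threading

shared_log = []  # global variable: visible to every thread in the process


def worker(name: str) -> None:
    local_count = 0  # local variable: lives on this thread's own stack
    for _ in range(3):
        local_count += 1
    # A single list.append is safe to call from several threads in CPython.
    shared_log.append((name, local_count))


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

# All four threads wrote into the same list, each with its own local_count.
print(sorted(shared_log))  # [('t0', 3), ('t1', 3), ('t2', 3), ('t3', 3)]
```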
The importance of threads
- Concurrent execution: Concurrency allows us to schedule multiple tasks on a single processor. The tasks appear to run simultaneously, but they actually take turns sharing CPU time, and their interleaving is non-deterministic.
- For example: with I/O concurrency, instead of waiting for an I/O operation to complete before continuing execution (thereby rendering the CPU idle), threads allow us to perform other tasks while we wait.
- Parallelism: With several CPU cores, multiple tasks can perform computations at literally the same time, each executing on a different core. Typically, parallelism is used to split a large task into smaller computing problems so that it completes within a given execution time (deadline).
- Convenience: Threads provide a convenient way to execute short-lived tasks in the background. This is possible because they share the same memory space and system resources, making it easier for them to communicate and coordinate with each other. For example, a master node can continuously poll a worker to check whether it is alive.
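The I/O concurrency point can be made concrete with a small sketch. It assumes that time.sleep stands in for a blocking I/O operation; fake_io, run_sequential and run_threaded are illustrative names, not part of the seminar code.

```python
import threading
import time


def fake_io(delay: float, results: list, index: int) -> None:
    time.sleep(delay)  # stands in for a blocking network or disk call
    results[index] = f"task {index} done"


def run_sequential(n: int, delay: float) -> float:
    """Run n fake I/O tasks one after another and return the elapsed time."""
    results = [None] * n
    start = time.perf_counter()
    for i in range(n):
        fake_io(delay, results, i)
    return time.perf_counter() - start


def run_threaded(n: int, delay: float) -> float:
    """Run n fake I/O tasks in n threads and return the elapsed time."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i)) for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential_time = run_sequential(3, 0.2)
threaded_time = run_threaded(3, 0.2)
print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

While one thread waits in its sleep, the others are free to start their own waits, so the threaded run should take roughly as long as one task rather than the sum of all three.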
Threading Challenges
Deadlock: This happens when two or more threads wait on each other in such a way that none of them can make progress.
Race: When threads access shared data, what happens if two threads execute n = n + 2 at the same time? Or if one thread reads a value while another increments it? Updates can be lost or stale values read. The shared data can be protected with a lock; an alternative is to avoid sharing mutable data.
Coordination: If one thread produces data while another consumes it, how can the consumer wait for data to be produced and release the CPU while waiting? And how can the producer then wake up the consumer?
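Two of these challenges can be sketched with the standard library. This is one possible approach, not the only one: a threading.Lock serializes the n = n + 2 update, and a queue.Queue lets the consumer sleep until the producer hands over data. All names (counter, add_two, SENTINEL, and so on) are chosen for this illustration.

```python
import queue
import threading

# --- Race: protect the shared counter with a lock ---
counter = 0
counter_lock = threading.Lock()


def add_two(times: int) -> None:
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread at a time performs the update
            counter = counter + 2


workers = [threading.Thread(target=add_two, args=(10_000,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(counter)  # 4 threads * 10_000 iterations * 2 = 80000

# --- Coordination: a queue wakes the consumer when data arrives ---
items = queue.Queue()
consumed = []
SENTINEL = None  # marker telling the consumer that production is over


def producer() -> None:
    for i in range(5):
        items.put(i)  # wakes up a consumer blocked in get()
    items.put(SENTINEL)


def consumer() -> None:
    while True:
        item = items.get()  # blocks without busy-waiting until data is available
        if item is SENTINEL:
            break
        consumed.append(item)


consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(target=producer)
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
print(consumed)  # [0, 1, 2, 3, 4]
```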
Prerequisites: Python (see Python Download or the installation guide at https://realpython.com/installing-python/) and gRPC (for Python, the grpcio and grpcio-tools packages).
Working pipeline for implementation
- Create a .proto file to define your service
- Generate the gRPC code from the .proto file
- Implement server
- Implement client
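The first two steps of this pipeline can be sketched as follows. The service definition is a hypothetical example (the file name binary_search.proto and the message and field names are assumptions, not the seminar's actual file). The code generation step is only executed if the grpc_tools package is installed.

```python
import importlib.util
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical service definition; names and fields are illustrative only.
PROTO_SOURCE = """\
syntax = "proto3";

package binarysearch;

service BinarySearch {
  rpc Search (SearchRequest) returns (SearchReply);
}

message SearchRequest {
  repeated int32 sorted_values = 1;
  int32 target = 2;
}

message SearchReply {
  int32 index = 1;
}
"""


def write_proto(directory: Path) -> Path:
    """Step 1: write the .proto file that defines the service."""
    proto_path = directory / "binary_search.proto"
    proto_path.write_text(PROTO_SOURCE, encoding="utf-8")
    return proto_path


def codegen_command(proto_path: Path) -> list:
    """Step 2: build the grpc_tools.protoc command that generates the Python code."""
    out_dir = str(proto_path.parent)
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{out_dir}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        str(proto_path),
    ]


with tempfile.TemporaryDirectory() as tmp:
    proto_path = write_proto(Path(tmp))
    command = codegen_command(proto_path)
    print(" ".join(command))
    # Only run the generator when grpcio-tools is available.
    if importlib.util.find_spec("grpc_tools") is not None:
        subprocess.run(command, check=False)
        # Normally binary_search_pb2.py and binary_search_pb2_grpc.py
        print(sorted(p.name for p in Path(tmp).glob("*_pb2*.py")))
```

The server and client are then written against the two generated modules.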
Exercise: We will get acquainted with threads in RPC by continuing to use gRPC in Python, this time with the BinarySearch module. The code for this implementation is given here (Code).
- First, create a proto file defining the binary search service.
- Second, generate the Python code from the proto file (code generation is also possible for other languages).
- Third, implement server.py.
- On the server, concurrent.futures.ThreadPoolExecutor is used to handle incoming requests in multiple threads; the pool size sets how many threads we want to create.
- Fourth, implement client.py.
- The client calls the server, which handles each request on any available thread and sends back the response. If calls are made from several clients at once, we expect the server to serve them concurrently.
When the client and the server run on the same computer, the client first builds a binary search request, then sends this message to the server, and finally prints the result it receives.
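In gRPC for Python, the thread pool is passed to the server when it is created, as in grpc.server(futures.ThreadPoolExecutor(max_workers=10)). The standard-library sketch below does not start a gRPC server; it only illustrates how such a pool dispatches several binary search requests to worker threads. The handle_request function and the request format are assumptions made for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def handle_request(request):
    """Stand-in for a service method: runs on whichever pool thread is free."""
    values, target = request
    return {
        "target": target,
        "index": binary_search(values, target),
        "thread": threading.current_thread().name,
    }


data = list(range(0, 100, 5))  # sorted input: 0, 5, 10, ..., 95
requests = [(data, target) for target in (0, 35, 95, 42)]

# max_workers bounds how many requests are processed at the same time.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="rpc-worker") as pool:
    replies = list(pool.map(handle_request, requests))  # results keep request order

for reply in replies:
    print(reply)
```

The indices found are 0, 7, 19 and -1 (42 is not in the list); which worker thread serves which request may vary between runs.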
Task: Create another service (.proto file) for another algorithm using multithreaded gRPC in Python. The idea is to have several client threads run simultaneously, each making multiple calls to the server.
Deliverables: A zip file containing the source code files and screenshots of the server and client terminals.
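The client-side pattern asked for in the task might look like the skeleton below. It is only a sketch: fake_rpc is a hypothetical placeholder for a blocking call on a generated gRPC stub, and here it merely squares its argument after a short delay.

```python
import threading
import time


def fake_rpc(value: int) -> int:
    """Hypothetical placeholder for a blocking stub call to the server."""
    time.sleep(0.01)  # simulates network latency
    return value * value


results = {}
results_lock = threading.Lock()  # protects the shared results dictionary


def client_thread(thread_id: int, calls_per_thread: int) -> None:
    # Each thread makes several calls, one after another.
    for call_no in range(calls_per_thread):
        reply = fake_rpc(thread_id * 10 + call_no)
        with results_lock:
            results[(thread_id, call_no)] = reply


threads = [threading.Thread(target=client_thread, args=(i, 3)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 4 threads * 3 calls = 12
```

In a real client, fake_rpc would be replaced by a call on the stub generated from your own .proto file.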
Suggested reading and tutorials
Alternative solutions
Besides gRPC, other readily available tools and frameworks can be used to perform remote procedure calls:
- RPyC in Python
- zeroRPC
- Several ad hoc solutions on GitHub