Seminar 12: Implementation of Distributed Hash Table (DHT) in python
Goal: To study and practice the implementation of the partitioning method DHT in python for scaling a distributed system.
Definition: Scalability describes an application which can increase its capacity as its load increases. The added resources (capacity) fall into two broad categories: horizontal (scale out) and vertical (scale up) scaling.
- Scaling out – (replicating between multiple nodes)
- Adding a new computer to a distributed software application
- Scaling up – (upgrading to more powerful hardware)
- The addition of CPUs, memory or storage to a single computer
horizontal scaling (scale out) dimensions:
- Functional decomposition: The process of taking an application and breaking int down into individual parts.
- Partitioning: When a dataset no longer fits on a single node, it needs to be partitioned across multiple nodes.
- Key-value stores methods – Decentralized architectures
- Range partitioning
- Hash partitioning: Distributed Hash Table (DHT)
- Key-value stores methods – Decentralized architectures
- Duplication: Adding more capacity through replicated instances equipped with load balancers (request routing)
Hash Partitioning:
- Hash function is utilized to assign keys to partitions
- Hash functions maps a potentially non-uniformly distributed key scape to a uniformly distributed hash space.
- Distributed Hash Table (DHT)
- Data is distributed across multiple nodes, where each node stores additional information to form a ring topology
- Distribute (key, value) pairs over (millions) of nodes (peers)
- Any node can perform a query with a key – with a small number of messages
- Pairs are evenly distributed over nodes
- Each node only knows about a small number of other nodes
- Nodes are continuously coming and going (churn)
- Rule: assign the key-value pair to the node that has the closest id
Exercise: We will get acquainted with the DHT in python. A DHT is typically deployed over a decentralized system architecture (peer-to-peer). By sorting the collection of nodes into a ring, the main purpose of a DHT is to facilitate the location of nodes (containing data) giving a key value (lookup). The DHT is built by taking the p2p network as a reference. In this p2p network, each node can be the client or the server. Nodes in the network are the "same" (same application or protocol). Nodes are connected with other nodes and each node provides a TCP/IP server on a specific port for other nodes to connect. The code for part of the implementation is given here for you (code).
- The file input-file.txt describes the DHT elements.
- key-space defines the range of values that the ring stores.
- nodes describe the list of actual nodes in the ring storing key data. Nodes are assigned with values within the #key-space. If nodes with values outside that range are introduced, then the nodes are discarded and a warning message should be triggered, e.g., 110 is not a valid node. At least one node should be defined in the file. If there is just a single node, this node has a reference to itself.
- shortcuts describe direct connections between nodes that can be used to reduce lookup time.
- Notice that each node has two references by default – successor and next successor. Remember also that each successor node stores all the key values before it, for instance, node 17 stores key values in the range of 6 to 17). Nodes just send requests to successor nodes (neighbors) or nodes in their finger table (if any).
- As you can see in the code repository, there are three files - main.py, node.py and nodeconnection.py. In nodeconnection.py, The class NodeConnection represents connections between two nodes. With this, a node can send a message to successor and the successor can respond if needed. In node.py, the Node class represents a node in DHT. In main.py, it loads the input file, initials the ring and exexcute the commands.
Working Pipeline
$ python main.py input-file.txt
- List: List the nodes in the ring (sorted from lower to higher). It also shows the references that each node is storing. (: Shortcuts, S: successor, NS: next successor)
Task: Based on the exercise, Please implement the DHT with the following commands:
- Join: this commands allows a new node to join the ring. Notice that successor and next successor references should be updated.
- $ join 7: its successor 17 and next successor is 22
- Shortcut: this command adds a shortcut from one node to another.
- $ shortcut 7:22
Deliverables: please contain the source python file, and also the screenshot of results in terminals. Then put all the file in a single zip.
Link:
NOTE: Please watch the video or ask us through Slack if something is not clear for you. We usually don't change the points after the deadline.