HW4. Trees (30th of September, 23.59)
Example data generator making a tree
We provide a simple framework for creating trees represented within a table (array, list). Each node has in this case 4 values - the info, left and right child, and the color. Node 0 can act as book-keepiing for the tree. And tree[0][ LEFT ] would be the root index of the tree.
We use Collatz sequence for generating numbers to be inserted into the binary search tree. Collatz sequence from positive integers: collatz(n) = if n mod 2 == 1 : return 3*n + 1 else return n/2. I.e. even integers are divided by two and odd integers are multiplied by 3 and add 1. This collapses for all known cases so far into 1, as 1->4->2->1-> is an infinite loop. Unfortunately, there is no proof or knowledge if there exist other loops or if it will start growing infinitely from some n, instead of collapsing (i.e. there is no proof if the program will ever finish work, but so far, for known cases, it does).
See the code example:
https://colab.research.google.com/drive/1aYZY5im5G9mONMN_T4xCn4gqNdvPgNPa#scrollTo=e8udV1BWB3tg
EX1.(1p) Traverse the binary tree for a basic info
For the BST-s that are created from the Collatz sequences from 7, 9, and 51, print the tree, followed by the calculation of the
- tree size
- tree depth (length of maximally long path)
- tree width at each depth and sum of all values at that depth
- pre-, in-, and postorder of the depth -first search (print all values on single line only)
- a parenthesised serialized representation of the depth-first pass through the tree
- breadth-first order of the values in the tree.
Hint: you can re-use the same single tree-passing fu8nction by adding to it requested info by just collecting respective info as needed and or collecting multiple outputs by appending to respective strings or lists. Alternatively, create a small nr of separate tree-passing functions with dedicated goals for each separately. Write in the report why you chose one over the other approach.
EX2.(1p) Binary Tree Maximum Path Sum
A path in a binary tree is a sequence of nodes where each pair of adjacent nodes in the sequence has an edge connecting them. A node can only appear in the sequence at most once. Note that the path does not need to pass through the root. The path sum of a path is the sum of the node's values in the path.
Based on the above examples of binary search trees constructed from the Collatz sequences from 7, 9, and 51, calculate the maximal path sums for each tree. Print the path from the left path leaf through the identified node (highlight that node, e.g. by adding parentheses. E.g. max = 75 = 8, 10, (11), 17, 13, 16. In this example 8 and 16 are path ends (leaves) and 11 is their common lowest ancestor.
Would your code run in a generic case, when the tree is not a original binary search tree but some binary tree with a mix of positive and negative values? Argue why would it work or not? Can you provide some example easily, where you generate such a mixed tree and show the outputs as previously (make sure to have enough negative values and that the actual answer has both positive as well as negative values? (One possibility is to modify the original BST to use only absolute values for navigation, but store actually negative values, e.g. some collatz sequences in BST to be positive values, and the other(s) as negative values.)
EX3.(1p) Create a red-black tree.
The binary search trees based on Collatz sequences from 7, 9, 51, are not balanced, as all tree nodes are simply red in the example code. Your task is to convert these three trees into balanced ones. Choose yourself one of the three possible strategies (or provide some additional alternative):
- Make the tree first independently as balanced as possible, assign new color codes only afterwards to reflect the balancedness
- Propose a strategy to balance such a tree by a systematic rebalancing process by bringing the tree into red-black tree by continuously resolving conflicts you observe, one at a time.
- Modify the tree node insertion commands and keep the tree as a balanced red-black tree throughout the process during the entire construction
Argue briefly the benefits and drawbacks of your choice of method (2-3 in favor, 2-3 against).
Provide in the report the original tree + the red-black balanced one next to each other.
Provide information on how much effort (coding itself + running the code). E.g. how many traversals, color changes, rotations or other extra work was needed? If you did not code but used pen and paper, state the estimate of the effort.
EX4.(1p) Construction of a small B-tree based on the Collatz sequences
Create three small B-trees from Collatz sequences for 7, 9, and 51 with up to 4 values in each node (up to 5 children). I.e. normal nodes having 2, 3, or 4 values (only root can possibly have 1 value).
You can do it manually on paper; or use code generation to assist in B-tree generation. Make sure to generate compact concise full code. If you decide to code, make sure to report the tree construction through a couple of intermediate steps as well. If you do it in code, merge at the end the three trees into a single larger tree, and report that as well. Avoid multiple occurrences of the same keys. How big are the trees (nr of nodes, depth of tree)
EX5.(1p) Construction of a TRIE based on the Collatz sequences
Let’s again use the Collatz sequences as a data source, e.g. from 51. Treat these integers as strings over alphabet of 10 letters (‘0’..’9’). Create a TRIE data structure based on these values. First, discuss some implementation details - how would you implement it? Secondly, characterize the final tree itself. The size, depth, etc.
Discuss some benefits or drawbacks that we would get from such TRIE representation of the data. Create 4-5 pros and cons.
EX6. Bonus (1p) TRIE continued
Continue from EX5. Insert into Trie more data. Either many Collatz sequences, or entirely random integers. Use the trie based on keys textual representation to store such integer data. Make sure the stored “integers” are the “same length” and that you could use such a trie to make a range query, e.g. fetch all data that are in between 1023 … and 7895 in alphabetid (or numeric) order(for positive integers).
Does your code work for smaller alphabets or large alphabets (beyond 0..9, like the ASCII or entire Unicode (or integer) character sets?). Discuss merits and shortcomings of your implementation.