Algoritmika - Kursused - Arvutiteaduse instituut

HW4. Trees (29th of September, 23.59)

Example data generator making a tree

We provide a simple framework for creating trees represented within a table (array, list). Each node has in this case 4 values - the info, left and right child, and the color. Node 0 acts as book-keeping for the tree and value 0 is like a NULL pointer. The root is located at the index tree[0][LEFT].

We use Collatz sequence for generating numbers to be inserted into the binary search tree. Collatz sequence from positive integers: collatz(n) = if n mod 2 == 1 : return 3*n + 1 else return n/2. I.e. even integers are divided by two and odd integers are multiplied by 3 and add 1. This collapses for all known cases so far into 1, as 1->4->2->1-> is an infinite loop. Unfortunately, there is no proof or knowledge if there exist other loops or if it will start growing infinitely from some n, instead of collapsing (i.e. there is no proof if the program will ever finish work, but so far, for known cases, it does).

See the code example:
https://colab.research.google.com/drive/1aYZY5im5G9mONMN_T4xCn4gqNdvPgNPa#scrollTo=e8udV1BWB3tg

T1.(1p) Traverse the binary tree for a basic info (code)

For the BST-s that are created from the three Collatz sequences starting from 7, 9, and 51. Print the tree, followed by the calculation of the

tree size
tree depth (length of maximally long path)
tree width at each depth and sum of all values at that depth
pre-, in-, and postorder of the depth -first search (print all values on single line only)
a parenthesised serialized representation of the depth-first pass through the tree
breadth-first order of the values in the tree.

Collect as much info during a single traversing order, as reasonable. Which data is easier to collect in other search orders.

T2.(1p) Binary Tree Maximum Path Sum (code)

A path in a binary tree is a sequence of nodes where each pair of adjacent nodes in the sequence has an edge connecting them. A node can only appear in the sequence at most once. Note that the path does not need to pass through the root. The path sum of a path is the sum of the node's values in the path.

Based on the above examples of binary search trees constructed from the Collatz sequences from 7, 9, and 51, calculate the maximal path sums for each tree. Print the path from the left path leaf through the identified node (highlight that node, e.g. by adding parentheses. E.g. max = 75 = 8, 10, (11), 17, 13, 16. In this example 8 and 16 are path ends (leaves) and 11 is their common lowest ancestor.

T3.(1p) Create a red-black tree (manual or code).

The binary search trees based on Collatz sequences from 7, 9, 51, are not well balanced. Your task is to convert these three trees into balanced ones. Choose one of the three possible strategies (or provide some additional alternative).

Start from the input sequence and create a balanced R-B tree for that sequence alone. Report the final version that corresponds to a R-B tree (include colors in the output).
Propose an automated strategy to balance a R-B tree that has color-conflicts by some systematic rebalancing process. Apply that process and bring the tree into a correct red-black state by continuously resolving conflicts you observe.
Modify the tree node insertion commands and keep the tree as a balanced red-black tree throughout the process during the entire construction

Provide in the report the original tree + the red-black balanced one next to each other.
Argue briefly the merits and shortcomings of your choice of method as compared to others.

T4.(1p) How to generate example data and experimental setup to assess dictionary data structures?

In order to assess the performance of the dictionary, one would need to simulate the operations under some realistic assumptions and tough biases and edge cases in the data. We used Collatz sequences, that is likely not the best for testing the scalability of the methods. What is better for real experimental setup?

Outline the criteria for a thorough data and test generator. Make sure that it is scalable, i.e. you can determine how large the dictionary will grow, and how the insertions, deletions are distributed. Note that entirely random data does usually not reveal the potential issues as compared to having systematic biases in data. For example, the stretches of increasing values, that could cause unbalanced paths in BST unless rigorously balanced, etc. Also, the deletions would require that usually the data actually exists before it is deleted. It may be trickier or even impossible to know if you systematically delete from top of BST or leaves. In case of a proper ADT of a dictionary, you can not even know how the code is implemented.

Implement your synthetic test generator based on your ideas and measure (some) library with that. Your task is to think as a tester or quality assurance manager. Generate large test sequences and measure individual operations as well as many operations in a row. Include in your test data breakpoints and commands about what should be measured/timed. E.g. runs of (tens) of thousands of operations.

Measure the performance of some existing library with that generator for individual operations across several orders of magnitude. Can you detect any potential expected biases? Hint: it should be easy to notice performance while the dictionary still holds fewer elements, and increase in time when the dictionary grows. But can you create changes in performance by choosing data systematically "badly"?

Make sure that when you provide the report you refer to your test ideas and provide the results in ways understandable for human reader.

T5.(1p) How to measure the balancedness of a BST?

Quantify the tree structure. Given a BST, you can calculate various informative characteristics of that BST. Provide some explicit quantitative measures that would characterise the state of BST in terms of balancedness. E.g. the maximal vs shortest path, the distribution of path lengths, etc.

Calculate these characteristics on the T1 and T3 examples of BST-s.
Use your test generator from T4 and make some larger BST-s, run your BST balancedness assessment tool on larger BST.

Identify and argue which specific measure(s) that you calculated should correlate the best with the actual timing performance measurements (relate them to test cases from T4). Here you do not need to perform the actual timing measurements, only identify the measures for the tree that you can calculate based on a specific tree.

T6.(2p) Conduct an experiment combining T4 and T5 on an explicit BST implementation

Validate that your own experimental data generator and performance measurement code, as well as the tree balancedness checker from T5 work as expected and provide interpretable and correlated/coherent measures.

Note that the BST code from T1 example does not balance the tree nor does it support the deletions. You should easily find test cases that confirm poor performance of such BST. E.g. first generate base BST by random numbers; and then introduce biases, then quantify the tree and measure the time.

Feel free to measure any BST code of your liking. If not otherwise, tamper with the implementation to introduce problems in order to demonstrate that you could identify them with your experimental setup.

Algoritmika 2025/26 sügis

HW4. Trees (29th of September, 23.59)