HW4. Skip-Lists, BST, Trie (27th of September, 23.59)
There are many sources for numeric data inputs available. For example, the Fibonacci numbers, squares, etc. that may make your life easier when you need to generate data. For example, OEIS (https://oeis.org/) is a source of various integer sequences, you can grab from there many sequences of numbers :) e.g. https://oeis.org/A192066. We can use those as data, e.g. grab the file https://oeis.org/A192066/b192066.txt
Another computer-generated sequence is the “3x+1 sequence”. Start from any x=n. If x is odd, the next value is 3x+1. If x is even, the next value in the sequence is x/2. Sequences get into infinite loop once x=4 -> 4,2,1,4,2,1… Hence, you can stop when the value x is 1. Python example is in here https://colab.research.google.com/drive/19IppnAmdjLbyKcC9lOnSnqf7JYi-uKE4#scrollTo=eglfdZ-x23uS If you are intrigued, see Veritasium video or Collatz conjecture or The Ultimate Challenge: The 3x+1 Problem by Jeffrey C. Lagarias, etc.
Exercise 1 - Skip List
Draw Skip-list: Build a dictionary (with no repeated values) of the 3x+1 sequence from
collatz_sequence(7) => [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1].
Insert those numbers into the Skip-List. What the list would look like at the end? Your task involves “coin tossing”, so all lists should end up slightly different :) Which values are “towering highest” at the top 3 levels?
Note: the task does not require implementation, as a minimum, a simulation with pen and paper is sufficient with final outcome (and maybe some intermediate steps in between) is needed. Note, that it is can be even more efficient and definitely more educational to perform this actually on a clean sheet of paper, rather than clicking on the user interface of various simulators available in Internet.
Exercise 2 - Binary Search Tree
Draw BST: Insert the same values (from EX1) into the Binary Search Tree (in the same order, no balancing). Do you benefit from the drawn BST in lookup for uniqueness (checking if an element is unique/does not exist in the BST)? How balanced is the final outcome?
- Add AVL balancing information to each node (height differences for left and right subtree)
- Is the tree AVL-balanced? If it is not, can you make it “AVL-balanced”? (some minimal post processing, not full simulation)
Note: Performing this task on paper is likely the most efficient use of your time.
Exercise 3 - Trie
Draw a Trie data structure for D={ word, world, armor, amour, amer, armour, arm, warm }. Number each node in the order of its creation.
The same note: Pen and paper are your friends :)
Exercise 4 - Trie passing simulation with BFS, DFS
From that word Trie - pass the resulting trie in Breadth-first and Depth-first orders, always following the alphabetic order. Write down the sequence as a two-row array - the node number at the top and the letter that got you to that node, at the bottom. For Depth-first pass show all three orders: pre-order, in-order and post-order. In case of in-order and post-order no need for letter (that is only required in pre-order).
Note: Coding not required.
Exercise 5 - Design of the Trie if it would be implemented in practice
Discuss how you would implement the branching in the Trie nodes of task 2 in case of binary alphabet, English alphabet, or full Unicode branching (e.g. an array, list, etc). Discuss how the alphabet size affects the implementation. Estimate roughly the actual size of the Trie in memory if you would use the full list of words from HW 2 (Attach:words_random.txt) with your potential implementation.
Note: Unicode refers to a "very large alphabet". It may be many thousands of alternative characters. It could be also an "integer alphabet" with tens of thousands of potential values (characters) in strings.
Exercise 6 - Bonus (1p)
Simulate the Red-Black balanced binary search tree generation for the sequence collatz_sequence(9). Show a few intermediate steps and the final version. Again, drawing with pencil on a paper is likely the most efficient approach.
Exercise 7 - Bonus (2p)
Beat the built-in sort() with some linear time sort of your own. Use a sequence of 20-bit integers within “int”. Use as many values as you need from the sequence of Fibonacci numbers: 1,1,2,3,5,8, … (always the next is the sum of two previous ones), in modulo 2**20 (the last 20 bits only).
Exercise 8 - Bonus (1p)
Measure the actual space consumption of your implementation of Task 7 comparing the python list vs array, in case of 10M values.