Algoritmika - Kursused - Arvutiteaduse instituut

HW3. Recurrence, Sorting and Skip List (23rd of Sep, 23.59)

EX1.(1p) Master Theorem

Solve recurrences by Master Theorem. Describe their complexity class Theta( )

T(n) = 4T(n/2) + n log(n)
* T(n) = 6T(n/3) + n
* T(n) = 6T(n/3) + n log(n)
* T(n) = 6T(n/3) + n*n
* T(n) = 9T(n/3) + n*n
* T(n) = 2T(⅔ n) + n

EX2. (1p) Automate solving similar recurrences using a computational approach

The following code generates random example recurrences, calculates them numerically by recursive simulation, and plots results.
Example code generates random a, b, f(n) choices.
https://colab.research.google.com/drive/10o2uJjddfvF8FW2f7_AvveQRLVhzP3hA

a,b - random values; f(n) ∈ { n , n log(n), n**2 , n**2.5, n**3, n**4 }
If n < 10 : T( n ) = n
else : T( n ) = a * T( n/b) + f(n)

First, familiarize yourself with the code.

Assess the benefit of memoization used in the code (here is an updated code file with example). Find examples where the effect of memoization is significant (like, at least 10x faster). Provide example T(n) and measurements of time with specific parameters n, with and without memoization.
Identify and fix a set of five interesting combinations of the a,b, f(n) in order to answer the following questions by plotting the chosen combinations.
Hint: Find cases where the growth of different T(n) functions visually clearly crosses on the plots). Provide a few such examples, where the f(n) component is “overshadowed” by the recurrence. i.e. the T(n) with lower f(n) class examples growing faster than more complex f(n) class component in the T( n ) = a * T( n/b) + f(n) ( e.g. recurrence with f(n)=n**2 is growing faster than recurrence with f(n)=n**3 ).
Explain why and when this happens. Characterize the respective T(n) Theta(n) class

EX3.(1p) More complex interdependent recurrences:

Perform a similar computational analysis for five random cases of a mutually dependent T1(n), T2(n), T3(n) based on the previous Task example (use the same or similar ranges for a,b,f as in the previous task). In each combination you would have three different ai, bi, fi. Select them randomly, but make them mutually different from each other (each ai is different, each bi and fi is different).
If n < 10 : T1(n) = T2(n) = T3(n) = n
T1 (n) = a1 * T2( n/b1 ) + f1(n)
T2 (n) = a2 * T3( n/b2 ) + f2(n)
T3 (n) = a3 * T1( n/b3 ) + f3(n)

Report the generated five combinations of T1, T2, T3 three-way recurrences by plotting their functions on a single plot (similar to Task 1).
How to determine the Theta() class in such three-way dependent recurrence? Select one such example, where you could possibly achieve a conclusive answer. Start from an example, where all ai, bi, fi are equal to each other. Then, change only one of the functions.
Find one or two examples where you think you have identified the correct overall complexity class when each Ti is clearly different from each other (different ai, bi, fi). Provide the plot with calculated values and most likely Theta(f(n)) class for selected functions.

Hint: if analysis with three functions Ti (T1, T2, T3) is too challenging, simplify it for the case of two T() only: T1(n) = a1* T2( n/b1 ) + f1(n) ; and T2(n) = a2* T1( n/b2 ).

EX4 (1p) Bucket Sort

Divide 10 000 points that are randomly uniformly distributed within the 3D sphere into “buckets” by the distance from origin. The Sphere is centered at the origin (0,0,0). Data generator:
https://colab.research.google.com/drive/11_cMfMNJYfWrJM9kNIU9aRje-YwfMHVb#scrollTo=PpVVXrF_bpmW

Experiment with pre-calculated boundaries for buckets. First, split data into ten buckets by providing the boundaries. Count the nr of points in each bucket. Next, split each bucket further into 5 using the same principle. Summarize the obtained buckets by a count of points in each 10x5 sub-buckets. Compare two approaches: a) equi-distant from origin, and b) expected “equi-size” buckets. See, that your results actually match your expectations.

Smth like: Bucket 1 [0 … 120) 1100 : 190 210 200 210 200 Bucket 2 [120 .. 190) 990 : 200 190 200 .. …
(here numeric boundaries are only for top-10 split; for next split into 5 only the size of bucket is given)

Implement a bucket-sorting algorithm for this type of data by splitting data recursively into roughly equal-size buckets and relying on a simpler method if the bucket has 10 or less items.
What might be a good approach for determining the number of buckets to be used? From which bucket size should you rely on a simpler sorting method?
Measure experimentally (feel free to increase the data size from the generator).
Make a conclusion: how good is your Bucket sort compared to Quicksort for this particular type of data?

EX5 (1p) Skip List – Design a Skiplist without using any built-in libraries Implement the Skiplist class: Skiplist() Initializes the object of the skiplist. bool search(int target) Returns true if the integer target exists in the Skiplist or false otherwise. void add(int num) Inserts the value num into the SkipList. bool erase(int num) Removes the value num from the Skiplist and returns true. If num does not exist in the Skiplist, do nothing and return false. If there exist multiple num values, removing any one of them is fine.

Measure the speed of your implementation (in some meaningful way)
Calculate the memory consumption of your implementation of skip-list (how many pointers relative to values n?)
Vary how your skip-list class uses the probability to elevate an element with probability 1/x , e.g. test x in { 2, 4, 6, 8 }
Devise an experiment to compare the effect of choice of x - size and speed. Which x is the “best”?

ChatGPT gives away a basic implementation as in here. Feel free to modify it and answer the questions a-d.
https://colab.research.google.com/drive/1NONoUW1wKxf8kgqcv62nvJ50EitZ7rZv#scrollTo=Zf_NGT5QlH5_

EX6 Bonus (2p) Research continuing from Task 3 Try to automate determining the Theta( g(n) ) class for each of the five T1(n) random functions (automatically). Hint: first work on some simple well-known f(n) or T(n) for which you already know the answer! Only after that go for more complex T1(n) functions from Task 2, task 3. Our main goal is to get “definite” answers for Task 3. This is a research question that may not have conclusive answers; approach with care.

First, describe a general strategy and pseudocode to automatically identify one function from the possible space of functions that define separate complexity classes. Only after that implement it for simpler functions, and only then move to more complex functions.

For each g(n) in expected function classes g(n) ∈ { n , n log(n), n**2 , n**2.5, n**3, n**4 } or possibly more, find the smallest c1, s.t. c1* g(n) > T1(n) for n=2**13 and the gap to T1(n) increases for even larger n, for example n= 2*14, 2**15.
For the same g(n) find the largest c2, s.t. c2 * g(n) < T1(n) for n=2**13 and the gap to T(n) increases for even larger n= 2*14, 2**15 .
Determine whether g(n) might represent the Theta class for T1(n), s.t. T1(n) = Theta(g(n)) (you may try to automate; as well as provide analytic answer); First try it for some complexity function T(n) ( or f(n) ) for which you can determine the correct solution analytically.
Make a summary table: For each 5 functions T1(n) output a table: chosen g(n) and values: n, c1, c1*g(n), (c1*g(n)-T1(n)) , T1(n) , (T1(n) - c2*g(n)), c2*g(n), c2. Choose the same n as you used in your code, for example n ∈ { 2**11, 2**12, 2**13, 2**14, 2**15 } or n ∈ { 10**5, 10**6, 10**7, 10**8, 10**9 }. As there are five n, make these as columns; and the other values as rows.
Interpret, whether the Theta class was appropriately determined in each 5 cases? And whether the identified c1, c2 would “prove” the complexity class by such inspection.

Hint: c1, c2 could be perhaps obtained by division of T1(n) / f(n) or by some other search strategy (e.g. binary search?). Stick to integer constants c1, c2 (so you can also enumerate or perform search more easily, no need for fractional c-s)

Hint: You may need to significantly increase the sizes of n (for larger n the differences should be more clear). I.e. powers of 10? 10**5, 10**6, 10**10 … (python should handle large integers without overflows)

Hint: You may need to also use more f(n) functions to compare: f(n) ∈ { n , n log(n), n**2 , n**2.5, n**3, n**4 }

Algoritmika 2024/25 sügis

HW3. Recurrence, Sorting and Skip List (23rd of Sep, 23.59)