HW2. Growth of functions, recursion, binary search (18th of Sep, 23.59)
EX1 (1p). Fin the asymptotic bounds (as explained in the lecture)
- Find a simple, tight asymptotic bound for (i) (Cn6006) (ii) log6006( (log(n{$ \sqrt{n} $}))2 )
- Show that 2n+1 ∈ Θ(2n)
EX2 (1p + 1p) . Function ordering.
Part 1:
Rank the following functions by their order of growth, that is, find an arrangement f1, f2, …, fn of the functions satisfying f1 = Ω(f2), f2 = Ω(f3), etc. If some functions satisfy f1 = Θ(f2), then mark it as well.
The functions are: 2lg*n, n3, lg*n, en, n, n2, lg(n!), 2n, n!, 22^n, ln(n), (n + 1)!, nlg(n), 1, 22^(n+1). (Note: since we couldn't insert triple layer powers in here, we used ^ symbol for that. Also lg* is called iterated logarithm so treat it accordingly)
Try to briefly explain how you made your decisions. You don't have too specific. Also, you might want to look up Stirling formula. Lecture slides are your friend!
Part 2:
Use Gnuplot or other plotting library (e.g. matplotlib if you use Python, or ggplot2 in R) to visually compare growth rates of functions (at least 8).
The functions are: 2lg*n, n3, lg*n, e n, n, n2, lg(n!), 2n, n!, 22^n, ln(n), (n+1)!, nlg(n), 1, 22^(n+1). (Note: since we couldn't insert triple layer powers in here, we used ^ symbol for that. Also lg* is called iterated logarithm so treat it accordingly).
If possible let all functions have their own colour and type (dotted line, dashed, etc.), also add color legend. Provide meaningful names for both axes. How can you avoid some functions to flatten out on the plot? Can you see the same ordering as before?
EX3 (1p). You are given two algorithms with the following recurrent function A and B:
- A(N) = A(N/2) + 1
- B(N) = 2B(N/2) + N
Part 1: Draw a recursion tree for recurrences (feel free to do it in paint, paper (photo), any other fancy tools) and include it in your report. What are the heights of these recursion trees? Do you recognise these recurrences (which algorithm?)? What complexity class (Big O) are these algorithms (try to justify, remember lecture!)?
Part 2: The next task is to estimate the number of operations. Given the formulas, write a program to estimate the number of operations given above complexities (e.g. write a function to calculate T(50), T(100), T(N)). Plot a graph showing the relation between T(N) and N.
Note, you can use T(1) = 1.
EX4 (1p).
As we have all heard multiple times by now, binary search is an amazing algorithm on ordered/sorted data. It works by halving search space at every step, finding a solution in O(log(D)), where D is the size of the search space. In this exercise, we will implement a binary search algorithm for lexicographically sorted words.
First, download the sorted database of words we will be using at words.txt. The origin of words is "words is a standard file on Unix and Unix-like operating systems".
Given the words dictionary, your task is to:
- Read in the words from a file into a suitable data structure (e.g. array of strings);
- Implement a linear search algorithm. Given a word, return the index i of the word, if it exists in the word database. Return something different, e.g. -1 or None, if the word was not found. Do this by scanning through the whole word list starting from the beginning. And return the index i (location) when you see the word;
- Implement a binary search algorithm. Given a word, return the index i (location) of the word, if it exists in the word database. Return something different, e.g. -1/None, if the word was not found;
- Demonstrate that your implementations work! Try searching for existing words and try to search for words that do not exist.
Implementation tips:
Hint 1: Start looking at the string on index N//2, where N is the number of strings. Check whether the word is lexicographically larger or smaller. Depending on whether the word was smaller or larger, either continue your search in the left half or the right half.
Hint 2: Many programming languages have built-in support for lexicographical comparison of strings, e.g. you should be able to do “aaa” < “aab”.
EX5 (1p).
This is a continuation from the previous exercise. Let’s devise an experiment to measure how much faster is our binary search algorithm compared to the linear one. For that you can:
Randomly select a number words from the file (do this several times). Also, select some word that is not in the word database (you can make a word up).
For each word try and measure:
- Time on how long it takes for both algorithms to find (not find) that word;
- The number of comparisons to find (not find) that word (e.g. in binary search, how many time we do target_word < comparison_word).
Repeat the experiment several times and average the results. Can you see some dependence on the word location index “i” in the file and the time/number of comparisons it took to find it?
As always, plot the results and try to explain what you have found! Which algorithm is faster?
EX6.(bonus 2p) Frequentest
The following two Python functions correctly solve the problem: given an array X of n positive integers, where the maximum integer in X is k, return the integer that appears the most times in X. Assume: a Python list is implemented using a dynamic array; a Python dict is implemented using a hash table which randomly chooses hash functions from a universal hash family; and max(X) returns the maximum integer in array X in worst-case O(|X|) time. For each function, state its worst-case and expected running times in terms of n and k.
def frequentest_a(X):
k = max(X) H = {} for x in X: H[x] = 0 best = X[0] for x in X: H[x] += 1 if H[x] > H[best]: best = x return best
def frequentest_b(X):
k = max(X) A = [] for i in range(k + 1): A.append(0) best = X[0] for x in X: A[x] += 1 if A[x] > A[best]: best = x return best
EX7. (bonus 2p) (a) Let's play a game. Your goal is to correctly guess the number I'm thinking of. The number is an integer between 0 and n-1, where n is a power of 2. Every time you try to guess the number, I'll tell you whether you're too low or too high, or if you found it. How long would you take to guess the hidden number, in the worst case? Describe a better strategy for playing this game, and analyse it. Let's now say that your brain is too slow to make divisions and takes, in average, 3.7 seconds to give the result of a division. For any other operation, it's super fast and takes 11 milliseconds, in average. How fast can you be now at guessing the number, when comparing both strategies?
(b) What about ternary search (3 divisions of the search space), or any higher order search algorithm? Do they work better than binary search, in terms of asymptotic complexity, number of comparisons made, overall complexity of the algorithm ...? Try to find out by implementing the ternary search, and compare it against your binary search implementation.