HW2. Growth of functions, recursion, binary search (13th of Sep, 23.59)
EX1
You are given the following functions:
Rank the above functions by order of growth, and briefly explain how you made your decisions. You might also want to look up Stirling's formula.
Hint:
- Solve the simple summations first (if needed, first write them out explicitly as a 1*n or n*n table)
- Lecture slides are your friend!
EX2
- Visually compare the growth rates of the functions from Task 1. Make sure the axes are labelled. Do you see the same ordering of the functions as in your proposed ranking? How can you keep the plot from flattening out (i.e. from hiding the growth of some of the functions)?
- Multiply function 6 by 0.00001 (as if it were executed on a computer 100,000 times faster, making the algorithm's timings grow 100,000 times more slowly) and compare it to function 5. Plot them again and demonstrate which one is asymptotically faster.
For tools, you can use any plotting library (e.g. matplotlib for Python, ggplot2 in R, etc.) or, for example, Gnuplot (see a Gnuplot tutorial).
Hint:
- You do not necessarily have to put all functions on one plot.
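To see why a constant speed-up does not change the asymptotic ordering, here is a sketch that finds the point where the scaled function overtakes the other. Note that f5 and f6 below are hypothetical stand-ins (a cubic and an exponential), not the actual functions from Task 1; substitute your own.

```python
# Hypothetical stand-ins: the real f5 and f6 come from Task 1.
def f5(n):
    return n ** 3          # placeholder polynomial

def f6(n):
    return 2 ** n          # placeholder exponential

def crossover(scale):
    # Smallest n at which the scaled f6 overtakes f5, showing that
    # a constant factor (a "faster computer") only delays, but does
    # not prevent, the asymptotically larger function from winning.
    n = 1
    while scale * f6(n) <= f5(n):
        n += 1
    return n
```

With these placeholders, even a 100,000x speed-up only pushes the crossover out to a modest n; plotting both curves around that point makes the effect visible.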
EX3
You are given the following two recurrences, A and B:
- A(N) = A(N//2) + 1
- B(N) = 2B(N//2) + N
where A(1) = B(1) = 1 and // denotes integer division.
Part 1: Draw the recursion trees for the above recurrences (feel free to do it in Paint, on paper (photo), or any other fancy tool) and include them in your report. What are the heights of these recursion trees? Do you know any algorithms that belong to the same complexity classes as the given recurrences? What complexity classes (Big O) do these recurrences fall into (try to justify your answer)?
Part 2: Write a program to calculate the values of the above recurrences for a given N. Plot graphs showing the relation between A(N) and N, and between B(N) and N.
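The two recurrences translate directly into recursive functions; a minimal sketch of the computation (plotting is left to you):

```python
def A(n):
    # A(N) = A(N//2) + 1, with A(1) = 1
    if n == 1:
        return 1
    return A(n // 2) + 1

def B(n):
    # B(N) = 2*B(N//2) + N, with B(1) = 1
    if n == 1:
        return 1
    return 2 * B(n // 2) + n
```

Evaluating these over a range of N (e.g. powers of two) gives the data points for the required plots.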
EX4
As we have all heard multiple times by now, binary search is an amazing algorithm for finding the location of something in ordered/sorted data. It works by halving the search space at every step, finding a solution in O(log(D)), where D is the size of the search space. In this exercise, we will implement a binary search algorithm for lexicographically sorted words.
First, download the sorted database of words we will be using at Attach:words_alpha_new.txt or its random order version Attach:words_random.txt.
Given the sorted database of words, your task is to find the index (location) of a given word, if it exists in the word database. Return something else, e.g. -1 or None, if the word does not exist within the database.
For example, if our word list were ['banana', 'cat', 'dog'], then searching for 'banana' gives the answer 0, and searching for 'apple' gives the answer -1.
Implement the following two algorithms:
- A linear search algorithm. Do this by moving through the whole word list starting from the beginning and returning the index i (location) at which you find the word;
- A binary search algorithm to perform searching more efficiently.
- Demonstrate that your implementations work! Try searching for existing words and try to search for words that do not exist. Do both algorithms give the same answer?
Implementation tips:
Hint 1: For binary search, start by looking at the string at index N//2, where N is the number of strings. Check whether the target word is lexicographically larger or smaller. Depending on the result, continue your search in the left half or the right half.
Hint 2: Many programming languages have built-in support for lexicographical comparison of strings, e.g. you should be able to do “aaa” < “aab”.
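The hints above can be sketched in Python as follows; this is one possible implementation, assuming the word list has already been loaded into a Python list:

```python
def linear_search(words, target):
    # Scan from the start; return the first index where the word appears.
    for i, w in enumerate(words):
        if w == target:
            return i
    return -1  # word not in the list

def binary_search(words, target):
    # Assumes `words` is lexicographically sorted.
    lo, hi = 0, len(words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if words[mid] == target:
            return mid
        elif target < words[mid]:   # built-in lexicographic comparison
            hi = mid - 1
        else:
            lo = mid + 1
    return -1
```

For the demonstration, check that both functions agree on words that exist and on words that do not (e.g. both return 0 for 'banana' and -1 for 'apple' on the example list).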
EX5
This is a continuation of the previous exercise. Let’s devise an experiment to compare the performance of the linear search and binary search. For that you can:
Randomly select a number of words from the file (repeat this many times). Also select some word that is not in the word database (you can make one up); the search must return -1/None/...
For each word try and measure:
- Time on how long it takes for both algorithms to find (not find) that word;
- The number of comparisons needed to find (not find) that word (e.g. in binary search, how many times we evaluate target_word < comparison_word).
Repeat the experiment several times and average the results. Can you see a dependence between the word's location index "i" in the file and the time/number of comparisons it took to find it, i.e. as the word's index in the sequence grows, does the time it takes to find it change?
As always, plot the results and try to explain what you have found! Which algorithm is faster? How much faster?
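One way to instrument the experiment is to have the search report its comparison count and to average wall-clock time over repeated runs. A sketch (binary_search_count and time_search are hypothetical helper names, not part of any starter code):

```python
import time

def binary_search_count(words, target):
    # Binary search that also reports how many lexicographic
    # comparison steps were performed.
    lo, hi = 0, len(words) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if words[mid] == target:
            return mid, comparisons
        elif target < words[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1, comparisons

def time_search(search_fn, words, target, repeats=100):
    # Average wall-clock time of one search over `repeats` runs.
    start = time.perf_counter()
    for _ in range(repeats):
        search_fn(words, target)
    return (time.perf_counter() - start) / repeats
```

Collecting (index, time, comparisons) tuples for many sampled words gives the data for the plots; an analogous counting wrapper can be written for linear search.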
EX6 (bonus 2p)
Let's keep analysing binary search to build up more intuition on algorithm performance analysis. Note that you can always do these kinds of analyses for any kind of algorithm you are considering using in any of your projects.
Part 1: Consider the previous data in sorted order again. Use binary search to search for words that do not belong to the database. Find out how many searches you can perform in 1 minute using binary search and how many using linear search.
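Counting completed searches within a fixed time budget can be sketched like this (searches_in is a hypothetical helper name; pass in your own search function, your word list, and a budget of 60 seconds for the one-minute run):

```python
import time

def searches_in(budget_seconds, search_fn, words, targets):
    # Count how many searches complete before the deadline,
    # cycling through the given target words.
    count = 0
    deadline = time.perf_counter() + budget_seconds
    i = 0
    while time.perf_counter() < deadline:
        search_fn(words, targets[i % len(targets)])
        count += 1
        i += 1
    return count
```

Running this once with your binary search and once with your linear search gives the two throughput numbers to compare.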
Part 2: Now let's consider a question of trade-offs. In the previous task, the data was already given to you in sorted order. Now your task is to first shuffle/scramble the data into random order and then try to answer the following question.
- When does it become worth it to sort the data first and use binary search, versus just using linear search?
Note: the part 2 question is left rather open intentionally to see how you would perform the analysis.
EX7 (bonus 1p)
Compare your analysis of when it is beneficial to sort first and then use binary search against the performance of the UNIX search tool grep on unsorted data. When does it become beneficial not to sort, but simply to use grep on the same data file?
For extra karma: how fast is grep when searching for many keywords simultaneously? (see the -F parameter at https://linux.die.net/man/1/grep)