HW1. Introduction (05.09, 23:59)
Reporting is important, let's practice to make your homework attractive:
If you are using Python, one choice is to use Jupyter Notebook which makes it easy to produce either HTML or PDF file as a report (and submit the notebook .ipynb itself as the code). If you are not familiar with Jupyter Notebooks follow the instructions for setup and usage from here.
If you use some other programming language, you can use this Overleaf template (in LaTex) to produce a PDF report for this homework and submit your code as usual. You can use a desktop LaTex editor like TeXmaker or come up with your own template similar to the one provided. At the very least, you can use Word to write up your homework, but make sure to submit your document as a pdf.
In the course, you have to make plots/figures in order to communicate your results. If you have not made scientific plots before, please check this website for basic on how to make scientific graphs: Communicating results with scientific graphs. At the very least, make sure your figures have: title, names for axes, units. Use x-axis as cause and y-axis as effect (one causes the other, e.g. size of the array on X and time taken on Y).
For an example of sample submission using Jupyter notebook, check the demo solution.
EX1
Familiarise yourself with the course and do all the necessary administrative things. Read all the materials on the courses page, definitely:
To get a point for this task, make sure to read the above links and write in your homework document what do you need to pay attention to when submitting your homework.
Make sure to also have joined the course's Slack channel, as we will use it to communicate course information, homework feedback, and more!
EX2 and EX3
In the study of algorithms, we are often interested in the performance of the implementations of our algorithms and data structures.
The first real exercise in this course helps you to set up and get acquainted with the experimentation framework as we will be using the same formula throughout many homework exercises. We will set up the generation of random data, benchmarking, and plotting, which you can potentially use in future homework tasks.
1) Find out how to generate random integers/numbers in your choice of a programming language and write a reusable code to generate a list/array of N integers in a given range (e.g. between -100 and 100).
2) Figure out how to perform a timing experiment - how long does it take to run your code. One simple way to do it would be to find time before code execution and then after execution. The time taken is then the difference between those two times. For Python, check out the following article: Python Timer, but of course, you can also implement this your own way.
3) Measure the time it takes to execute the following code as the size of the input increases. The code for Python is provided. If you want to use some other programming language, feel free to rewrite it, it should be rather simple. The code just takes an input of a list of numbers and sums up odd integers within the list.
def calculate_odd_sum(series: list) -> int: """ Calculate sum of odd integers within the list """ total_sum = 0 for val in series: if val % 2 != 0: total_sum += val return total_sum
To complete this task, generate lists of increasing size, e.g., use powers 2,4,...2**(25) and run the above code and measure time taken. For each size, run the experiment 5 times (every time generating a new input list! Why?) and calculate the average time it took for each input size.
4) Analyse runtime variability at one fixed dataset size. E.g. choose n = 500000 and run the above code many times (for every run generate a new list/dataset). Does it take a different time to execute on the same input? Why? Write down some reasons, why does the algorithm take a different amount of time to execute on the same size input size?
5) Create plots showing results of experiments from steps 3 and step 4. Which plot should you use? Make sure to add labels to your axes and a title to your plots. Interpret what you see in the plots.
So in the end:
- Create a way to create generate inputs to an algorithm;
- Figure out how to time parts of code;
- Measure time for given algorithm;
- Analyse runtime variability;
- Create plots.
EX4
Measure a sorting algorithm that is part of your programming language's standard library. Briefly describe the algorithm and state its computational complexity (you might need to dig a bit of documentation for it). Follow similar steps to analyse the sorting algorithm as you did in EX 2/3 (no need to analyse variability (step 4) this time).
Try to overlay some function T(x) (relation between input array size and output time, e.g. T(x) = 2*x (seconds), where x is input list's length) that would approximate the built-in sort timings well. What T(x) did you choose and how?
Try to measure how many integers (N) can your system roughly sort in 1 minute. If your system cannot handle it, feel free to choose a lower bound time. Predict what would be the sorting time be if you increase N 100 or 1000 times.
You can do this by extrapolating from the measurements on smaller input sizes and considering the theoretical complexity of the algorithms (or the approximation function, if you don’t know them). Turn the resulting numbers into understandable units (1 day instead of 86400 seconds, etc).
EX5
f(n) = 3n^4 + n and g(n) = n^4 + n^2. Prove that f(n) = Theta(g(n)) by finding suitable c1, c2 and n0. Plot the function f, and also c1 * g(n) and c2 * g(n) (practically you should see a plot where your f(n) is sandwiched by c1*g(n) and c2*g(n) after n0. Check lecture materials for this!)
EX6 (Bonus 1p)
Make an inefficient randomized sorting algorithm that permutes the input and tests if it is sorted. Keep permuting until one of the random permutations is sorted. For how large N it is already infeasible to test all permutations on your laptop? How to create one random permutation of N values (i.e. how to scramble fully the ordered input)?
EX7 (Bonus 1p)
Let's see the asymptotic analysis in actual practice and how we can use it to derive some conclusions for real world problems.
Imagine the following:
Our company needs to make a decision on which software library to use to process up to n = 10 ** 5 data records that exist in a database. The decision lies between two libraries:
- library A has a known processing time of TA = 0.3 * n * log(n) milliseconds
- library B has a known processing time of TB = log(n) + 5 * n milliseconds
(Log is in base 2)
1.1. Which library is faster, according to the "Big-O" notation?
1.2. Find out in which conditions the two libraries outperform each other. Use plots to show your findings.
1.3. Finally, given the initial circumstances, choose the best library for the company to use.
Exercise 6 and 7 is a bonus task. In order to get a full point from a bonus task, we expect you to perform a more throughout analysis. Write up how you plan to solve the task, create nice plots and write an interpretation of the results.