HW1. Introduction (07.09, 23:59)
Reporting is important, let's practice to make your homework attractive:
If you are using Python, one choice is to use Jupyter Notebookwhich makes it easy to produce either HTML or PDF file as a report (and submit the notebook .ipynb itself as the code). If you are not familiar with Jupyter Notebooks follow the instructions for setup and usage from here.
If you use some other programming language, you can use this Overleaf template (in LaTex) to produce PDF report for this homework and submit your code as usual. You can use a desktop LaTex editor like TeXmaker or come up with your own template similar to the one provided. At the very least, you can use Word to write up your homeworks, but make sure to submit your document as a pdf.
In the course you have to make plots/figures in order to communicate your results. If you have not made scientific plots before, please check this website for basic on how to make scientific graphs: Communicating results with scientific graphs. At the very least, make sure your figures have: title, names for axes, units. Use x-axis as cause and y-axis as effect (one causes the other, e.g. size of array on X and time taken on Y).
EX1. Familiarise yourself with the course and do all the necessary administrative things. Read all the materials in the courses page, definitely:
Make sure you are registered to Piazza (if you are not then you lose points for this task, the invitation emails will be sent shortly after the first lectures). In order to confirm that you are registered to Piazza, add a comment to the post created by instructors with a link to some useful algorithmics related material, briefly explain the contents of this link in the comment. Please, copy paste a screenshot of your comment to the homework report.
EX2.
The first exercise is to implement any sorting algorithm on your own. There is a wide variety of sorting algorithms available. Choose either one presented in the lectures or implement one from the books. Try to understand how the algorithm works.
One of the more important properties we want our algorithms to have is correctness: for any correct input data it returns correct output.
1) First we check that your algorithm indeed seems to give correct output. For that, create a reasonable input array in an unsorted order (you can generate an array of (pseudo-)random integers or make one up yourself). Sort the array using your implementation. Check that the output is correct: no elements missing, elements come in ascending/descending order. Make sure to also try with repeated elements (e.g. you have multiple 10's in your arrays).
Knowing, that our implementation of the algorithm is correct, we move on to investigate performance.
2) Write a function to generate N (pseudo-)random integers x_1, x_2, ..., x_n with values x_i in [0, MAX_INT] (What is the MAX_INT value in your language?)
3) Investigate the variability in runtime of your implementation. For that fix some N (try not to use too small!), generate random integers and measure time taken to sort it. Repeat this process 100+ times and record time taken for each run. Plot and discuss your results.
4) Next, we want to see how the runtime of your implementation changes as N increases. For that choose a list of values for N (e.g. powers of 2), generate random numbers for each N, sort them and record time taken. Note: You might want to repeat sorting for each N several times and choose the average runtime (why?). In your answer, make sure to describe the setup of your experiment: which algorithm, how did you choose N (and why?), how many times did you repeat the experiment for each N? Make sure to plot the results (how runtime increases as N increases). Describe your results.
EX3. Now measure a sorting algorithm, which is built-in in your programming language. Briefly describe the algorithm and state its complexity. Plot the execution times as in EX2, if possible, on the same plot to make comparison easier. Try to also overlay some function f(x) (relation between input array size and output time, e.g. f(1000) = 2*1000 (seconds)) that would approximate the built-in sort timings well. What f(x) did you choose and how?
EX4. Measure how many integers (N) can you roughly sort in 1 minute with both your own implemented sort from EX2 and built-in language-specific sort from EX3. Predict what would be the sorting times for both algorithms if you increase N 100 times. Do it by extrapolating from the measurements on smaller input sizes and considering the theoretical complexity of the algorithms (or the approximation function, if you don’t know them). Turn the resulting numbers to understandable units (1 day instead of 86400 seconds, etc).
EX5. f(n) = 3n^3 + n and g(n) = n^3 + n^2. Prove that f(n) = Theta(g(n)) by finding suitable c1, c2 and n0. Plot the function f, and also c1 * g and c2 * g.
EX6. (Bonus 1p) Make an inefficient randomized sorting algorithm that permutes the input and tests if it is sorted. Keep permuting until one of the random permutations is sorted. For how large N it is already infeasible to test all permutations on your laptop? How to create one random permutation of N values (i.e. how to scramble fully the ordered input)? Ps. It's worth to install Gnuplot. Alternatively, there are also web services around it: http://gnuplot.respawned.com/