HW1. Introduction (Monday, 09.09, 23:59)
Reporting is important, let's practice to make your homework attractive:
If you are using Python, one choice is to use Jupyter Notebook which makes it easy to produce either HTML or PDF file as a report (and submit the notebook .ipynb itself as the code). If you are not familiar with Jupyter Notebooks follow the instructions for setup and usage from here.
If you use some other programming language, you can use this Overleaf template (in LaTex) to produce a PDF report for this homework and submit your code as usual. You can use a desktop LaTex editor like TeXmaker or come up with your own template similar to the one provided. At the very least, you can use Word to write up your homework, but make sure to submit your document as a pdf.
In the course, you have to make plots/figures in order to communicate your results. If you have not made scientific plots before, please check this website for basic on how to make scientific graphs: Communicating results with scientific graphs. At the very least, make sure your figures have: title, names for axes, units. Use x-axis as cause and y-axis as effect (one causes the other, e.g. size of the array on X and time taken on Y).
For an example of sample submission using Jupyter notebook, check the demo solution.
Generative AI - you are encouraged to use gen-AI models (ChatGPT, Gemini, Co-pilot, etc)responsibly. They can speed up your coding significantly. But they can also create bloated or slightly wrong code, which you spend ages debugging. Think.
Your reports should be as succinct as possible - short but sufficient, always to the point. Avoid lengthy discussions on irrelevant side-topics. This saves your time and teachers time.
EX1 (1p)
Familiarise yourself with the course and do all the necessary administrative things. Read all the materials on the courses page, definitely:
To get a point for this task, make sure to read the above links and write in your homework document what do you need to pay attention to when submitting your homework.
Make sure to also have joined the course's Slack channel, as we will use it to communicate course information, homework feedback, and more!
EX2 and EX3 (1p + 1p)
In the study of algorithms, we are often interested in the performance of the implementations of our algorithms and data structures.
The first real exercise in this course helps you to set up and get acquainted with the experimentation framework as we will be using the same formula throughout many homework exercises. We will set up the generation of random data, benchmarking, and plotting, which you can potentially use in future homework tasks.
1) Generate random integers/numbers in your choice of programming language and write a reusable code to generate a list/array of various input data sets varying in size and type of data. Generate cases for N integers (e.g. N in 1000, 10 000, 100 000, 1000 000 in a given range: positive integers from 0..(2**8-1), 0..(2**16-1), 0..(2**32-1), 0..(2**64-1) or in other words, 1 byte, 2 bytes, 4 bytes, 8 bytes, and signed integers in ranges -100..+100, -10000..+10000, -1 000 000 000..+1 000 000 000 ).
2) Figure out how to perform a timing experiment - how long it takes to run your code. One simple way to do it would be to find time before code execution and then after execution. The time taken is then the difference between those two times. For Python, check out the following article: Python Timer, but you can also implement this your way. Collect all measurements of all test cases in some larger table; later, they will be processed for summarised tabular and graphical outputs.
3) Measure the time it takes to run as the input size increases or the data type changes. If you want to use another programming language, feel free to rewrite it. For each generated list, calculate the average difference between consecutive elements in the list.
For each size, run the experiment 10 times (every time generating a new input list! Why?) and calculate the average time it took for each input size. Report the obtained average differences and the time required.
4) Analyse runtime variability by running the code many times for each data (for every run, generate a new list/dataset). Does it take a different time to execute on the same input? Why? Write down why the algorithm takes a different time to execute on the same input size.
5) Create an output summary table and illustrative plots showing the results of experiments from steps 3 and 4. How do you summarise such complex tables? Which plot should you use? Make sure to add labels to your axes and a title to your plots. Interpret what you see in the plots.
So, in the end:
- Create a way to generate various types of inputs to an algorithm;
- Figure out how to time parts of code;
- Measure the time for a given algorithm across several orders of magnitude in size and/or time;
- Analyse runtime variability;
- Create summary tables and plots.
EX4 (1p)
Measure a sorting algorithm that is part of your programming language's standard library. Briefly describe the algorithm and state its computational complexity (you might need to dig a bit of documentation for it). Follow similar steps to analyse the sorting algorithm as you did in EX 2/3.
Try to overlay some function T(x) (relation between input array size and output time, e.g. T(x) = 2*x (seconds), where x is input list's length) that would approximate the built-in sort timings well. What T(x) did you choose and how?
Try to measure how many integers (N) can your system roughly sort in 1 minute. If your system cannot handle it, feel free to choose a lower bound time. Predict what would be the sorting time be if you increase N 100 or 1000 times.
You can do this by extrapolating from the measurements on smaller input sizes and considering the theoretical complexity of the algorithms (or the approximation function, if you don’t know them). Turn the resulting numbers into understandable units (1 day instead of 86400 seconds, etc).
EX5 (1p) Asymptotic behavior of functions
For each of the following sets of five functions, order them so that if fa appears before fb in your sequence then fa = O(fb). If fa = O(fb) and fb = O(fa) (meaning fa and fb could appear in either order), indicate this by enclosing fa and fb in a set with curly braces. For example, if the functions are:
- f1 = n
- f2 = {$ \sqrt{n} $}
- f3 = n + {$ \sqrt{n} $}
the correct answers are (f2, {f1, f3 }) or (f2, {f3, f1 })
Heading 1 | Heading 2 |
---|---|
f1= (log n)2023 | f1= 2n |
f2= n2log (n2023) | f2= n3 |
f3= n3 | f3= nCn/2 |
f4= 2.023n | f4= !n |
f5=nlog n | f5= nC3 |
Exercise 6 is a bonus task. In order to get a full point from a bonus task, we expect you to perform a more throughout analysis. Write up how you plan to solve the task, create nice plots and write an interpretation of the results.
EX6 (Bonus 1p)
Let's see the asymptotic analysis in actual practice and how we can use it to derive some conclusions for real world problems.
Imagine the following:
Our company needs to make a decision on which software library to use to process up to n = 10 ** 5 data records that exist in a database. The decision lies between two libraries:
- library A has a known processing time of TA = 0.3 * n * log(n) milliseconds
- library B has a known processing time of TB = log(n) + 5 * n milliseconds
(Log is in base 2)
1.1. Which library is faster, according to the "Big-O" notation?
1.2. Find out in which conditions the two libraries outperform each other. Use plots to show your findings.
1.3. Finally, given the initial circumstances, choose the best library for the company to use.