Algoritmika - Kursused - Arvutiteaduse instituut

HW2. Sorting & Binary Searching (15th of Sep, 23.59)

EX1(1p) Queues and Stacks - a randomised task scheduler creates and assigns tasks to different queues and stacks, or fetches them for execution. https://colab.research.google.com/drive/11Wq27gcI70IbfmNYfwzOTbJxsxLwO_xN

Assess the generated operations/tasks. Create the needed queues and stacks (collectively data structures, Q1..Q5, S1..S3). Assess the performance of the scheduler in total and each DS separately. After each 100 timesteps, collect the current size of each DS and their sum. Visualise the dynamics of the DS size changes (from 0 to 100_000 steps, i.e. 1000 values on X axis). Interpret the results - what loads does each DS have.

Calculate the time for how long each unique value(id)stayed in each DS. Output the minimal time, the 25%, 50% (median) and 75% quartiles, and the maximal time that any value was in each DS. I.e make a table where the DS are in rows and the corresponding statistics are in columns.

And lastly, make another table that summarizes the DS as follows:

How many items were correctly inserted and deleted?
How many erroneous commands (e.g. dequeue or pop from empty DS) were there for each DS?
What maximal size does each DS reach, when (timestamp)?
How many elements remain in each DS at the end of the scheduler work? (t0100000)

EX2(1p) Task scheduler - use the wrapper around the commands' output in the previous task. Reschedule the same operations, but now according to your own choice of the same number of queues and stacks. You need to honour the insertion of each ID; and deletion (pop). But you can choose into which DS and from which DS.

Re-use the same analysis tools from Task 1 and produce similar outputs, but now to a fairer world. State what would be more fair, and how you achieved that with the changes you made to the scheduler.

EX3(1p) Quicksort - Implement and measure your own Quicksort implementation. Start from 10000 elements and increase the data size 10-fold every time, until the first time it takes over 30 seconds.

quicksort with one pivot
quicksort with two pivots

Here are data generators to test your code on (for each data size, 6 different distributions): https://colab.research.google.com/drive/1_W46VxbsEpEO5ipQQA8vZkxUhbeTTKV1

In the report, provide the description of the chosen splitting method and the respective function itself.

EX4(1p) Report the timings for even more methods. You can experiment with more than one approach. Stick to comparison-based sorts, but possibly take into account some background about the distributions. E.g. better handling of repeated values; early detection of already sorted stretches (as a side product of the pivot-based splits), or similar ideas. The goal is to beat the speed of implementations from EX3. Also, report how much better you succeeded in achieving (e.g. 1.2x faster).

EX5(1p) Binary Search and Order Statistics - how many random lookups and rank order statistics queries can be performed within 1 minute from 10M elements of sorted and unsorted data?
There are two types of queries: a) Look up the key for its membership and rank k. b) Look up, which value is ranked k’th in data. Generate the keys and ranks randomly.
Run these two types of queries on unordered data (with linear time methods) and on already ordered data (binary search and lookup).

For how many random rank order queries from originally randomly ordered data would it be actually faster to first sort then data and then “look up” the key using binary search?
Since the linear time order statistics query changes underlying data, how does that affect the speed after many queries over the data? Describe what happens to underlying data after repeated application of linear rank order queries. Visualize that gradual change using much smaller random data, e.g. 1000 values only.

EX6(bonus 1p) Based on the sorting exercise tasks data sets - can you beat all the above sorting methods on the same provided integer inputs using some linear-time algorithm? How much?

EX7(bonus 1p) Only if you also beat the built-in sort() method on all above data sets (fix one size, e.g. such that built-in takes 100 seconds). Report your results - method and by how much?

Algoritmika 2025/26 sügis

HW2. Sorting & Binary Searching (15th of Sep, 23.59)