Algoritmika - Kursused - Arvutiteaduse instituut

HW7. Hashing and Bloom Filters (20th of Oct, 23.59)

EX1: Use a large list of word and test various hash functions:

https://github.com/dwyl/english-words/raw/refs/heads/master/words.txt

Try out some (three is a good number) different hash functions with the goal to identify a sufficiently good method. Justify the rationale of your designs.

Compare the chosen functions across different table sizes: 100K, 1M, 10M values.

What is the expected number of collisions, and when would you expect the first collision, approximately
When is the first collision actually? (show the first pair and location nr)
How many collisions are there in total?
How many slots remain unused? Is this value similar to what you expected? What did you expect?
What are the top-10 largest number of collisions (of different keys) into a single cell?

Hint: For example, first, convert each string into an integer and then apply a universal hash function scheme discussed in the lecture.

EX2: Equipped with a good method for creating hash functions from EX1, test the idea of a Bloom filter that uses 3, 5, and 7 hash functions.

Assume for one table 1M slots (bits). Therefore, 3, 5, and 7M bits (or slots, e.g. bytes), respectively. You can choose if you use different functions on the same hash-table, or keep each table separately. Validate, that all your keys are stored in the Bloom filter.

Create random strings that are not in your word list, query the Bloom filter with them, and count how many times it incorrectly says 'present'. That count (or the fraction) is the false positive measure. Report it.

EX3: Discuss the similarities and differences between the above hashing functions and the cryptographic hashing in the context of passwords.

Study the effect of using the salt in hash-encrypting the passwords. Read the article https://www.techtarget.com/searchsecurity/definition/salt

Test the time to hash all the words from the above task list of words (cryptographically). This gives you an estimate of the brute force attack to see if any passwords could be from the above list.

Summarise briefly the effect of salting. Now apply salting and discuss the time expected for the similar type of attack.

EX4: How could you use the cryptographic hashing for creating a hash function for filling the hash table like in EX1?

How much slower is the cryptographic hashing function compared to your own hash function from above?

Test how well it would perform (first collision, nr. of collisions, largest nr of collisions in a slot, nr of unused slots) for hash tables, similar to EX1.

EX5: Read about the concept of the Zobrist hashing https://en.wikipedia.org/wiki/Zobrist_hashing

Describe the concept. Illustrate it with an example based on Tic-Tac-Toe

EX6: (bonus, 1p) Study Zobrist hashing further. Discuss how the checkers (on the 8x8 board) could be implemented in principle with that concept.

Algoritmika 2025/26 sügis

HW7. Hashing and Bloom Filters (20th of Oct, 23.59)