HW11. Text algorithms, NFA and projects (30th of Nov, 23.59)
EX1: Write a very simple "grep" all by yourself - for comparing naive search and one other method of your choice from KMP, BMH, BMHHS algorithms. Use for example the same dictionary of English words as was used for hash function collision counting (https://raw.githubusercontent.com/dwyl/english-words/master/words.txt).
EX2: Now, in comparison, use the grep (https://en.wikipedia.org/wiki/Grep) and fgrep (or grep -F) for multiple patterns. State the speed - how many MB/s and how many times faster are these than your code? What is the speed difference matching one (long, short) pattern vs many patterns?
You can calculate MB/s by doing the following. Check what is the filesize in MB. Calculate the time taken to process every line in the file (try to match each line to your pattern). MB/s is then Filesize divided by seconds taken to process it.
EX3: Given the following
(ab|(aa|b)(ba)*(bb|a))*
Construct an NFA (Nondeterministic-Finite Automaton) using Thompson construction algorithm.
Minimize the constructed NFA.
Provide example patterns which are accepted by this automaton and some that are not.
EX4 & EX5: The course ends with the project to be completed usually by a small team of 2-3 people. Your task is to come up with one project proposal all by yourself (this is an individual task, you will have time to share) - taking ideas from project ideas file, from previous homeworks, or your imagination + google. E.g. you may attempt to extend some of the past exercises. Note - this is a proposal that you may use to attract other students to join; and this is at the same time a proposal that you do not need to start executing necessarily. So. it's more a planning exercise, not execution exercise.
Your project proposal should have:
- Title
- 2-5 sentences of a short description
- Briefly described the motivation and the main challenge of this project
- Division of tasks, the estimated number of work hours per task, and deadline (aim at our poster session!)
- Allocation of the 2-3 people to tasks and hours.
- Recommended: create a GANTT chart to cover the previous two points.
- Description of the envisioned end results that would go to the project report/poster.
We will discuss these in the practice session.