HW12. Text search, Burrows Wheeler, Suffix tree, Feedback (9th of December, 23.59)
Well done! Now you are at the very end of the course! By now you have completed your first draft of Essay; gave feedback to the others; and should submit the minor edited final version of your work to the courses upload page. That shall be used for grading purposes. Remember to keep it 2 pages, not more (that would affect the layout of the book :).
Last exercises are here:
EX1: Pattern matching is a common problem. Here are a few examples of such algorithms. Interpret the code and results; experiment more with the texts and patterns, alphabet sizes to determine when each method should be used. Come up with some artificially extremely hard data and pattern example use cases for each method, see how much worse you can make them run. Feel free to vary the text and pattern generators for that!
https://colab.research.google.com/drive/1MNuv4sgPP6OMPopXTHohdeefLzEYj24-
EX2: Try to identify cases - data and patterns; for which you can provide better or improved algorithms for. You can pick some nasty cases from above and see if you can provide a better method or better modification of one of the algorithms to be more robust against such cases. The goal would be to beat all four implementations on some test data. Even if you can not achieve it, report your results - the code and measurements and provide a reason why you think it did not work against some that might have been more robust against such a case.
EX3: Draw a suffix array, a suffix trie, and a compact suffix tree for string “bananabanaba”. Describe, how to find “aba” from such indexes.
EX4: Here is a code that generates a suffix array, the Burrows-Wheeler Transformation (BWT), and a very simplistic run-length encoding (RLE). Your task is to generate a code for reverse mapping from that BWT RLE first into BWT, and then into the original text. In the code there is also a longer text in the RLE_BWT, where only the final version is presented. Once you have your decoder, provide that long message in original plain text.
https://colab.research.google.com/drive/18M-oqFyTCJlAbSz4XMfBbNP1DiNsyh8o#scrollTo=ktW5h8Wb1bA_
EX5: (The very last of this course.) There are some very practical tools - for compression, as well as for pattern (exact, approximate, simple or regular expression) searching. Pattern and regular expression searches can also be done approximately. Your task is to run some experiments by yourself - either on compression (compression ratio; and compression speed), or on pattern searching - functionality and speed. For pattern searching e.g. the grep, egrep, fgrep family; and for approximate searches the agrep (there is original 1992 Manber and Wu agrep; and some later implementations). Choose, what question you want to study, and report on that topic only. Read the manuals, state what you wanted to test; and report on that testing - how fast, and how useful are these tools; and for how large data you can comfortably use them.
Goal is that you would learn yourself something new for some topic that intrigues you. IN that way you would get yourself a powerful ally for your future professional life.
EX6. (1p) Provide feedback for the course and homework topics so far
- Contents of the course:
- Which topics were most useful?
- Which homework tasks you liked the most
- What topic got too little attention?
- What would you drop from the course?
- Feedback on quality of the teaching:
- Pre-recorded lectures
- Lectures in the auditorium
- Your homework practice group
- Your thoughts on using the gen-AI (LLM-s) in the course? Did LLM-s help you learn more and deliver more?
EX7 (2p) Create a project plan for you and your team. This will be a separate PDF file upload form on the courses upload choices.
Include in the project plan:
- Name of the team
- Title of the project
- Team members
- Abstract
- The description of an idea to accomplish
- Timeline with the precision of dates, that should include ca 5 milestones (e.g. problem statement and division of work ready; code ready; experiment plan ready; experiments finished; illustrations planned for the report) and 2 deliverables (code ready for experimenation; final poster in print)
- First ideas what to include in the final report to be presented as a poster
This joint report should be done and shared by the entire team. And all team members can upload it independently as the PDF file, as well (a separate form in uploads)
Some videos
Boyer-Moore-Horspool -
Burrows-Wheeler Transform