HW11. DE, Edit distance, DTW (1st of Dec, 23.59)
EX1: Differential evolution - experiment, how fast can you find a solution.
Look at the code that produces data according to a polynomial of max third degree and noise. It also adds some outliers.
https://colab.research.google.com/drive/1RCiC0EqWlh3SeCPCTE343ZsivqCEt-xQ#scrollTo=W2OZzm6xS6sS
There is a simple ChatGPT generated code for simulated annealing based optimisation for fitting polynomials to data. Your task is to write and evaluate Differential Evolution method for fitting polynomials to the data. Try to find a “good enough” solution with few fitness function evaluations. Test both the RMSE and the median error.
State the design choices and most essential parameters in the report. State how well your code would work in case of linear, quadratic, and cubic functions. And how many evaluations of fitness is sufficient.
(Extra discussion point) If the underlying data is generated only by a linear or quadratic function, can your code determine how complex the function should be?
EX2 - There is code that produces words with more and more errors. Using dynamic programming, and hints in the code, write the similar word matching code that tries to identify the right word from the altered list.
As a minimum, you need to develop code that allows edit distance at uniform cost (insertion-deletion-change at 1.0); that adds transpositions (1.0 or less); that accounts for upper-lowercase character changes at low cost, and also allows for letter-letter changes at low cost (say, 0.1 penalty). Search with each word across the other files, by adding complexity. You see in files and code what is the source word and modified one. But do not assume that, simply run your code across all words. And see if you find the right (and hopefully, only the right) word in each case. Give examples where that was not possible.
Code: https://colab.research.google.com/drive/1mDcE-RzAqk8ieABgOfUP55MjBxytDlmL#scrollTo=AjvPYxrlzqUb
Word lists: Attach:words.zip
EX3: Stitch all the words from the same files into one continuous text, no word boundaries (all the word lists together). Now search with the word from the first file (the correct one) and match similarly against the full-length text. Identify the beginnings and ends of the matches (there may be several equally good ones in continuous text, with small overlaps at the ends).
Hint: code is similar, just make sure the first row of the dynamic programming table is always 0. When you reach the desired match score, you can backtrack, from where that match started... (for more complex cases you do not need to implement all of this; just identify some cases manually, for example).
EX4: DTW. There are images and code for basic dynamic time warping. Study the code, and experiment with it.
Currently it calls for a random file and random line to match the DTW similarity. You should experiment more systematically, what that kind of similarity search could provide. E.g. use one line through the pyramids or waves, and identify “high-similarity” areas, where the Euclidean distance would clearly fail. Play with code. Explain what you observe.
Code: https://colab.research.google.com/drive/1ESGpHtocQilLCKra5JfFhBoJSNT_36lu?usp=sharing
EX5: DTW - Visualise the DTW table - make a heatmap representation. Explain the concept of the band-width. (see the second code in above DTW collab notebook)
Show how the time-series are “warped” when two series are compared. I.e. how the different peaks match each other.
Experiment with the width of the “band” that is in that DTW code. Demonstrate how that band affects the search within the images of EX4; both quality (do you find what you expect) and speed.
EX6: (bonus 1p) - Develop a project proposal that you might want to propose for the course. In the practice session you can pitch it and try to find team-mates. Point for a well-formulated project proposal and estimation of time to complete it. Note that you are not required at this stage to actually do that project. Moreover, please note that this does not need to be definitely the one to do; it is just a proposal and you may change it later if necessary. Be bold and propose some "crazy" ideas :)