HW5. Spatial data, k-ary heap (6th of Oct, 23.59)
EX1: From quad-tree to 9-tree. Describe how the prefix code for each pixel is calculated.
Apply to images (pick some of your own photos reflecting your interests). Scale the image and recalculate prefix code on the rescaled or stretched images. Draw an arrow, similar to the example image, from two input prefix codes. Note that the example was done by an eye, so the dots may not be perfectly located. Use libraries like Pillow or similar for image processing. What might be the benefits of using this 9-tree prefix coding for working with images?
EX2: Calculate the neighbourhoods of a pixel using their prefix code. Eight immediate neighbours and the “circle” at distance X ( X = 3, 5, 8 ).
Example:
EX3: Extract from the photo(s) the vectors of RGB values of pixels and use these for studying the concept of a RP-tree.
One pixel represents a 3D and 6 pixels in row is a 24-dimensional vector. Extract such vectors (non-overlapping) and plot their distribution / similarity using a PCA plot. You can limit the data by randomly picking vectors, e.g. 10K vectors (as 1000x1000 image alone would be 1M pixels).
For an example: https://colab.research.google.com/drive/1sqgxu6KS8itibhe-V-BMW2zWMqv3nqFF#scrollTo=VpLZ6GCIKosD
Create top 2 layers of a Random Projection tree for the 3D and the 24D data. That should divide these vectors into 4 areas. Color the points on PCA with 4 different colors accordingly.
EX4: Describe conceptually, by an illustration, how to use RP-tree for searching similar vectors, as defined by Euclidean distance d. Provide also a piece of code where, given a search point and distance d, you would choose whether for a given node in RP-tree, to search both from the left and the right subtree, or one direction only. Use the 3D vectors as an example.
EX5: Implement an in-place heap-sort algorithm using k-ary heaps for random integers. Use linear-time heapify (appropriately applying bubble-down to roots of all small trees of 2 layers and above, i.e. all internal nodes from right to left). E.g. from max-heap the largest value is extracted from the heap and placed into a newly freed slot after the heap got shrinked. The last value remaining and placed in front, is the smallest. Identify which k of the k-ary heap is the best for sorting. Does sorting bring out the best features of the k-ary heaps? Was heap-sort in your hands any “better” than the past studied methods?
EX6: (Bonus, 2p) The RP-trees and similar other multi-dimensional trees use Euclidean distance as the basis of similarity search. However, many applications rely on correlation and cosine similarity measures. How to support such applications for K nearest neighbour searches? Describe some concepts, provide example illustrations.