HW8. Graphs (24th of October, 23.59)
EX1. There are many data types that can be represented as graphs. Study online collection at https://snap.stanford.edu/data/. Select from there 2 graphs that are different in type and size. Consider them as undirected (if edge A->B exists, but B->A doesn't exist, then also add B->A) and characterize them by calculating:
- Nr of nodes
- Nr of edges
- Distribution of node degrees (visualize it using a histogram)
What is the graph about? What can you tell about the graphs from these statistics? Are these graphs dense or sparse?
EX2.
Read in the following UNdirected Graph dataset from: dataset link
The dataset starts with a description of vertices (below text VERTICES), which describe all graph vertices with their corresponding coordinates X and Y.
E.g. 0 652 396, is vertex 0 at location X=652 Y=396.
Lines after the line "EDGES" describe graph edges.
E.g. 0 1, there exists an edge between nodes 0 and 1.
You can visualize this graph on your own or by making use of the following web tool: Webtool searchvis. Python code to generate input for visualisation can be found at Graph task visualisation code. Feel free to change parameters, modify the code or implement your own. Sample generated input for the web tool can be found here: link:
Your task here is to start from vertex with id 0 and perform Breadth-First Search and Depth-First Search:
Start from node 0. Perform BFS and DFS and return the order of processing the nodes (the first time you start to process the node (e.g. when you add neighbours to queue/stack)). When adding neighbours, prioritize lower ID nodes first! E.g. if you have to add nodes 1 and 2 to stack/queue, add 1 first and then 2.
The output of this task should be two sequences (one for BFS, one for DFS) [0,1,...] . Add this to your report!
Now, visualize the output of your traversal by adding the order of processing to the visualisation (the one in darkred). To do that, you can either do it by hand (paint) or use the provided visualisation code and change line 152-153, where there is a variable "traversal_order". Change the sequence within the list to be the order of vertices being processed (previous output of the task, you can also use this to confirm your solution works as intended!). Also, add the visualisations of orderings to your report!
Hint: If you use a queue (FiLo: First in - Last out) for first traversal and a stack (LiFo: Last in - First out) for second, then the code for the two solutions should be almost the same! In order to not process nodes you've already processed, you probably need to create an additional data structure to remind you which nodes have already been processed.
EX3. Use the following directed graph. Convert it into adjacency matrix representation and then convert the matrix into the graph where the original edges are replaced by "3-hop" edges or in other words - connect only vertices that are reachable in 3 steps of the original graph.
EX4. Make a transitive closure of the same graph - using two approaches. 1) Multiplication approach: G*G*G...; 3) with Warshall algorithm. Verify that you got the same answer.
EX5. Implement a "random walk" procedure following links randomly (equal probabilities). Using the same graph again, estimate the probability of being in any given node by varying how you deal with "dead ends" (e.g. node B) and "nodes with no links into it" (e.g. nodes D or A). First, try "re-appearing anywhere" from dead ends only. Secondly, when walking, with 20% probability jump to any random vertex in the network, and 80% of times select randomly and go to one of the neighbouring vertices. For both scenarios, provide the respective probabilities and identify the most important nodes.
EX6 (Bonus 2p). Use the random walk data from the previous exercise. Implement the "same" random walk procedure but now clearly using matrix multiplications much like in finding transitive closures. Assign uneven probabilities to links - e.g. 80%-20% for two-link nodes (larger probability to alphabetically the first neighbour) and 60%-30%-10% for three link ones. Feel free to modify the input graph a bit. Introduce enough links so that you do not have "dead ends" or nodes where nothing leads into. E.g. I recommend introducing links G->H, B->K, K->H. But feel free to experiment. Of course, you should also figure out what happens to graph multiplications when you have not yet changed the graph. Your task is to report transition probabilities after N steps. How do you know when to stop multiplying?
EX7 (Bonus 4p). Tilt is a puzzle game played on a Tilt board: an n × n square grid, where grid square contains either a fixed obstacle, a movable slider, or neither (the grid square is empty). A Tilt move consists of tilting a board in any of the four cardinal directions (up, left, down or right), causing each slider to slide maximally in that direction until it hits either: the edge of the board, an obstacle, or another slider that cannot slide any further. A Tilt move results in a new Tilt board configuration. Given a Tilt board B with one grid square t labeled as the target, a sequence of k Tilt moves solves Tilt puzzle (B,t) if applying the moves in sequence to B results in a Tilt board configuration B' that contains a slider in square t. The figure below shows a small Tilt puzzle and a solution using the fewest possible moves k = 8. Obstacles are shown in black, movable sliders are circles, and the target square is shaded gray.
We represent an n × n board configuration B using a length-n tuple of length-n tuples, each representing a row of the configuration, where the grid square in row y (from the top) and column x (from the left) is B[y][x], equal to either character ’#’ (an obstacle), ’o’ (a slider), or ’.’ (neither). We represent the target square by a tuple t = (xt, yt) where the puzzle is solved when B[yt][xt] = ’o’. Your code template has a function move(B, d) which computes a move from board B in direction d in O(n2) time.
- Represent the configuration space (state-space) diagram as a tree.
- Given a Tilt puzzle with starting n × n board B containing b fixed obstacles and s movable sliders, argue that the number of board configurations reachable from B is at most: (n2 - b)!/s!(n2- b - s)!