HW8. Graphs (25 October, 23.59)
EX1. This exercise is to get you started with working on your essays. Essay guidelines can be found here (https://courses.cs.ut.ee/2021/Algorithmics/fall/Main/Essays), note that the first deadline is the 8th of November! Developing reading and writing skills are an essential part of the course. The idea is to read at least one paper and write a 2-page essay/summary about that paper aiming at describing that content to your fellow coursemates. We have selected 20 papers for you to choose from. Your first task is to choose a paper that you will write an essay about. First, skim all them through (do it quickly). Write in the report which article you are choosing. Attempt to summarise in a few sentences why you think it will be the most interesting article for you to choose. You don't have to necessarily make your essay on the choice of this homework, but it might give you a good start.
https://docs.google.com/document/d/1tWLqEh2Dc4oD70jauAgh4Gv0MX3YoeiJwzV-GeZZ-Ic/edit?usp=sharing
EX2. Read in the following UNdirected Graph dataset from: dataset link
The dataset starts with a description of vertices (below text VERTICES), which describe all graph vertices with their corresponding coordinates X and Y.
E.g. 0 652 396, is vertex 0 at location X=652 Y=396.
Lines after the line "EDGES" describe graph edges.
E.g. 0 1, there exists an edge between nodes 0 and 1.
You can visualize this graph on your own or by making use of the following web tool: Webtool searchvis. Python code to generate input for visualisation can be found at Graph task visualisation code. Feel free to change parameters, modify the code or implement your own. Sample generated input for the web tool can be found here: link:
Your task here is to start from vertex with id 0 and perform Breadth-First Search and Depth-First Search:
Start from node 0. Perform BFS and DFS and return the order of processing the nodes (the first time you start to process the node (e.g. when you add neighbours to queue/stack)). When adding neighbours, prioritize lower ID nodes first! E.g. if you have to add nodes 1 and 2 to stack/queue, add 1 first and then 2.
The output of this task should be two sequences (one for BFS, one for DFS) [0,1,...] . Add this to your report!
Now, visualize the output of your traversal by adding the order of processing to the visualisation (the one in darkred). To do that, you can either do it by hand (paint) or use the provided visualisation code and change line 152-153, where there is a variable "traversal_order". Change the sequence within the list to be the order of vertices being processed (previous output of the task, you can also use this to confirm your solution works as intended!). Also, add the visualisations of orderings to your report!
Hint: If you use a queue (FiLo: First in - Last out) for first traversal and a stack (LiFo: Last in - First out) for second, then the code for the two solutions should be almost the same! In order to not process nodes you've already processed, you probably need to create an additional data structure to remind you which nodes have already been processed.
EX3. Calculate word-word frequency and output 10 most frequent pairs. Consider each word to be capitalized, e.g. "if" == "IF" == "iF" == "If". You can scroll the text to the right!
text = If the internet is a digital Wild West it s time to lock your doors and close your windows While the amount of cyber attackers and activity alone is alarming in this episode the featured villain is a hacker group backed by the Iranian government In a blog post published Thursday Google s Threat Analysis Group also known as TAG revealed that it had sent more than 50 000 warnings to users whose accounts had been targeted by government backed hacker groups carrying out phishing and malware campaigns so far this year Receiving a warning does not necessarily mean your Google account has been hacked Google does manage to stop some of the attacks but rather that the company has identified you as a target Google stated that this amounted to a nearly 33 increase when compared to the same time last year and attributed the activity to a large campaign launched by the Russian sponsored group Fancy Bear which U S and UK security agencies found had been on a worldwide password guessing spree since at least mid 2019 according to a report published in July Russia s not alone though More than 50 countries have hacker groups working on any given day Google explained We intentionally send these warnings in batches to all users who may be at risk rather than at the moment we detect the threat itself so that attackers cannot track our defense strategies Google said On any given day TAG is tracking more than 270 targeted or government backed attacker groups from more than 50 countries This means that there is typically more than one threat actor behind the warnings While that statistic alone is mind boggling the company also put a spotlight on APT35 a cyber attacker backed by Iran that has hijacked accounts deployed malware and spied on users using novel techniques in recent years In particular Google highlighted four of the most notable APT35 campaigns it s disrupted in 2021 One of APT35 s regular activities is phishing for credentials of so called high value accounts or those belonging to people in government academia journalism NGOs foreign policy and national security The group uses a technique in which it compromises a legitimate website and then deploys a phishing kit
Hint: You can compute the word-word frequency by the formulae between word wi and word wi+1 as :
word-word-frequency = Count(wi , wi+1)
where the function Count(wi , wi+1) measures the number of times (bi-gram) jointly word wi and word wi+1 appear in the text and
EX4. Do a Priority based search task using word-word context (or “probability”) in the text (from the previous excercise). You are not allowed to revisit the same words during the search operation. You need to compute word-word context by a probability function given by:
word-word context = probability = Count(wi , wj)/ Count(wi)
where Count(wi , wj) measures the number of times (bi-gram) jointly word wi and word wj appear in the text and Count(wi) is number of times word wi appear in the text. (also, j is not equal to i).
Starting from 10 different words, show the most probable output.
Hint: You can think Priority based search task as a Markov Transition.
EX5. Use the following directed graph. Convert it into adjacency matrix representation and then convert the matrix into the graph where the original edges are replaced by "3-hop" edges or in other words - connect only vertices that are reachable in 3 steps of the original graph.
Make a transitive closure of the same graph - using two approaches.
1) Multiplication approach: G*G*G...;
2) with the Warshall algorithm. Verify that you got the same answer.
EX6. Calculate multiplication-based transitive closure from the text data. You need to make a graph (from the text) with each word represents a node and there will be an edge between two nodes if 3 or more consecutive letters are common (overlap) between them. You can use code snippet in Colab to prepare an adjacency matrix from the text data.
https://colab.research.google.com/drive/1Km5PYBSJFPAC7GTIIB61T7X8TdAdcmBA?usp=sharing
Hint: You need to use sparsify the matrix multiplication.
EX7. (Bonus 2p) Define on paper or “python code” what a state would look like id we describe 2x2 Rubik cube and all 6 “half-moves” 90° clock-wise on one face only
You are not allowed to rotate anticlockwise and 180°.