Algoritmika - Kursused - Arvutiteaduse instituut

HW8. Graphs (25 October, 23.59)

EX1. This exercise is to get you started with working on your essays. Essay guidelines can be found here (https://courses.cs.ut.ee/2021/Algorithmics/fall/Main/Essays), note that the first deadline is the 8th of November! Developing reading and writing skills are an essential part of the course. The idea is to read at least one paper and write a 2-page essay/summary about that paper aiming at describing that content to your fellow coursemates. We have selected 20 papers for you to choose from. Your first task is to choose a paper that you will write an essay about. First, skim all them through (do it quickly). Write in the report which article you are choosing. Attempt to summarise in a few sentences why you think it will be the most interesting article for you to choose. You don't have to necessarily make your essay on the choice of this homework, but it might give you a good start.

https://docs.google.com/document/d/1tWLqEh2Dc4oD70jauAgh4Gv0MX3YoeiJwzV-GeZZ-Ic/edit?usp=sharing

EX2. Read in the following UNdirected Graph dataset from: dataset link

The dataset starts with a description of vertices (below text VERTICES), which describe all graph vertices with their corresponding coordinates X and Y.

E.g. 0 652 396, is vertex 0 at location X=652 Y=396.

Lines after the line "EDGES" describe graph edges.

E.g. 0 1, there exists an edge between nodes 0 and 1.

You can visualize this graph on your own or by making use of the following web tool: Webtool searchvis. Python code to generate input for visualisation can be found at Graph task visualisation code. Feel free to change parameters, modify the code or implement your own. Sample generated input for the web tool can be found here: link:

Your task here is to start from vertex with id 0 and perform Breadth-First Search and Depth-First Search:

Start from node 0. Perform BFS and DFS and return the order of processing the nodes (the first time you start to process the node (e.g. when you add neighbours to queue/stack)). When adding neighbours, prioritize lower ID nodes first! E.g. if you have to add nodes 1 and 2 to stack/queue, add 1 first and then 2.

The output of this task should be two sequences (one for BFS, one for DFS) [0,1,...] . Add this to your report!

Now, visualize the output of your traversal by adding the order of processing to the visualisation (the one in darkred). To do that, you can either do it by hand (paint) or use the provided visualisation code and change line 152-153, where there is a variable "traversal_order". Change the sequence within the list to be the order of vertices being processed (previous output of the task, you can also use this to confirm your solution works as intended!). Also, add the visualisations of orderings to your report!

Hint: If you use a queue (FiLo: First in - Last out) for first traversal and a stack (LiFo: Last in - First out) for second, then the code for the two solutions should be almost the same! In order to not process nodes you've already processed, you probably need to create an additional data structure to remind you which nodes have already been processed.

EX3. Calculate word-word frequency and output 10 most frequent pairs. Consider each word to be capitalized, e.g. "if" == "IF" == "iF" == "If". You can scroll the text to the right!

  text = If the internet is a digital Wild West  it s time to lock your doors and close your windows  While the amount of cyber attackers and activity alone is alarming  in this episode  the featured villain is a hacker group backed by the Iranian government In a blog post published Thursday  Google s Threat Analysis Group  also known as TAG  revealed that it had sent more than 50 000 warnings to users whose accounts had been targeted by government backed hacker groups carrying out phishing and malware campaigns so far this year  Receiving a warning does not necessarily mean your Google account has been hacked Google does manage to stop some of the attacks but rather that the company has identified you as a target Google stated that this amounted to a nearly 33  increase when compared to the same time last year and attributed the activity to a large campaign launched by the Russian sponsored group Fancy Bear  which U S  and UK security agencies found had been on a worldwide password guessing spree since at least mid 2019  according to a report published in July Russia s not alone though  More than 50 countries have hacker groups working  on any given day   Google explained We intentionally send these warnings in batches to all users who may be at risk  rather than at the moment we detect the threat itself  so that attackers cannot track our defense strategies   Google said   On any given day  TAG is tracking more than 270 targeted or government backed attacker groups from more than 50 countries  This means that there is typically more than one threat actor behind the warnings While that statistic alone is mind boggling  the company also put a spotlight on  APT35  a cyber attacker backed by Iran that has hijacked accounts  deployed malware  and spied on users using  novel techniques  in recent years   In particular  Google highlighted four of the  most notable  APT35 campaigns it s disrupted in 2021 One of APT35 s regular activities is phishing for credentials of so called high value accounts  or those belonging to people in government  academia  journalism  NGOs  foreign policy  and national security  The group uses a technique in which it compromises a legitimate website and then deploys a phishing kit

Hint: You can compute the word-word frequency by the formulae between word w_i and word w_i+1 as :

word-word-frequency = Count(w_i , w_i+1)

where the function Count(w_i , w_i+1) measures the number of times (bi-gram) jointly word w_i and word w_i+1 appear in the text and

EX4. Do a Priority based search task using word-word context (or “probability”) in the text (from the previous excercise). You are not allowed to revisit the same words during the search operation. You need to compute word-word context by a probability function given by:

word-word context = probability = Count(w_i , w_j)/ Count(w_i)

where Count(w_i , w_j) measures the number of times (bi-gram) jointly word w_i and word w_j appear in the text and Count(w_i) is number of times word w_i appear in the text. (also, j is not equal to i).

Starting from 10 different words, show the most probable output.

Hint: You can think Priority based search task as a Markov Transition.

EX5. Use the following directed graph. Convert it into adjacency matrix representation and then convert the matrix into the graph where the original edges are replaced by "3-hop" edges or in other words - connect only vertices that are reachable in 3 steps of the original graph.

Make a transitive closure of the same graph - using two approaches.

1) Multiplication approach: G*G*G...;

2) with the Warshall algorithm. Verify that you got the same answer.

EX6. Calculate multiplication-based transitive closure from the text data. You need to make a graph (from the text) with each word represents a node and there will be an edge between two nodes if 3 or more consecutive letters are common (overlap) between them. You can use code snippet in Colab to prepare an adjacency matrix from the text data.

https://colab.research.google.com/drive/1Km5PYBSJFPAC7GTIIB61T7X8TdAdcmBA?usp=sharing

Hint: You need to use sparsify the matrix multiplication.

EX7. (Bonus 2p) Define on paper or “python code” what a state would look like id we describe 2x2 Rubik cube and all 6 “half-moves” 90° clock-wise on one face only

You are not allowed to rotate anticlockwise and 180°.

Algoritmika 2021/22 sügis

HW8. Graphs (25 October, 23.59)