Today we are going to do some network analysis in R with the igraph package.

Libraries

Today we will need the following libraries:

install.packages(c("igraph", "ggplot2")) # run once if the packages are not installed yet
library("igraph")
library("ggplot2")

igraph is a package for creating and manipulating graphs and analyzing networks. A number of software packages are available for this purpose, but igraph has become one of the most flexible and powerful libraries for network analysis.

Basic work with networks

Creating networks

Let’s create an undirected graph with 3 edges.

g1 <- graph(edges=c(1,2, 2,3, 3, 1), n=3, directed=F)
plot(g1)

The plot above shows 3 nodes (1, 2, 3) joined by undirected edges: 1--2, 2--3, 3--1.

Now we can inspect our graph simply by printing the variable:

g1
## IGRAPH 8236333 U--- 3 3 -- 
## + edges from 8236333:
## [1] 1--2 2--3 1--3

Above we can find some important information about the graph:

  • First number - the number of nodes
  • Second number - the number of edges in the graph
  • List of edges
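
The same counts can also be read programmatically instead of from the printed summary. The lines below are a minimal sketch: the variable names (tri, n_nodes, n_edges, directed) are only illustrative, and make_graph() is used as the newer name for graph().

```r
library(igraph)

# The same triangle as g1, built with make_graph()
tri <- make_graph(edges = c(1,2, 2,3, 3,1), n = 3, directed = FALSE)

n_nodes  <- vcount(tri)       # number of nodes
n_edges  <- ecount(tri)       # number of edges
directed <- is_directed(tri)  # FALSE, matching the "U" in the summary line

cat("nodes:", n_nodes, "edges:", n_edges, "directed:", directed, "\n")
```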

Now let us create another graph:

g2 <- graph(edges=c(1,2, 2,3, 3,1, 4,5, 8,5, 4,7, 2,6), n=8)
g2
## IGRAPH 824833e D--- 8 7 -- 
## + edges from 824833e:
## [1] 1->2 2->3 3->1 4->5 8->5 4->7 2->6

How many nodes and edges does this graph have?

plot(g2)

The plot above shows an example of a disconnected graph.
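
Connectivity can also be checked in code rather than by eye. A small sketch, assuming the same edge list as g2 (the names g, connected and comp are illustrative); mode="weak" means edge direction is ignored when deciding whether two nodes belong together:

```r
library(igraph)

# Same edges as g2 (directed by default)
g <- make_graph(edges = c(1,2, 2,3, 3,1, 4,5, 8,5, 4,7, 2,6), n = 8)

connected <- is_connected(g, mode = "weak")  # is the graph one single piece?
comp <- components(g, mode = "weak")         # split the graph into connected pieces

comp$no          # number of components
comp$csize       # number of nodes in each component
comp$membership  # component id of every node
```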

Numbers are not very informative as node names, so let us use real names instead:

g3 <- graph(c("John", "Jim", "Jim", "Jill", "Jill", "John"))
plot(g3)

We can add isolated nodes by passing a vector of names to the “isolates” argument:

g4 <- graph(c("John", "Jim", "Jim", "Jack", "Jim", "Jack", "John", "John"),
isolates=c("Jesse", "Janis", "Jennifer", "Justin"))

plot(g4, edge.arrow.size=.5, vertex.color="darkgreen", vertex.size=15,
vertex.frame.color="black", vertex.label.color="black",
vertex.label.cex=0.8, vertex.label.dist=2.5, edge.curved=0.5)

Small graphs can also be generated with graph_from_literal() from a description of this kind: “-” for an undirected tie, “+-” or “-+” for directed ties pointing left and right, “++” for a symmetric tie, and “:” for sets of vertices.
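
A few illustrative descriptions of this kind are sketched below. The graph names are made up for the example, and the arrows are written in the longer forms (“--+”, “+--”, “+-+”) used in the igraph documentation; the number of dashes does not matter.

```r
library(igraph)

g_undirected <- graph_from_literal(a - b, b - c)       # two undirected ties
g_directed   <- graph_from_literal(a --+ b, b +-- c)   # a -> b and c -> b
g_mutual     <- graph_from_literal(a +-+ b)            # a -> b and b -> a
g_sets       <- graph_from_literal(a:b:c - d:e)        # each of a, b, c tied to each of d, e

ecount(g_sets)  # number of ties created by the two sets
```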

Edge, vertex, and network attributes

We can call specific functions to learn about graph attributes:

E(g4) # returns edges
## + 4/4 edges from 8355054 (vertex names):
## [1] John->Jim  Jim ->Jack Jim ->Jack John->John
V(g4) # returns nodes
## + 7/7 vertices, named, from 8355054:
## [1] John     Jim      Jack     Jesse    Janis    Jennifer Justin

Or we can inspect the adjacency matrix:

g4[]
## 7 x 7 sparse Matrix of class "dgCMatrix"
##          John Jim Jack Jesse Janis Jennifer Justin
## John        1   1    .     .     .        .      .
## Jim         .   .    2     .     .        .      .
## Jack        .   .    .     .     .        .      .
## Jesse       .   .    .     .     .        .      .
## Janis       .   .    .     .     .        .      .
## Jennifer    .   .    .     .     .        .      .
## Justin      .   .    .     .     .        .      .

Or even separate rows:

g4[1,]
##     John      Jim     Jack    Jesse    Janis Jennifer   Justin 
##        1        1        0        0        0        0        0

And columns:

g4[,1]
##     John      Jim     Jack    Jesse    Janis Jennifer   Justin 
##        1        0        0        0        0        0        0

We can add new attributes to an existing graph:

V(g4)$name # was already there
## [1] "John"     "Jim"      "Jack"     "Jesse"    "Janis"    "Jennifer"
## [7] "Justin"
V(g4)$gender <- c("male", "male", "male", "male", "female", "female", "male")
E(g4)$type <- "email" # assigns "email" to all edges
E(g4)$weight <- 10

Now, let’s examine attributes:

edge_attr(g4)
## $type
## [1] "email" "email" "email" "email"
## 
## $weight
## [1] 10 10 10 10
vertex_attr(g4)
## $name
## [1] "John"     "Jim"      "Jack"     "Jesse"    "Janis"    "Jennifer"
## [7] "Justin"  
## 
## $gender
## [1] "male"   "male"   "male"   "male"   "female" "female" "male"

If you want to assign an attribute to the whole graph, you can use:

g4 <- set_graph_attr(g4, "name", "Email Network")
g4 <- set_graph_attr(g4, "something", "A thing")
graph_attr_names(g4)
## [1] "name"      "something"
graph_attr(g4, "something")
## [1] "A thing"
graph_attr(g4)
## $name
## [1] "Email Network"
## 
## $something
## [1] "A thing"

Now, if we want to delete some attribute:

g4 <- delete_graph_attr(g4, "something")
graph_attr(g4)
## $name
## [1] "Email Network"
plot(g4, 
     edge.arrow.size=.5, 
     vertex.label.color="black", 
     vertex.label.dist=1.5,
     vertex.color=c("pink", "skyblue")[1+(V(g4)$gender=="male")]) 

Our graph has a loop from “John” to himself and two directed edges from “Jim” to “Jack”. We can remove such redundant edges with simplify(), using its edge.attr.comb argument to indicate how the attributes of merged edges should be combined. In the call below we merge the duplicate edges but keep the loop.

We can choose from the following options:

  • sum
  • mean
  • prod
  • min
  • max
  • first/last
  • ignore

g4s <- simplify(g4, 
                remove.multiple = T, 
                remove.loops = F, 
                edge.attr.comb=c(weight="sum", type="ignore"))
plot(g4s, vertex.label.dist=1.5)

E(g4s)$weight
## [1] 10 10 20
g4s
## IGRAPH 853ffb5 DNW- 7 3 -- Email Network
## + attr: name (g/c), name (v/c), gender (v/c), weight (e/n)
## + edges from 853ffb5 (vertex names):
## [1] John->John John->Jim  Jim ->Jack

The printed summary now shows some additional information. The letters after the graph ID mean:

  • D or U, for a directed or undirected graph
  • N for a named graph (where nodes have a name attribute)
  • W for a weighted graph (where edges have a weight attribute)
  • B for a bipartite (two-mode) graph (where nodes have a type attribute)

The two numbers that follow (7 3) refer to the number of nodes and edges in the graph. The description also lists node & edge attributes, for example:

  • (g/c) - graph-level character attribute
  • (v/c) - vertex-level character attribute
  • (e/n) - edge-level numeric attribute
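
To see how the edge.attr.comb options of simplify() differ, here is a minimal sketch on a toy graph. The graph, its weights and the names g_sum and g_max are illustrative assumptions, not part of the lab data.

```r
library(igraph)

# Two parallel edges a -> b plus a single edge b -> c
g <- make_graph(c("a","b", "a","b", "b","c"))
E(g)$weight <- c(10, 5, 3)

# Merge parallel edges, adding their weights
g_sum <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
                  edge.attr.comb = list(weight = "sum"))

# Merge parallel edges, keeping only the largest weight
g_max <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
                  edge.attr.comb = list(weight = "max"))

E(g_sum)$weight
E(g_max)$weight
```

With “sum” the merged a -> b edge carries the combined weight of both original edges, while “max” keeps only the heavier one; the untouched b -> c edge is the same in both results.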

Network analysis

Now that we have learned the basic functions for working with graphs, we will put them into practice. More precisely, we will study a dataset to understand how different media organizations are related to each other. This can help us decide which marketing channel is best suited for advertising.

We will start by loading the data:

nodes <- read.csv(file.choose()) # Dataset1-NODES.csv
links <- read.csv(file.choose()) # Dataset1-EDGES.csv

Let’s examine the data:

str(nodes)
## 'data.frame':    17 obs. of  5 variables:
##  $ id           : Factor w/ 17 levels "s01","s02","s03",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ media        : Factor w/ 17 levels "ABC","AOL.com",..: 10 15 14 13 7 9 4 8 5 1 ...
##  $ media.type   : int  1 1 1 1 1 1 2 2 2 2 ...
##  $ type.label   : Factor w/ 3 levels "Newspaper","Online",..: 1 1 1 1 1 1 3 3 3 3 ...
##  $ audience.size: int  20 25 30 32 20 50 56 34 60 23 ...
str(links)
## 'data.frame':    51 obs. of  4 variables:
##  $ from  : Factor w/ 16 levels "s01","s02","s03",..: 1 1 1 1 4 5 6 8 8 3 ...
##  $ to    : Factor w/ 17 levels "s01","s02","s03",..: 2 2 3 4 11 15 17 9 9 4 ...
##  $ weight: int  10 12 22 21 22 21 21 11 12 22 ...
##  $ type  : Factor w/ 2 levels "hyperlink","mention": 1 1 1 1 2 2 2 2 2 1 ...
cat("Number of rows in nodes data: ", nrow(nodes), "\n")
## Number of rows in nodes data:  17
cat("Number of unique nodes: ", length(unique(nodes$id)), "\n")
## Number of unique nodes:  17
cat("Number of rows in links data: ", nrow(links), "\n")
## Number of rows in links data:  51
cat("Number of unique links: ", nrow(unique(links[,c("from", "to")])), "\n")
## Number of unique links:  48

As you can see, the total number of links (51) is larger than the number of unique from-to combinations (48). This shows that some pairs of nodes are connected by two or more edges.

We will collapse all links of the same type between the same two nodes by summing their weights:

links <- aggregate(links[,3], links[,-3], sum) #summing weights by other cols
colnames(links)[4] <- "weight"
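
To make the aggregate() call less opaque, here is the same pattern on a tiny hand-made edge list. The data frame toy and its values are invented for illustration; only the call itself mirrors the one above.

```r
# Toy edge list with one repeated (from, to, type) combination
toy <- data.frame(
  from   = c("s01", "s01", "s02"),
  to     = c("s02", "s02", "s03"),
  weight = c(10, 12, 5),
  type   = c("hyperlink", "hyperlink", "mention")
)

# Group by every column except weight (column 3) and sum the weights per group
collapsed <- aggregate(toy[, 3], toy[, -3], sum)

# aggregate() names the summed column "x"; rename it back to "weight"
colnames(collapsed)[4] <- "weight"
collapsed
```

The two s01 to s02 hyperlink rows collapse into a single row whose weight is their sum, while the s02 to s03 row is left as it was.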

Creating igraph objects

Now that the data is loaded, we can create igraph objects and work with them. For this we will use graph_from_data_frame(), which accepts data frames that describe the edges and the nodes.

net <- graph_from_data_frame(d=links, vertices=nodes, directed=T)
net
## IGRAPH 892f778 DNW- 17 48 -- 
## + attr: name (v/c), media (v/c), media.type (v/n), type.label
## | (v/c), audience.size (v/n), type (e/c), weight (e/n)
## + edges from 892f778 (vertex names):
##  [1] s02->s01 s03->s01 s15->s01 s01->s02 s05->s02 s01->s03 s02->s03
##  [8] s04->s03 s08->s03 s10->s03 s01->s04 s03->s04 s15->s04 s17->s04
## [15] s03->s05 s15->s06 s16->s06 s03->s08 s02->s09 s05->s09 s02->s10
## [22] s07->s10 s03->s11 s03->s12 s04->s12 s13->s12 s12->s13 s06->s16
## [29] s05->s01 s07->s03 s04->s06 s12->s06 s08->s07 s07->s08 s08->s09
## [36] s03->s10 s09->s10 s04->s11 s14->s11 s14->s13 s07->s14 s12->s14
## [43] s01->s15 s05->s15 s04->s17 s06->s17 s13->s17 s16->s17
E(net) # The edges of the "net" object
## + 48/48 edges from 892f778 (vertex names):
##  [1] s02->s01 s03->s01 s15->s01 s01->s02 s05->s02 s01->s03 s02->s03
##  [8] s04->s03 s08->s03 s10->s03 s01->s04 s03->s04 s15->s04 s17->s04
## [15] s03->s05 s15->s06 s16->s06 s03->s08 s02->s09 s05->s09 s02->s10
## [22] s07->s10 s03->s11 s03->s12 s04->s12 s13->s12 s12->s13 s06->s16
## [29] s05->s01 s07->s03 s04->s06 s12->s06 s08->s07 s07->s08 s08->s09
## [36] s03->s10 s09->s10 s04->s11 s14->s11 s14->s13 s07->s14 s12->s14
## [43] s01->s15 s05->s15 s04->s17 s06->s17 s13->s17 s16->s17
V(net) # The vertices of the "net" object
## + 17/17 vertices, named, from 892f778:
##  [1] s01 s02 s03 s04 s05 s06 s07 s08 s09 s10 s11 s12 s13 s14 s15 s16 s17
E(net)$type # Edge attribute "type"
##  [1] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
##  [6] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
## [11] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
## [16] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
## [21] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
## [26] "hyperlink" "hyperlink" "hyperlink" "mention"   "mention"  
## [31] "mention"   "mention"   "mention"   "mention"   "mention"  
## [36] "mention"   "mention"   "mention"   "mention"   "mention"  
## [41] "mention"   "mention"   "mention"   "mention"   "mention"  
## [46] "mention"   "mention"   "mention"
V(net)$media # Vertex attribute "media"
##  [1] "NY Times"            "Washington Post"     "Wall Street Journal"
##  [4] "USA Today"           "LA Times"            "New York Post"      
##  [7] "CNN"                 "MSNBC"               "FOX News"           
## [10] "ABC"                 "BBC"                 "Yahoo News"         
## [13] "Google News"         "Reuters.com"         "NYTimes.com"        
## [16] "WashingtonPost.com"  "AOL.com"
plot(net, edge.arrow.size=.4, vertex.label.dist=2.5)

Graphs like this can contain loops (edges from a node to itself). Let’s remove any that are present:

net <- simplify(net, remove.multiple = F, remove.loops = T)
plot(net, edge.arrow.size=.4, vertex.label.dist=2.5)

There are also double edges, but we won’t remove them this time, since they can be of different types (for example “hyperlink” and “mention”).

We can obtain information about the edges using the following functions. The first one returns the list of edges:

as_edgelist(net, names=T)
##       [,1]  [,2] 
##  [1,] "s02" "s01"
##  [2,] "s03" "s01"
##  [3,] "s15" "s01"
##  [4,] "s01" "s02"
##  [5,] "s05" "s02"
##  [6,] "s01" "s03"
##  [7,] "s02" "s03"
##  [8,] "s04" "s03"
##  [9,] "s08" "s03"
## [10,] "s10" "s03"
## [11,] "s01" "s04"
## [12,] "s03" "s04"
## [13,] "s15" "s04"
## [14,] "s17" "s04"
## [15,] "s03" "s05"
## [16,] "s15" "s06"
## [17,] "s16" "s06"
## [18,] "s03" "s08"
## [19,] "s02" "s09"
## [20,] "s05" "s09"
## [21,] "s02" "s10"
## [22,] "s07" "s10"
## [23,] "s03" "s11"
## [24,] "s03" "s12"
## [25,] "s04" "s12"
## [26,] "s13" "s12"
## [27,] "s12" "s13"
## [28,] "s06" "s16"
## [29,] "s05" "s01"
## [30,] "s07" "s03"
## [31,] "s04" "s06"
## [32,] "s12" "s06"
## [33,] "s08" "s07"
## [34,] "s07" "s08"
## [35,] "s08" "s09"
## [36,] "s03" "s10"
## [37,] "s09" "s10"
## [38,] "s04" "s11"
## [39,] "s14" "s11"
## [40,] "s14" "s13"
## [41,] "s07" "s14"
## [42,] "s12" "s14"
## [43,] "s01" "s15"
## [44,] "s05" "s15"
## [45,] "s04" "s17"
## [46,] "s06" "s17"
## [47,] "s13" "s17"
## [48,] "s16" "s17"

The edge list above has one row per edge, with the source node in the first column and the target node in the second. The function below returns the adjacency matrix instead: rows are source nodes and columns are target nodes. Because we passed attr="weight", the numbers in the cells are edge weights rather than simple 0/1 indicators.

as_adjacency_matrix(net, attr="weight")
## 17 x 17 sparse Matrix of class "dgCMatrix"
##                                                      
## s01  . 22 22 21 .  .  .  .  .  .  .  .  .  . 20  .  .
## s02 23  . 21  . .  .  .  .  1  5  .  .  .  .  .  .  .
## s03 21  .  . 22 1  .  .  4  .  2  1  1  .  .  .  .  .
## s04  .  . 23  . .  1  .  .  .  . 22  3  .  .  .  .  2
## s05  1 21  .  . .  .  .  .  2  .  .  .  .  . 21  .  .
## s06  .  .  .  . .  .  .  .  .  .  .  .  .  .  . 21 21
## s07  .  .  1  . .  .  . 22  . 21  .  .  .  4  .  .  .
## s08  .  .  2  . .  . 21  . 23  .  .  .  .  .  .  .  .
## s09  .  .  .  . .  .  .  .  . 21  .  .  .  .  .  .  .
## s10  .  .  2  . .  .  .  .  .  .  .  .  .  .  .  .  .
## s11  .  .  .  . .  .  .  .  .  .  .  .  .  .  .  .  .
## s12  .  .  .  . .  2  .  .  .  .  .  . 22 22  .  .  .
## s13  .  .  .  . .  .  .  .  .  .  . 21  .  .  .  .  1
## s14  .  .  .  . .  .  .  .  .  .  1  . 21  .  .  .  .
## s15 22  .  .  1 .  4  .  .  .  .  .  .  .  .  .  .  .
## s16  .  .  .  . . 23  .  .  .  .  .  .  .  .  .  . 21
## s17  .  .  .  4 .  .  .  .  .  .  .  .  .  .  .  .  .
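
The relation between the two representations is easier to see on a small graph. The sketch below uses a made-up three-node graph and asks for a dense matrix (sparse=FALSE) so that individual cells can be read by name; the variable names are illustrative.

```r
library(igraph)

# Three directed, weighted edges
g <- make_graph(c("a","b", "b","c", "a","c"))
E(g)$weight <- c(2, 5, 7)

edge_list <- as_edgelist(g, names = TRUE)  # one row per edge: from, to
adj <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)

adj["a", "b"]  # weight of the edge a -> b
adj["b", "a"]  # 0, because there is no edge b -> a
```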

Now we have information on how media companies are related to each other.

Let’s study the data by visualizing it. For starters, let’s replace the node labels with the media names stored in “media”.

plot(net, 
     edge.arrow.size=.2, 
     vertex.label.dist=1.8, 
     vertex.label.cex=0.8,
     vertex.size=10,
     vertex.color="orange", 
     vertex.frame.color="black",
     vertex.label=V(net)$media, 
     vertex.label.color="black")

Now, let’s add different colors depending on which media type each node has.

unique(V(net)$media.type)
## [1] 1 2 3
colors <- c("gray50", "tomato", "gold")
V(net)$color <- colors[V(net)$media.type]
plot(net, 
     edge.arrow.size=.2,  
     vertex.label.dist=1.8, 
     vertex.label.cex=0.8,
     vertex.size=10,
     vertex.label=V(net)$media, 
     vertex.label.color="black")

Now it’s easier to see which media type each node belongs to:

  • Red - TV
  • Yellow - internet
  • Grey - newspapers
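
The coloring relies on plain R vector indexing: the integer type code of each node picks one entry from the color vector. A minimal sketch with hypothetical type codes (media_type, type_colors and node_colors are illustrative names):

```r
media_type  <- c(1, 1, 2, 3, 2)                # hypothetical type codes for five nodes
type_colors <- c("gray50", "tomato", "gold")   # 1 = newspaper, 2 = TV, 3 = online

# Indexing by the type code returns one color per node
node_colors <- type_colors[media_type]
node_colors
```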

Let’s continue and make the node size reflect audience size:

V(net)$size <- V(net)$audience.size*.4
plot(net, 
     edge.arrow.size=.2, 
     vertex.label.dist=1.8, 
     vertex.label.cex=.8,
     vertex.label=V(net)$media, 
     vertex.label.color="black")

And make the edge width reflect the edge weight:

E(net)$width <- E(net)$weight/6

plot(net, 
     edge.arrow.size=.2,
     vertex.label.dist=1.8, 
     vertex.label.cex=.8,
     vertex.label=V(net)$media, 
     vertex.label.color="black")

The final touch is to set a layout for the graph:

graph_attr(net, "layout") <- layout_with_lgl
plot(net, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label=V(net)$media, 
     vertex.label.color="black")

Now let us add a legend that explains the colors:

graph_attr(net, "layout") <- layout_with_lgl
plot(net, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label=V(net)$media, 
     vertex.label.color="black")

legend(x=-1.5, y=-1.1, # coordinates
       c("Newspaper","Television", "Online News"), 
       pch=21, 
       pt.bg=colors, 
       pt.cex=2, # size of the circles
       cex=.8, # size of the text
       ncol=3) # number of columns in the legend

More network analysis

At this point, we learned how to create a simple network and analyze it. But what if we want to see more information?

Let’s replace node names like ‘s01’, ‘s02’, ‘s03’ etc. with the names of the media companies, so that we do not have to specify the node labels each time we plot. The original ids are kept in a separate “id” attribute.

V(net)$id <- V(net)$name
V(net)$name <- V(net)$media

Now, let’s add different colors depending on which media type each node has.

unique(V(net)$media.type)
## [1] 1 2 3
colors <- c("gray50", "tomato", "gold")
V(net)$color <- colors[V(net)$media.type]
plot(net, 
     edge.arrow.size=.2,  
     vertex.label.dist=1.8, 
     vertex.label.cex=0.8,
     vertex.size=10,
     vertex.label.color="black")

Now it’s easier to see which media type each node belongs to:

  • Red - TV
  • Yellow - internet
  • Grey - newspapers

Let’s continue and make the node size reflect audience size:

V(net)$size <- V(net)$audience.size * .4
plot(net, 
     edge.arrow.size=.2, 
     vertex.label.dist=1.8, 
     vertex.label.cex=.8,
     vertex.label.color="black")

And make the edge width reflect the edge weight:

E(net)$width <- E(net)$weight/6

plot(net, 
     edge.arrow.size=.2,
     vertex.label.dist=1.8, 
     vertex.label.cex=.8,
     vertex.label.color="black")

Finally, let us set the layout and add a legend that explains the colors:

graph_attr(net, "layout") <- layout_with_lgl
plot(net, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black")

legend(x=-1.5, y=-1.1, # coordinates
       c("Newspaper","Television", "Online News"), 
       pch=21, 
       pt.bg=colors, 
       pt.cex=2, # size of the circles
       cex=.8, # size of the text
       ncol=3) # number of columns in the legend

Network and node descriptives

Let’s calculate some descriptive statistics:

Density

Density is the ratio of the number of edges to the number of possible edges (the number a fully connected graph would have).

edge_density(net)
## [1] 0.1764706
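
For a directed graph without loops the number of possible edges is n(n-1), so for our network with 17 nodes and 48 edges the density is 48 / (17 * 16) ≈ 0.1765, which matches the value above. The sketch below repeats that calculation on a small made-up graph and compares it with edge_density(); the variable names are illustrative.

```r
library(igraph)

# Small directed graph: 4 nodes, 4 edges
g <- make_graph(c(1,2, 2,3, 3,1, 1,4), n = 4, directed = TRUE)

n <- vcount(g)
m <- ecount(g)

manual_density <- m / (n * (n - 1))  # directed graph, loops not counted
igraph_density <- edge_density(g)

c(manual = manual_density, igraph = igraph_density)
```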

Reciprocity

Reciprocity measures how likely vertices in a directed network are to be mutually linked.

reciprocity(net) 
## [1] 0.4166667
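
With its default settings, reciprocity() returns the share of directed edges that have a partner edge in the opposite direction. A minimal sketch on a made-up graph with one mutual pair and one one-way edge:

```r
library(igraph)

# a <-> b is mutual, a -> c is one-way
g <- make_graph(c("a","b", "b","a", "a","c"))

# 2 of the 3 edges are reciprocated
r <- reciprocity(g)
r
```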

Clustering coefficient (Transitivity)

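Transitivity is the probability that the adjacent vertices of a vertex are connected to each other. The global version is computed for the whole graph, the local version for each vertex separately.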

transitivity(net, type="global")
## [1] 0.372549
transitivity(net, type="local")
##  [1] 0.2142857 0.4000000 0.1153846 0.1944444 0.5000000 0.2666667 0.2000000
##  [8] 0.1000000 0.3333333 0.3000000 0.3333333 0.2000000 0.1666667 0.1666667
## [15] 0.3000000 0.3333333 0.2000000
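
A worked toy case may help here. In the made-up graph below (a triangle 1-2-3 with an extra node 4 attached to node 3) there is one triangle and five connected triples, so the global transitivity is 3 * 1 / 5 = 0.6. The names tr_global and tr_local are illustrative.

```r
library(igraph)

# Triangle 1-2-3 plus a pendant node 4 attached to node 3
g <- make_graph(c(1,2, 2,3, 3,1, 3,4), directed = FALSE)

tr_global <- transitivity(g, type = "global")  # 3 * triangles / connected triples
tr_local  <- transitivity(g, type = "local")   # one value per node

tr_global
tr_local  # node 4 has a single neighbour, so its local value is NaN
```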

Diameter

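The diameter of a network is the length of the longest of all shortest paths between pairs of nodes. With weights=NA every edge counts as 1, so the result is a number of edges; without it, igraph uses the “weight” edge attribute and returns the total weight along the path. We can use the following function to obtain the diameter: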

diameter(net, directed=T, weights=NA)
## [1] 6
diameter(net, directed=T)
## [1] 75

We can also obtain the nodes along the first path found with that length.

diam <- get_diameter(net, directed=T)
diam
## + 7/17 vertices, named, from 8962936:
## [1] Yahoo News          New York Post       AOL.com            
## [4] USA Today           Wall Street Journal MSNBC              
## [7] CNN

Let’s highlight this path on our network:

vcol <- V(net)$color
vcol[diam] <- "green"
ecol <- rep("gray80", ecount(net))
ecol[E(net, path=diam)] <- "green"
# E(net, path=diam) finds edges along a path, here 'diam'

plot(net, 
     vertex.color=vcol, 
     edge.color=ecol, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black", edge.curved=0.3)

Let’s print the edges and nodes to validate. Note that the edge weights along the path add up to the weighted diameter of 75:

V(net)[diam]
## + 7/17 vertices, named, from 8962936:
## [1] Yahoo News          New York Post       AOL.com            
## [4] USA Today           Wall Street Journal MSNBC              
## [7] CNN
E(net)[E(net, path=diam)] 
## + 6/48 edges from 8962936 (vertex names):
## [1] Yahoo News         ->New York Post      
## [2] New York Post      ->AOL.com            
## [3] AOL.com            ->USA Today          
## [4] USA Today          ->Wall Street Journal
## [5] Wall Street Journal->MSNBC              
## [6] MSNBC              ->CNN
E(net)$weight[E(net, path=diam)]
## [1]  2 21  4 23  4 21
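
The difference between the weighted and the unweighted diameter can be reproduced on a tiny example. The sketch below uses a made-up path graph a - b - c with weights 5 and 7; all variable names are illustrative.

```r
library(igraph)

# Undirected path a - b - c with edge weights 5 and 7
g <- make_graph(c("a","b", "b","c"), directed = FALSE)
E(g)$weight <- c(5, 7)

hops     <- diameter(g, directed = FALSE, weights = NA)  # counts edges only
weighted <- diameter(g, directed = FALSE)                # uses the weight attribute

# Nodes on the diameter path and the total weight along it
diam_path   <- get_diameter(g, directed = FALSE)
path_weight <- sum(E(g, path = diam_path)$weight)

c(hops = hops, weighted = weighted, path_weight = path_weight)
```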

Node degrees

deg <- degree(net, mode="all")
plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black", 
     vertex.size=deg*3)

Let’s make a histogram of the node degrees:

deg_df <- data.frame(node=names(deg), deg=deg) # one row per node

ggplot(deg_df) + geom_bar(aes(x=as.factor(deg)), stat="count") + xlab("degree")

Degree distribution

plot(degree_distribution(net, mode='all'))

Degree

head(sort(degree(net, mode='all'), decreasing = T),5)
## Wall Street Journal           USA Today            NY Times 
##                  13                   9                   8 
##     Washington Post       New York Post 
##                   6                   6

Closeness

head(sort(closeness(net, mode='all'), decreasing = T),5)
## Wall Street Journal          Yahoo News            LA Times 
##          0.01470588          0.01408451          0.01298701 
##       New York Post                 BBC 
##          0.01234568          0.01234568

Betweenness

head(sort(betweenness(net), decreasing = T),5)
## Wall Street Journal           USA Today             AOL.com 
##               144.5                97.0                66.0 
##            LA Times          Yahoo News 
##                50.0                41.5

Eigenvector centrality

head(sort(eigen_centrality(net)$vector, decreasing = T),5)
##            NY Times Wall Street Journal     Washington Post 
##           1.0000000           0.8608478           0.6819049 
##           USA Today         NYTimes.com 
##           0.6314177           0.4769704
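
To get a feeling for what these measures reward, it can help to compute them on a graph where the answer is obvious. The sketch below uses a made-up star graph (one center connected to four leaves); the variable names are illustrative.

```r
library(igraph)

# Star: node 1 is the center, nodes 2 to 5 are leaves
g <- make_star(5, mode = "undirected")

deg <- degree(g)       # number of edges attached to each node
btw <- betweenness(g)  # number of shortest paths passing through each node
cls <- closeness(g)    # 1 / (sum of distances to all other nodes)

data.frame(node = 1:5, degree = deg, betweenness = btw, closeness = cls)
```

The center has the highest value on all three measures: it touches every edge, lies on every shortest path between two leaves, and is one step away from everyone.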

Hubs and authorities

Let’s look at hubs and authorities in our data. Hubs are nodes with many outgoing edges, ideally pointing to good authorities. Authorities are nodes with many incoming edges, ideally coming from good hubs.

hs <- hub_score(net, weights=NA)$vector
as <- authority_score(net, weights=NA)$vector

Plotting hubs:

plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black", 
     vertex.size=hs*50, 
     main="Hubs")

Plotting authorities:

plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black", 
     vertex.size=as*50, 
     main="Authorities")
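
The two scores are easiest to tell apart on a toy graph. In the made-up example below, a and b both point to c, so a and b should come out as hubs and c as the authority. This is only a sketch: it uses the same hub_score() and authority_score() functions as above, which newer igraph releases may flag as deprecated.

```r
library(igraph)

# a -> c and b -> c
g <- make_graph(c("a","c", "b","c"))

hub_toy  <- hub_score(g)$vector        # high for nodes that point to authorities
auth_toy <- authority_score(g)$vector  # high for nodes that hubs point to

round(hub_toy, 3)
round(auth_toy, 3)
```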

Distances and paths

Now we will calculate the average shortest-path length over all pairs of nodes in our graph.

Let’s assume that our graph is undirected:

mean_distance(net, directed=F)
## [1] 2.058824

And also calculate it for the directed graph:

mean_distance(net, directed=T)
## [1] 2.742188
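
Why the directed value is larger can be seen on a small made-up example: in a directed ring of three nodes, half of the ordered pairs need two steps, while in the undirected version every pair is one step apart. The variable names are illustrative.

```r
library(igraph)

# Directed ring 1 -> 2 -> 3 -> 1
g <- make_ring(3, directed = TRUE)

d_directed   <- mean_distance(g, directed = TRUE)   # edge direction is respected
d_undirected <- mean_distance(g, directed = FALSE)  # edges can be walked both ways

c(directed = d_directed, undirected = d_undirected)
```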

We can also find the lengths of all shortest paths in the graph:

distances(net) # with edge weights
##                     NY Times Washington Post Wall Street Journal USA Today
## NY Times                   0               4                   2         6
## Washington Post            4               0                   4         8
## Wall Street Journal        2               4                   0         4
## USA Today                  6               8                   4         0
## LA Times                   1               3                   1         5
## New York Post              5               7                   3         1
## CNN                        3               5                   1         5
## MSNBC                      4               6                   2         6
## FOX News                   3               1                   3         7
## ABC                        4               5                   2         6
## BBC                        3               5                   1         5
## Yahoo News                 3               5                   1         3
## Google News                9              11                   7         3
## Reuters.com                4               6                   2         6
## NYTimes.com                7               9                   5         1
## WashingtonPost.com        26              28                  24        22
## AOL.com                    8              10                   6         2
##                     LA Times New York Post CNN MSNBC FOX News ABC BBC
## NY Times                   1             5   3     4        3   4   3
## Washington Post            3             7   5     6        1   5   5
## Wall Street Journal        1             3   1     2        3   2   1
## USA Today                  5             1   5     6        7   6   5
## LA Times                   0             4   2     3        2   3   2
## New York Post              4             0   4     5        6   5   4
## CNN                        2             4   0     3        4   3   2
## MSNBC                      3             5   3     0        5   4   3
## FOX News                   2             6   4     5        0   5   4
## ABC                        3             5   3     4        5   0   3
## BBC                        2             4   2     3        4   3   0
## Yahoo News                 2             2   2     3        4   3   2
## Google News                8             4   8     9       10   9   8
## Reuters.com                3             5   3     4        5   4   1
## NYTimes.com                6             2   6     7        8   7   6
## WashingtonPost.com        25            21  25    26       27  26  25
## AOL.com                    7             3   7     8        9   8   7
##                     Yahoo News Google News Reuters.com NYTimes.com
## NY Times                     3           9           4           7
## Washington Post              5          11           6           9
## Wall Street Journal          1           7           2           5
## USA Today                    3           3           6           1
## LA Times                     2           8           3           6
## New York Post                2           4           5           2
## CNN                          2           8           3           6
## MSNBC                        3           9           4           7
## FOX News                     4          10           5           8
## ABC                          3           9           4           7
## BBC                          2           8           1           6
## Yahoo News                   0           6           3           4
## Google News                  6           0           9           4
## Reuters.com                  3           9           0           7
## NYTimes.com                  4           4           7           0
## WashingtonPost.com          23          22          26          23
## AOL.com                      5           1           8           3
##                     WashingtonPost.com AOL.com
## NY Times                            26       8
## Washington Post                     28      10
## Wall Street Journal                 24       6
## USA Today                           22       2
## LA Times                            25       7
## New York Post                       21       3
## CNN                                 25       7
## MSNBC                               26       8
## FOX News                            27       9
## ABC                                 26       8
## BBC                                 25       7
## Yahoo News                          23       5
## Google News                         22       1
## Reuters.com                         26       8
## NYTimes.com                         23       3
## WashingtonPost.com                   0      21
## AOL.com                             21       0
distances(net, weights=NA) # ignore weights
##                     NY Times Washington Post Wall Street Journal USA Today
## NY Times                   0               1                   1         1
## Washington Post            1               0                   1         2
## Wall Street Journal        1               1                   0         1
## USA Today                  1               2                   1         0
## LA Times                   1               1                   1         2
## New York Post              2               3                   2         1
## CNN                        2               2                   1         2
## MSNBC                      2               2                   1         2
## FOX News                   2               1                   2         3
## ABC                        2               1                   1         2
## BBC                        2               2                   1         1
## Yahoo News                 2               2                   1         1
## Google News                3               3                   2         2
## Reuters.com                3               3                   2         2
## NYTimes.com                1               2                   2         1
## WashingtonPost.com         3               4                   3         2
## AOL.com                    2               3                   2         1
##                     LA Times New York Post CNN MSNBC FOX News ABC BBC
## NY Times                   1             2   2     2        2   2   2
## Washington Post            1             3   2     2        1   1   2
## Wall Street Journal        1             2   1     1        2   1   1
## USA Today                  2             1   2     2        3   2   1
## LA Times                   0             2   2     2        1   2   2
## New York Post              2             0   3     3        3   3   2
## CNN                        2             3   0     1        2   1   2
## MSNBC                      2             3   1     0        1   2   2
## FOX News                   1             3   2     1        0   1   3
## ABC                        2             3   1     2        1   0   2
## BBC                        2             2   2     2        3   2   0
## Yahoo News                 2             1   2     2        3   2   2
## Google News                3             2   2     3        4   3   2
## Reuters.com                3             2   1     2        3   2   1
## NYTimes.com                1             1   3     3        2   3   2
## WashingtonPost.com         3             1   4     4        4   4   3
## AOL.com                    3             1   3     3        4   3   2
##                     Yahoo News Google News Reuters.com NYTimes.com
## NY Times                     2           3           3           1
## Washington Post              2           3           3           2
## Wall Street Journal          1           2           2           2
## USA Today                    1           2           2           1
## LA Times                     2           3           3           1
## New York Post                1           2           2           1
## CNN                          2           2           1           3
## MSNBC                        2           3           2           3
## FOX News                     3           4           3           2
## ABC                          2           3           2           3
## BBC                          2           2           1           2
## Yahoo News                   0           1           1           2
## Google News                  1           0           1           3
## Reuters.com                  1           1           0           3
## NYTimes.com                  2           3           3           0
## WashingtonPost.com           2           2           3           2
## AOL.com                      2           1           2           2
##                     WashingtonPost.com AOL.com
## NY Times                             3       2
## Washington Post                      4       3
## Wall Street Journal                  3       2
## USA Today                            2       1
## LA Times                             3       3
## New York Post                        1       1
## CNN                                  4       3
## MSNBC                                4       3
## FOX News                             4       4
## ABC                                  4       3
## BBC                                  3       2
## Yahoo News                           2       2
## Google News                          2       1
## Reuters.com                          3       2
## NYTimes.com                          2       2
## WashingtonPost.com                   0       1
## AOL.com                              1       0

In the same way, we can extract the distances from one node we are interested in (here NY Times) to all other nodes:

dist_NYT <- distances(net, v=V(net)[media=="NY Times"], to=V(net), weights=NA)

# Plotting: map each distance to a color (dark red = closest, gold = farthest)
dist_palette <- colorRampPalette(c("dark red", "gold"))
col <- dist_palette(max(dist_NYT)+1)
col <- col[dist_NYT+1]
plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.cex=.8,
     vertex.color=col, 
     vertex.label=dist_NYT,
     vertex.label.color="white")
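
To see what distances() returns, here is a minimal sketch on a hypothetical four-node graph. The graph and the names toy, d_out, d_from_D_out and d_from_D_all are made up for illustration and are not part of the lab data. It shows how the mode argument changes the result in a directed network:

```r
library(igraph)

# Hypothetical toy network: A -> B -> C -> D plus a shortcut A -> C
toy <- make_graph(c("A","B", "B","C", "C","D", "A","C"), directed = TRUE)

# Distances from A, following edge directions and ignoring weights
d_out <- distances(toy, v = "A", to = V(toy), mode = "out", weights = NA)

# Distances from D: no other node is reachable along outgoing edges (Inf),
# but every node is reachable once direction is ignored
d_from_D_out <- distances(toy, v = "D", mode = "out", weights = NA)
d_from_D_all <- distances(toy, v = "D", mode = "all", weights = NA)

print(d_out)
print(d_from_D_out)
print(d_from_D_all)
```

The result is a matrix with one row per source node and one column per target node, so a single distance can be read by name, for example d_out["A", "D"].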

Now let’s try a slightly more complex task: finding the shortest path (both its nodes and its edges) from NY Times to FOX News:

nyt_fox_path <- shortest_paths(net,
    from = V(net)[media=="NY Times"],
    to = V(net)[media=="FOX News"],
    output = "both") # return both the nodes and the edges on the path

# Generate edge color variable to plot the path:
ecol <- rep("gray80", ecount(net))
ecol[unlist(nyt_fox_path$epath)] <- "green"

# Generate edge width variable to plot the path:
edge_width <- rep(2, ecount(net))
edge_width[unlist(nyt_fox_path$epath)] <- 4

# Generate node color variable to plot the path:
vcol <- V(net)$color
vcol[unlist(nyt_fox_path$vpath)] <- "green"

plot(net,
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black", 
     vertex.color=vcol, 
     edge.color=ecol,
     edge.width=edge_width)
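
The structure returned by shortest_paths() is easier to see on a small example. The sketch below uses a hypothetical five-node undirected graph (toy and p are illustrative names, not objects from the lab) and applies the same edge-coloring trick as above:

```r
library(igraph)

# Hypothetical toy network: a chain A-B-C-D, a shortcut A-D, and a tail D-E
toy <- make_graph(c("A","B", "B","C", "C","D", "A","D", "D","E"),
                  directed = FALSE)

# output = "both" returns the nodes (vpath) and the edges (epath) of the path
p <- shortest_paths(toy, from = "A", to = "E", output = "both")

# vpath and epath are lists with one element per target node
path_nodes <- as_ids(p$vpath[[1]])
print(path_nodes)

# Color only the edges that lie on the path
ecol <- rep("gray80", ecount(toy))
ecol[unlist(p$epath)] <- "green"
print(ecol)
```

Because vpath and epath are lists, unlist() is needed before they can be used as plain indices.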

We can also find the edges that go into or out of a node. Let’s find all edges incident to FOX News:

fox_edges <- incident(net, V(net)[media=="FOX News"], mode="all")

#Set colors to plot the selected edges.
ecol <- rep("gray80", ecount(net))
ecol[fox_edges] <- "green"
vcol <- V(net)$color
vcol[V(net)$media=="FOX News"] <- "green"

#Plotting
plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black",
     vertex.color=vcol, 
     edge.color=ecol)

Or we can use incident_edges() to select edges of several nodes:

selected_nodes <- V(net)[c(2,6)]
nodes2_edges <- incident_edges(net, selected_nodes, mode="all")

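#Set colors to plot the selected edges.
# incident_edges() returns a list with one edge sequence per selected node,
# in the same order as selected_nodes, so we index it by position.
ecol <- rep("gray80", ecount(net))
ecol[E(net) %in% nodes2_edges[[1]]] <- "blue"
ecol[E(net) %in% nodes2_edges[[2]]] <- "green"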

vcol <- V(net)$color
vcol[selected_nodes[1]] <- "blue"
vcol[selected_nodes[2]] <- "green"

#Plotting
plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black",
     vertex.color=vcol, 
     edge.color=ecol)
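
The difference between incident() and incident_edges() can be sketched on a hypothetical four-node directed graph (toy, inc and the other names below are illustrative only):

```r
library(igraph)

# Hypothetical toy network with edges A->B, A->C, B->C, C->D
toy <- make_graph(c("A","B", "A","C", "B","C", "C","D"), directed = TRUE)

# incident() works on a single node; mode picks the edge direction
c_in  <- incident(toy, "C", mode = "in")   # A->C and B->C
c_out <- incident(toy, "C", mode = "out")  # C->D

# incident_edges() accepts several nodes and returns a list,
# one edge sequence per node, in the order the nodes were given
inc <- incident_edges(toy, c("A", "C"), mode = "all")
n_per_node <- unname(sapply(inc, length))

print(length(c_in))
print(length(c_out))
print(n_per_node)
```

Indexing the list by position, as in inc[[1]], does not depend on how the nodes are named.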

Another way of selecting edges is to use the special operators %--% and %->%:

  • E(network)[X %--% Y] takes all edges between X and Y, ignoring direction;
  • E(network)[X %->% Y] takes edges going from X to Y.

Let’s see an example:

selected_edges <- E(net)[ V(net)[type.label=="Newspaper"] %->% V(net)[type.label=="Online"] ]

#Set colors to plot the selected edges.
ecol <- rep("gray80", ecount(net))
ecol[selected_edges] <- "green"

#Plotting
plot(net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black",
     edge.color=ecol)
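
As a minimal sketch of how these operators differ, consider a hypothetical directed graph with two groups of nodes (toy, grp_a and grp_b are made-up names for illustration):

```r
library(igraph)

# Hypothetical toy network: A<->X and B<->Y across groups, A->B inside one group
toy <- make_graph(c("A","X", "X","A", "B","Y", "Y","B", "A","B"),
                  directed = TRUE)

grp_a <- V(toy)[c("A", "B")]
grp_b <- V(toy)[c("X", "Y")]

between <- E(toy)[grp_a %--% grp_b]  # any direction between the groups
a_to_b  <- E(toy)[grp_a %->% grp_b]  # only edges from grp_a to grp_b
b_to_a  <- E(toy)[grp_a %<-% grp_b]  # only edges from grp_b to grp_a

print(between)
print(a_to_b)
print(b_to_a)
```

The edge A->B is never selected here because both of its endpoints belong to the same group.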

Subgroups and communities

There are several algorithms for detecting communities in a network. Luckily, the igraph package provides functions that implement many of them.

Finding communities based on propagating labels

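The label propagation algorithm detects communities by spreading labels over the network. In the beginning, every node gets its own unique label. Then each node repeatedly adopts the label that is most common among its neighbors, until the labels stop changing. Nodes that end up with the same label form a community. Because the nodes are updated in random order, the result can differ between runs.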

The following function automatically detects the communities and returns them:

clp <- cluster_label_prop(as.undirected(net))
clp
## IGRAPH clustering label propagation, groups: 4, mod: 0.6
## + groups:
##   $`1`
##   [1] "NY Times"            "Washington Post"     "Wall Street Journal"
##   [4] "USA Today"           "LA Times"            "BBC"                
##   [7] "NYTimes.com"        
##   
##   $`2`
##   [1] "New York Post"      "WashingtonPost.com" "AOL.com"           
##   
##   $`3`
##   [1] "CNN"      "MSNBC"    "FOX News" "ABC"     
##   + ... omitted several groups/vertices
plot(clp, net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black")
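
To inspect a clustering without plotting it, the helper functions membership(), sizes() and modularity() can be used. The sketch below runs label propagation on a hypothetical graph made of two 5-node cliques joined by a single edge (toy and clp_toy are illustrative names). The seed is fixed only to make a single run repeatable, since the algorithm is randomized:

```r
library(igraph)

set.seed(42)  # label propagation is randomized

# Hypothetical toy network: two complete graphs of 5 nodes plus one bridge
toy <- disjoint_union(make_full_graph(5), make_full_graph(5))
toy <- add_edges(toy, c(5, 6))

clp_toy <- cluster_label_prop(toy)

print(membership(clp_toy))  # community id of every node
print(sizes(clp_toy))       # number of nodes per community
print(modularity(clp_toy))  # quality score of the partition
```

Running the same code with different seeds is a simple way to check how stable the detected communities are.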

Louvain algorithm for community detection

Let’s use another algorithm to obtain communities. The Louvain algorithm repeatedly applies two steps. In the first step, each node is greedily moved to the neighboring community that gives the largest gain in modularity. In the second step, a new network is built whose nodes are the communities found in the first step. The two steps are repeated until modularity no longer improves.

lou <- cluster_louvain(as.undirected(net))
lou
## IGRAPH clustering multi level, groups: 4, mod: 0.6
## + groups:
##   $`1`
##   [1] "CNN"      "MSNBC"    "FOX News" "ABC"     
##   
##   $`2`
##   [1] "Yahoo News"  "Google News" "Reuters.com"
##   
##   $`3`
##   [1] "NY Times"            "Washington Post"     "Wall Street Journal"
##   [4] "USA Today"           "LA Times"            "BBC"                
##   [7] "NYTimes.com"        
##   + ... omitted several groups/vertices
#Plotting
plot(lou, net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black")
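
The same kind of check can be sketched for Louvain on a hypothetical graph of three 4-node cliques connected in a chain (toy and lou_toy are illustrative names, and the seed only fixes one particular run):

```r
library(igraph)

set.seed(1)  # the order in which nodes are visited may be randomized

# Hypothetical toy network: three complete graphs of 4 nodes, chained by bridges
toy <- disjoint_union(make_full_graph(4), make_full_graph(4), make_full_graph(4))
toy <- add_edges(toy, c(4, 5, 8, 9))

# cluster_louvain() expects an undirected graph, which toy already is
lou_toy <- cluster_louvain(toy)

print(membership(lou_toy))
print(modularity(lou_toy))
```

A modularity value clearly above zero suggests that the partition captures more within-group edges than a random assignment would.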

Walktrap community detection algorithm

This method relies on short random walks. A walk tends to stay inside one community, because only a few edges lead out of it into another community.

wtc <- cluster_walktrap(as.undirected(net))
wtc
## IGRAPH clustering walktrap, groups: 4, mod: 0.6
## + groups:
##   $`1`
##   [1] "NY Times"            "Washington Post"     "Wall Street Journal"
##   [4] "USA Today"           "LA Times"            "BBC"                
##   [7] "NYTimes.com"        
##   
##   $`2`
##   [1] "CNN"      "MSNBC"    "FOX News" "ABC"     
##   
##   $`3`
##   [1] "New York Post"      "WashingtonPost.com" "AOL.com"           
##   + ... omitted several groups/vertices
#Plotting
plot(wtc, net, 
     edge.arrow.mode=0, 
     edge.arrow.size=.2,
     vertex.label.dist=2, 
     vertex.label.cex=.8,
     vertex.label.color="black")
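
Since the three algorithms can return different partitions, it may help to compare them numerically. The sketch below is one possible way to do this with compare() on a hypothetical toy graph. The names toy, wt_toy, lou_toy and nmi are illustrative, and normalized mutual information is just one of the available similarity measures:

```r
library(igraph)

set.seed(1)

# Hypothetical toy network: two complete graphs of 5 nodes plus one bridge
toy <- disjoint_union(make_full_graph(5), make_full_graph(5))
toy <- add_edges(toy, c(5, 6))

# steps is the length of the random walks (4 is the default)
wt_toy  <- cluster_walktrap(toy, steps = 4)
lou_toy <- cluster_louvain(toy)

# Normalized mutual information: 1 means the two partitions are identical
nmi <- compare(wt_toy, lou_toy, method = "nmi")

print(membership(wt_toy))
print(membership(lou_toy))
print(nmi)
```

On the media network, the same call could be applied to any pair of clp, lou and wtc.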