
I simulated some graph network data (~10,000 observations) in R and tried to visualize it using the visNetwork library. However, the resulting plot is very cluttered and difficult to analyze visually (I understand that in real life, network data is meant to be analyzed using a graph query language).

For the time being, is there anything I can do to improve the visualization of the graph network I created (so I can explore some of the linkages and nodes that are all piled on top of each other)?

Can libraries such as 'networkD3' and 'diagrammeR' be used to better visualize this network?

I have attached my reproducible code below:

library(igraph)
library(dplyr)
library(visNetwork)

#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)

#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)

#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")

graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
graph

plot(graph)

library(visNetwork)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]

visNetwork(nodes, edges) %>%   visIgraphLayout(layout = "layout_with_fr") %>%
    visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% 
    visInteraction(navigationButtons = TRUE)

Thanks

stats_noob
  • Have you checked out ggiraph? – Roger-123 Nov 05 '20 at 23:21
  • @Roger-123: thank you for your reply! I briefly looked at 'ggraph', but i wasn't sure if it would necessarily make the visualization of a large network graph (like mine) any better. Do you have any suggestions? – stats_noob Nov 05 '20 at 23:39
  • Does this answer your question? [How to visualize a large network in R?](https://stackoverflow.com/questions/22453273/how-to-visualize-a-large-network-in-r) – majom Nov 06 '20 at 00:31
  • @majom: thank you for your reply! As per the link you provided, I tried using NetworkD3 and couldn't get it to work. Do you have another example? – stats_noob Nov 06 '20 at 01:54
  • @majom: i will try this code: plot(simplify(g), vertex.size= 0.01,edge.arrow.size=0.001,vertex.label.cex = 0.75,vertex.label.color = "black" ,vertex.frame.color = adjustcolor("white", alpha.f = 0),vertex.color = adjustcolor("white", alpha.f = 0),edge.color=adjustcolor(1, alpha.f = 0.15),display.isolates=FALSE,vertex.label=ifelse(page_rank(g)$vector > 0.1 , "important nodes", NA)) – stats_noob Nov 06 '20 at 01:55

2 Answers

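At the request of the OP, I am applying the method used in a previous answer, [Visualizing the result of dividing the network into communities](https://stackoverflow.com/questions/62553280/visualizing-the-result-of-dividing-the-network-into-communities), to this problem.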

The network in the question was not created with a specified random seed. Here, I specify the seed for reproducibility.

## reproducible version of OP's network
library(igraph)
library(dplyr)

set.seed(1234)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)

#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)

#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")

graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)

As noted by the OP, a simple plot is a mess. The referenced previous answer broke this into two parts:

  1. Plot all of the small components
  2. Plot the giant component

1. Small components
Different components get different colors to help separate them.

## Visualize the small components separately
SmallV = which(components(graph)$membership != 1)
SmallComp = induced_subgraph(graph, SmallV)
LO_SC = layout_components(SmallComp, layout=layout_with_graphopt)
plot(SmallComp, layout=LO_SC, vertex.size=9, vertex.label.cex=0.8, 
    vertex.color=rainbow(18, alpha=0.6)[components(graph)$membership[SmallV]])

[Figure: The small components]

More could be done with this, but that is fairly easy and not the substance of the question, so I will leave this as the representation of the small components.
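A commenter on this answer reported an error when the data produced no small components at all. The following is a minimal sketch of a more defensive version, not the code used for the figure: it picks the giant component by size, sizes the palette from the number of components instead of hard-coding it, and skips the plot when there is nothing to draw. The toy graph from `sample_gnm()` and the names `giant_id`, `small_v`, and `pal` are assumptions made for illustration.

```r
library(igraph)

set.seed(42)
# Hypothetical toy graph: 30 vertices and 25 random edges, so it is
# guaranteed to be disconnected (a connected graph needs at least 29 edges)
g <- sample_gnm(30, 25)

comp     <- components(g)
giant_id <- which.max(comp$csize)                # largest component, not "component 1"
small_v  <- which(comp$membership != giant_id)   # vertices outside the giant component

# One colour per component, so indexing by membership can never run off the end
pal <- rainbow(comp$no, alpha = 0.6)

if (length(small_v) == 0) {
  message("No small components: every vertex is in the giant component.")
} else {
  small_comp <- induced_subgraph(g, small_v)
  plot(small_comp, vertex.size = 9, vertex.label.cex = 0.8,
       vertex.color = pal[comp$membership[small_v]])
}
```

With the OP's data, `g` would be replaced by `graph`; the rest should carry over unchanged.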

2. Giant component
Simply plotting the giant component is still hard to read. Here are two approaches to improving the display. Both rely on grouping the vertices. For this answer, I will use cluster_louvain to group the nodes, but you could try other community detection methods. cluster_louvain produces 47 communities.

## Now try for the giant component
GiantV = which(components(graph)$membership == 1)
GiantComp = induced_subgraph(graph, GiantV)
GC_CL = cluster_louvain(GiantComp)
max(GC_CL$membership)
[1] 47

Giant method 1 - grouped vertices
Create a layout that emphasizes the communities

GC_Grouped = GiantComp
E(GC_Grouped)$weight = 1
for(i in unique(membership(GC_CL))) {
    GroupV = which(membership(GC_CL) == i)
    GC_Grouped = add_edges(GC_Grouped, combn(GroupV, 2), attr=list(weight=6))
} 

set.seed(1234)
LO = layout_with_fr(GC_Grouped)
colors <- rainbow(max(membership(GC_CL)))
par(mar=c(0,0,0,0))
plot(GC_CL, GiantComp, layout=LO,
    vertex.size = 5, 
    vertex.color=colors[membership(GC_CL)], 
    vertex.label = NA, edge.width = 1)

[Figure: Giant component with grouped vertices]

This provides some insight, but the many edges make it a bit hard to read.

Giant method 2 - contracted communities
Plot each community as a single vertex. The size of the vertex reflects the number of nodes in that community. The color represents the degree of the community node.

## Contract the communities in the giant component
CL.Comm = simplify(contract(GiantComp, membership(GC_CL)))
D = unname(degree(CL.Comm))

set.seed(1234)
par(mar=c(0,0,0,0))
plot(CL.Comm, vertex.size=sqrt(sizes(GC_CL)),
    vertex.label=1:max(membership(GC_CL)), vertex.label.cex = 0.8,
    vertex.color=round((D-29)/4)+1)

[Figure: Giant component with contracted communities]

This is much cleaner, but loses any internal structure of the communities.
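The colour expression `round((D-29)/4)+1` is tuned to the degree range of this particular graph; on other data it can go negative, which matches the "numerical color values must be >= 0" error reported in the comments. One possible alternative, sketched here under assumptions, is to bin the community degrees into a fixed number of palette slots with `cut()`. The toy graph, the choice of `heat.colors()`, and the names `n_bins` and `bin` are illustrative and not part of the original answer.

```r
library(igraph)

set.seed(1234)
# Hypothetical toy graph standing in for the OP's data
g    <- sample_gnm(200, 400)
comp <- components(g)
gc   <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))

# Detect communities and contract each one to a single vertex
cl   <- cluster_louvain(gc)
comm <- simplify(contract(gc, membership(cl)))
D    <- unname(degree(comm))

# Map degree to a palette index in 1..n_bins, whatever the degree range is
n_bins <- 5
pal    <- heat.colors(n_bins)
bin    <- if (length(unique(D)) > 1) {
  cut(D, breaks = n_bins, labels = FALSE)
} else {
  rep(1L, length(D))    # all degrees equal: use a single colour
}

par(mar = c(0, 0, 0, 0))
plot(comm, vertex.size = 5 + sqrt(as.numeric(sizes(cl))),
     vertex.label = seq_len(vcount(comm)), vertex.label.cex = 0.8,
     vertex.color = pal[bin])
```

Because `bin` is always between 1 and `n_bins`, the colour lookup stays valid even when the degree distribution changes.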

G5W
  • thank you so much for your reply! I will spend some time trying to understand the answer you kindly replied. Some questions I had: – stats_noob Nov 07 '20 at 18:45
  • if I change the specification of the original data, why doesnt the "plot" command work?: x5 – stats_noob Nov 07 '20 at 18:46
  • I really like what you did at the end, plotting the communities together. I saw in your other answer, you used the Girvan-Newman Algorithm instead of the Louvain Clustering Algorithm. Do you have any preferences out of either of these? – stats_noob Nov 07 '20 at 18:56
  • When you wrote this line of code here: plot(SmallComp, layout=LO_SC, vertex.size=9, vertex.label.cex=0.8, vertex.color=rainbow(18, alpha=0.6)[components(graph)$membership[SmallV]]) or plot(CL.Comm, vertex.size=sqrt(sizes(GC_CL)), vertex.label=1:max(membership(GC_CL)), vertex.cex = 0.8, vertex.color=round((D-29)/4)+1) Is it possible to pass these "plot" commands through visNetwork()? So that these sub graphs becomes interactive – stats_noob Nov 07 '20 at 19:01
  • I do not understand the first question. It looks to me like the specification of the data is the same, and the plot works fine for me. I have no strong preference between the Girvan-Newman algorithm and the Louvain clustering algorithm, but in the other question the OP had used Girvan-Newman, so I continued with that. Louvain is faster. :-) – G5W Nov 07 '20 at 19:02
  • Regarding visNetwork - sure. Just do what you did in the question, except use `SmallComp` instead of `graph` – G5W Nov 07 '20 at 19:07
  • here is what I meant: https://shrib.com/#Desmond56kNd3a (I posted the code here). When I change the original data used for the network graph, I run into an error. I suspect that this is because of the "vertex.color = rainbow" command? – stats_noob Nov 07 '20 at 19:10
  • @GW5: re: visNetwork - thank you, this works! nodes – stats_noob Nov 07 '20 at 19:13
  • When I ran your code, I got the same error because I got NO small components. All vertices were in the big component. – G5W Nov 07 '20 at 19:15
  • thanks again - I will keep that in mind. Your help has been invaluable. if you don't mind me asking: do you use/have you used network and graph clustering methods in the past on real data? what have your experiences been, did these methods prove to be useful? I have started learning about these methods and see potential applications in finding the "shortest path" between two nodes, seeing if two nodes (or a group of nodes) share a relationship, finding community clusters as well as observing general centrality, connectivity and closeness. What other common applications can there be? Thanks. – stats_noob Nov 07 '20 at 19:22
  • That is a bit much to respond in a comment. There are many web sites that talk about this. Try [Intro to Social Network Methods](http://faculty.ucr.edu/~hanneman/nettext/Introduction_to_Social_Network_Methods.pdf) and [Network Data Repository](http://networkrepository.com/networks.php) and [SNAP](https://snap.stanford.edu/data/) – G5W Nov 07 '20 at 19:28
  • Thanks so much - I will start to read these right away. I am still a bit confused how to input these subgraphs into visNetwork(). Here is my understanding as to how this should be done: https://shrib.com/#Prince3vmQ6Rj Could you please take a look when you have some time? Thanks again. – stats_noob Nov 07 '20 at 19:49
  • The first link "intro to social network methods" is really good! – stats_noob Nov 07 '20 at 21:33
  • A couple of others you might like [Shizuka Lab](https://sites.google.com/site/daishizuka/toolkits/sna) and [AnalyticTech](http://www.analytictech.com/networks/) – G5W Nov 07 '20 at 21:45
  • thanks for everything. Please let me know (when you get a chance) if my code for passing your code through visNetwork is correct: shrib.com/#Prince3vmQ6R – stats_noob Nov 07 '20 at 22:12
  • Based on your answer here https://stackoverflow.com/questions/62553280/visualizing-the-result-of-dividing-the-network-into-communities I tried running the similar code on my real data : COMP = components(graph) table(COMP$membership) And I got the following results : 1,50; 2,49; 4;44; 25,39; 16,33 3;32... 1239,6; 1615,6;2536,6; ...518,5; 519,5 etc also : max(GC_CL$membership) =1 Based on this, does it seem worth it to continue this type of analysis? (I guess this means there isnt a really a "big" component?) – stats_noob Nov 08 '20 at 00:49
  • I also could not run the code from the section : Giant method 2 - contracted communities. I got an error message - Error in symbols(x = coords[,1], y= coords[,2], bg = vertex.color, ; numerical color values must be >= 0; found -6 – stats_noob Nov 08 '20 at 00:59
  • I get something like https://i.stack.imgur.com/8I9WK.png when I run: GiantV = which(components(graph)$membership > 3000) GiantComp = induced_subgraph(graph, GiantV) GC_CL = cluster_louvain(GiantComp) max(GC_CL$membership) LO = layout_with_fr(GC_Grouped) colors – stats_noob Nov 08 '20 at 01:06
  • GiantV = which(components(graph)$membership == 1) ... this means the first component, not component size =1? –  Nov 26 '20 at 04:51
  • Yes. The first component. In this case it was also the largest one. It would be better to actually check which component is the biggest. – G5W Nov 26 '20 at 15:26

Just a tip for 'real-life'. The best way to deal with large graphs is to either 1) filter the edges you are using by some measure, or 2) use some related variable as weight.
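As a rough illustration of both ideas in igraph, the sketch below keeps only the strongest quarter of the edges and then scales edge width by weight. The toy graph, the random `weight` attribute, and the 75% cutoff are assumptions for demonstration; on real data the weight would come from a meaningful variable.

```r
library(igraph)

set.seed(1)
# Hypothetical toy graph with a made-up edge weight
g <- sample_gnm(50, 200)
E(g)$weight <- runif(ecount(g))

# 1) Filter: drop every edge below the 75th percentile of weight
thr <- quantile(E(g)$weight, 0.75)
g_f <- delete_edges(g, which(E(g)$weight < thr))

# Remove vertices left without any edges so they do not clutter the plot
g_f <- delete_vertices(g_f, which(degree(g_f) == 0))

# 2) Weight: draw heavier edges thicker
plot(g_f, vertex.size = 6, vertex.label = NA,
     edge.width = 1 + 3 * E(g_f)$weight)
```

The threshold is a judgment call; a lower cutoff keeps more structure at the cost of more clutter.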

benjasast
  • Thank you for your reply! Can you recommend any tutorials in R for this? – stats_noob Nov 12 '20 at 06:03
  • Sure. If you are not familiar with the Tidyverse, I would use Hadley's R4DS: https://r4ds.had.co.nz . Afterwards it will be very easy for you to use ggraph; you can find the vignette with examples at: https://github.com/thomasp85/ggraph – benjasast Nov 12 '20 at 14:47