Correlation network representation

Question

I'm having trouble understanding Figure 10 of this paper:

Similarity network of participating methods for BPO. Similarities are computed as Pearson’s correlation coefficient between methods, with a 0.75 cutoff for illustration purposes. A unique color is assigned to all methods submitted under the same principal investigator. Not evaluated (organizers’) methods are shown in triangles, while benchmark methods (Naïve and BLAST) are shown in squares. The top-ten methods are highlighted with enlarged nodes and circled in red. The edge width indicates the strength of similarity. Nodes are labeled with the name of the methods followed by “-team(model)” if multiple teams/models were submitted

This study calculated the pairwise Pearson correlation between methods on a common set of gene-concept pairs and then visualized these similarities as networks. How is the place of the methods determined in this visualization?

For example, the similarity between Pannzer (1) and Pannzer (2) is greater than the similarity between Pannzer (1) and Pannzer (3), yet Pannzer (1) and Pannzer (3) are drawn closer to each other than Pannzer (1) and Pannzer (2). Is this positioning random, or does it follow a certain rule?

Where is this image from? If it's from a published article, please include the reference both so we can look it up and so it is properly cited, and so we can get a better quality image. — terdon, Dec 10 '17 at 16:25
This image is from the article "An expanded evaluation of protein function prediction methods shows an improvement in accuracy". — AyşeBanu, Dec 10 '17 at 18:30
Thanks, but we needed something like the edit Llopis did. Whenever you show something from a paper you must always include the source. — terdon, Dec 10 '17 at 18:43
I did not know that I needed to add the source. Thank you for letting me know. — AyşeBanu, Dec 10 '17 at 20:40

score 1 · Answer 1 · answered Dec 10 '17 at 09:43

When plotting a network like this one, where only the edge weights are known, the relative position of the nodes is irrelevant. The only informative part of the graph is (usually) the width of the edges, where thicker means more weight. This isn't a PCA, so the distance between nodes carries no meaning.

For that reason, there are several layout methods for drawing these networks; some of them put more emphasis on representing node degree or other characteristics of the network.

I consider this kind of representation a poor way to convey information about the network, because the cutoff is arbitrary and the distance between nodes is not informative. One could first do a PCA or MDS and then add the edges between the nodes; that way the relative distances between the nodes would be meaningful, as well as the edges. Alternatively, one could use a heatmap to represent all the correlations between the nodes.
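As a rough illustration of the MDS idea, here is a minimal sketch using only the Python standard library. The correlation values are made up for the example (they are not taken from the paper), and using 1 - r as the dissimilarity is an assumption; other choices such as sqrt(2(1 - r)) are equally reasonable. The helper names `top_eigenpair` and `classical_mds` are hypothetical, and the power iteration assumes the dissimilarities are close to Euclidean.

```python
import math
import random

# Hypothetical Pearson correlations between three methods (made-up numbers).
NAMES = ["Pannzer (1)", "Pannzer (2)", "Pannzer (3)"]
CORR = [
    [1.00, 0.95, 0.80],
    [0.95, 1.00, 0.82],
    [0.80, 0.82, 1.00],
]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-15:  # nothing left to find in this direction
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))
    return value, vec


def classical_mds(corr, dims=2):
    """Place items so that their distances approximate the dissimilarity 1 - r."""
    n = len(corr)
    sq = [[(1.0 - corr[i][j]) ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering of the squared dissimilarities: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


coords = classical_mds(CORR)
for name, (x, y) in zip(NAMES, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

With coordinates like these, the more strongly correlated pair ends up closer together, and the edges can then be drawn on top. With more than three methods, a 2D embedding will generally only approximate the dissimilarities.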

score 0 · Answer 2 · answered Dec 10 '17 at 16:28

The description you quote only states that edge thickness has relevance, not edge length. Usually with this sort of image, the program generating the graph (most probably Cytoscape) will simply arrange the nodes to maximize legibility.

While edge length can sometimes be relevant, it usually isn't; instead, it is the thickness of the edges and the size of the nodes that carry the information. The nodes are most likely positioned to make the image as clear as possible.
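To see why node positions should not be over-interpreted, here is a toy force-directed layout in plain Python. It is only a sketch in the spirit of spring layouts, not the algorithm Cytoscape or the paper's authors actually used, and the edge list and the function name `spring_layout` are invented for illustration. The point is that the same edges give different coordinates depending on the random starting positions.

```python
import math
import random

# Hypothetical edges that survive a 0.75 cutoff: (method, method, Pearson r).
EDGES = [
    ("Pannzer (1)", "Pannzer (2)", 0.95),
    ("Pannzer (1)", "Pannzer (3)", 0.80),
    ("Pannzer (2)", "Pannzer (3)", 0.82),
    ("Pannzer (3)", "Other", 0.76),
]


def spring_layout(edges, seed, iterations=200):
    """Toy force-directed layout: every pair repels, linked pairs attract."""
    rng = random.Random(seed)
    nodes = sorted({name for u, v, _ in edges for name in (u, v)})
    pos = {name: [rng.random(), rng.random()] for name in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between nodes

    def push(disp, a, b, strength):
        """Accumulate a force along the a-b axis; positive values push apart."""
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        dist = max(math.hypot(dx, dy), 1e-9)
        force = strength(dist)
        disp[a][0] += dx / dist * force
        disp[a][1] += dy / dist * force
        disp[b][0] -= dx / dist * force
        disp[b][1] -= dy / dist * force

    for step in range(iterations):
        disp = {name: [0.0, 0.0] for name in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                push(disp, a, b, lambda d: k * k / d)        # repulsion
        for u, v, weight in edges:
            push(disp, u, v, lambda d: -weight * d * d / k)  # attraction
        temperature = 0.1 * (1.0 - step / iterations)        # cooling schedule
        for name in nodes:
            length = max(math.hypot(*disp[name]), 1e-9)
            move = min(length, temperature)  # cap the step by the temperature
            pos[name][0] += disp[name][0] / length * move
            pos[name][1] += disp[name][1] / length * move
    return pos


layout_a = spring_layout(EDGES, seed=1)
layout_b = spring_layout(EDGES, seed=2)
for name in sorted(layout_a):
    print(name,
          [round(c, 2) for c in layout_a[name]],
          [round(c, 2) for c in layout_b[name]])
```

Both runs use identical edges and weights, so any difference in the printed coordinates comes purely from the random start, not from the data.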
