How to deal with stress over 0.2 in NMDS in large dataset

Question

I am analysing a large dataset (2000 rows by 250 columns) of species presence at several locations over the last 20 years. I ran an NMDS to identify differences between the two main types of forest. The function converges but gives a stress of 0.25. Pretty much everywhere I look says that a stress above the commonly accepted limit of 0.2 indicates a poor representation. I've also seen that for such large datasets the 0.2 limit might not be a good measure of goodness of fit either. So my questions are:

1. Are my results still usable? Adonis gives a p-value of less than 0.05 and, in general, everything seems to indicate that the two forest types differ in species composition.
2. Is there another way to measure the fit of the model?
3. Are there any other alternatives?

The code:

metaMDS(bird.matrix, distance = "bray", k = 3, maxit = 999,
    trymax = 10000, wascores = TRUE, noshare = 0.1,
    previous.best = nmds)

What is your goal? Do you want to know whether all the sites are different or just whether two of them are different? What type of information do you expect to obtain from using NMDS? Also, what is "the model" you refer to in your second question? — frank, Jul 14 '22 at 11:34
Hi Frank, thanks for your reply. I am testing the difference in community composition between the two types of forest (there are several sites within each forest, but I am not interested in how these differ). For that, I've run adonis, but I still want to visualize the result with NMDS. By "model" I meant the NMDS, sorry. — Aurora Tarodo, Jul 14 '22 at 17:06
You are not saying which specific NMDS method/algorithm and which kind of stress you are using. Also, (N)MDS is a technique for square distance matrices, but you say your input is a rectangular dataset; how, then, do you obtain the distance matrix? — ttnphns, Jul 15 '22 at 17:21
metaMDS(bird.mat, distance = "bray", k = 3, trymax = 10000). Of course, I created the distance matrix before running the analysis. I am not choosing any particular kind of stress; when you run the code above, the result reports the stress for the number of dimensions used and how many tries were needed before converging. — Aurora Tarodo, Jul 16 '22 at 13:54

score 0 · Accepted Answer · answered Jul 15 '22 at 04:41

IIUC, you have two types of forests, and each of the 2000 rows in your dataset belongs to one of those types.

In principle, it is appropriate to use adonis() to test whether the two types of forest are different, and with a p-value of less than 0.05 you can be quite confident that they do differ. The problem is that you have a lot of data (2000 rows), which makes it easy for adonis() to detect differences even if they are too small to be relevant in your eyes. The two types of forest are probably not one hundred percent identical, and given enough data the test will almost always find that they differ. That is why, in general, one should be cautious when applying significance tests to large datasets.
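One way to see whether a significant difference is also a sizeable one is to read the effect size next to the p-value. The following is a minimal sketch, not the asker's analysis: it uses vegan's built-in `dune` and `dune.env` data as a stand-in for the bird data, with `Management` standing in for forest type, and it assumes a recent vegan version in which `adonis2()` is the interface to use in place of `adonis()`.

```r
library(vegan)

# Stand-in data shipped with vegan; NOT the bird data from the question.
data(dune)
data(dune.env)

set.seed(1)  # permutation tests are random, so fix the seed
fit <- adonis2(dune ~ Management, data = dune.env,
               method = "bray", permutations = 999)
print(fit)

# R2 in the first row is the share of total dissimilarity explained by
# the grouping; report it together with the p-value.
r2 <- fit$R2[1]

# PERMANOVA also reacts to differences in within-group spread, so a
# dispersion check helps when interpreting a significant result.
disp <- betadisper(vegdist(dune, method = "bray"), dune.env$Management)
print(permutest(disp, permutations = 999))
```

With 2000 rows, a very small p-value combined with a small R2 would suggest that the forest types differ, but only slightly.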

As far as NMDS is concerned, it is more of a visualization tool that gives you a feeling for the relative positioning of the samples. This is often quite helpful for building intuition, e.g. to see how well the data from the two types of forest are separated, but it doesn't give you "concrete evidence". Note that it maps your original 250-dimensional space to a low-dimensional one (three dimensions with k = 3 in your call). This inevitably loses a lot of information, and it is difficult, if not impossible, to figure out what exactly is lost.
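To get a feeling for how much the number of dimensions matters, one can fit the NMDS for several values of `k` and compare the stress. This is a sketch on vegan's built-in `dune` data as a stand-in; the range `1:4` and `trymax = 20` are illustrative choices, and a 2000-row dataset will need far more time per fit.

```r
library(vegan)

data(dune)  # stand-in data; replace with your own community matrix

set.seed(1)  # NMDS uses random starts
ks <- 1:4
stress <- sapply(ks, function(k) {
  # trace = FALSE silences the per-run progress output
  metaMDS(dune, distance = "bray", k = k, trymax = 20, trace = FALSE)$stress
})

# Stress per number of dimensions; look for the point where adding
# another dimension no longer lowers the stress by much.
print(data.frame(k = ks, stress = round(stress, 3)))
```

Stress usually falls as `k` grows, at the cost of a plot that is harder to read, so this shows the trade-off rather than a single correct `k`.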

Your three questions:

1. Using adonis() is, in principle, correct, but, as explained above, it might be "too good" to be of use. And it is totally fine to use NMDS as a visualization aid.
2. Stress is the standard measure for NMDS. Even if you found some other measure, there is no reason why it should be better than the stress.
3. As for alternatives, the problem described above affects all significance tests. In your case, you would have to specify how much two types of forest may differ while still being considered "sufficiently equal", which is probably difficult. One approach might be to find several different types of forest, some of which you know are definitely different, and compare the differences among those with the difference between the two you are interested in. There are also other visualization tools such as MDS, t-SNE, or UMAP; these, however, place some requirements on the data, and given that you are using NMDS, I guess those are not satisfied.
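On the second question, vegan also offers diagnostics that break the stress down rather than replace it. The sketch below again uses the built-in `dune` data as a stand-in: `stressplot()` draws the Shepard diagram and `goodness()` returns a per-site contribution to the stress.

```r
library(vegan)

data(dune)  # stand-in data; replace with your own community matrix

set.seed(1)  # NMDS uses random starts
ord <- metaMDS(dune, distance = "bray", k = 3, trymax = 20, trace = FALSE)

# Shepard diagram: observed dissimilarities against ordination distances,
# with the fitted monotone step line.
stressplot(ord)

# Per-site goodness of fit; larger values mark sites that the
# ordination represents less well.
gof <- goodness(ord)
print(head(sort(gof, decreasing = TRUE)))
```

If a few sites account for most of the stress, the overall picture may still be usable even when the total exceeds 0.2.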
