I am currently making PCoA plots on Presence/Absence community data. My rows are populated with samples and my column headings are different taxa that were detected. However, since the experiment is a degradation experiment, some of the samples have rows filled with zeros (absences). The only way the distance matrix calculation will work is if I remove these zero-filled rows. However, since these rows are biologically meaningful (it means nothing was present after decay) and not just missing data, I was wondering if there was a way to keep them and avoid the "NaN" error I would get otherwise.

I am using the package 'vegan' and the function 'vegdist' to calculate my distance matrix.

Here is an example of the code:

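library(vegan)  # provides vegdist()
# Jaccard distances on presence/absence data (rows = samples, columns = taxa)
distance_matrix <- vegdist(data, method = "jaccard", binary = TRUE)
pco <- pco(distance_matrix, negvals = "zero")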

Thanks in advance for your help and/or suggestions!

  • Is a PCoA plot a biplot computed via SVD of a distance matrix? It has been too long since I took a course in molecular evolution and ecology. – Galen Apr 01 '22 at 19:35
  • And by SVD I mean singular value decomposition. – Galen Apr 01 '22 at 19:42
  • Yes, it is! The issue is with the distance matrix not being able to compute (I originally wrote PCoA plot and have corrected that). – ramateur Apr 01 '22 at 20:05
  • Looking at the definition of the Jaccard distance as the complement of the Jaccard index, my guess is that the NaN values are coming from a divide-by-zero issue when the union is empty. – Galen Apr 01 '22 at 20:08
  • One option for you might be to use a different metric. – Galen Apr 01 '22 at 20:12
  • Yes, that is where the problem is coming in. I guess in this case I have no choice but to remove those zero-rows if I want to use the Jaccard index? – ramateur Apr 01 '22 at 20:42
  • Yes. If you transform your data in some way to handle those zeros that is equivalent to using a slightly different metric by function composition. – Galen Apr 01 '22 at 20:45
  • See this question for someone with a similar issue, also using vegan: https://stackoverflow.com/questions/71651515/how-to-include-plots-rows-with-zero-values-in-the-presence-absence-community – rw2 Apr 02 '22 at 19:15

1 Answer

Computing the Jaccard distance between two sets requires that the union be non-empty.

$$d_{\operatorname{Jaccard}}(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$

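The union $A \cup B$ is empty only when $A = B = \emptyset$. In that case $|A \cup B| = |A \cap B| = 0$, so the numerator and the denominator of the fraction are both zero and the result is the indeterminate form $0/0$. A pair in which only one of the two sets is empty is still well defined and has distance 1.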

In R, $0/0$ evaluates to NaN, which is most likely where the NaN values in your distance matrix come from.
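The divide-by-zero case can be reproduced in base R without any packages. This is only a minimal sketch: `jaccard_dist` is a hypothetical helper written for illustration, not the code that `vegdist` runs internally, and the two vectors are made-up samples.

```r
# Hypothetical helper: Jaccard distance between two presence/absence vectors.
jaccard_dist <- function(a, b) {
  a <- a > 0
  b <- b > 0
  intersection_size <- sum(a & b)  # taxa present in both samples
  union_size <- sum(a | b)         # taxa present in at least one sample
  1 - intersection_size / union_size
}

fresh   <- c(1, 0, 1, 0)  # made-up sample with two taxa present
decayed <- c(0, 0, 0, 0)  # made-up sample with nothing left

jaccard_dist(fresh, fresh)      # 0: identical non-empty samples
jaccard_dist(fresh, decayed)    # 1: nothing shared, union is non-empty
jaccard_dist(decayed, decayed)  # NaN: 0/0 because the union is empty
```

Only the empty-versus-empty pair is undefined, so the problem shows up once the data contain at least two all-zero rows.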

If the Jaccard distance must be used, then any cases that result in an empty union, that is, pairs of all-zero samples, have to be filtered out. Another approach is to use a different metric that stays defined when both inputs are zero/empty.
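Both options can be sketched in base R. Everything here is an assumption made for illustration: the toy matrix `comm` is invented, `jaccard_matrix` is a hypothetical helper, and the "dummy taxon" column is just one possible way to get a metric that tolerates empty samples. Adding it changes the distances, so the result is a slightly different metric and no longer the plain Jaccard distance.

```r
# Toy presence/absence matrix (rows = samples, columns = taxa); values are made up.
comm <- rbind(
  fresh    = c(1, 1, 1, 0),
  partial  = c(1, 0, 0, 0),
  decayed1 = c(0, 0, 0, 0),
  decayed2 = c(0, 0, 0, 0)
)

# Hypothetical helper: pairwise Jaccard distances between the rows of x.
jaccard_matrix <- function(x) {
  x <- (x > 0) * 1                          # coerce to 0/1, keep dimnames
  inter <- tcrossprod(x)                    # shared taxa for each pair of rows
  sizes <- rowSums(x)                       # taxa per sample
  uni <- outer(sizes, sizes, "+") - inter   # union size for each pair
  as.dist(1 - inter / uni)
}

# Option 1: filter out the all-zero rows, then use the plain Jaccard distance.
nonempty <- comm[rowSums(comm) > 0, , drop = FALSE]
d_filtered <- jaccard_matrix(nonempty)

# Option 2: add a dummy taxon present in every sample, so no union is empty.
adjusted <- cbind(comm, dummy = 1)
d_adjusted <- jaccard_matrix(adjusted)

# Classical scaling (PCoA) from base R's stats package on the adjusted distances.
ordination <- cmdscale(d_adjusted, k = 2)
```

With the dummy column, two empty samples sit at distance 0 from each other, and an empty sample gets closer to a non-empty one as the latter loses taxa. Whether that behaviour is biologically sensible for a degradation experiment is a judgment call. With vegan installed, the same `adjusted` matrix could be passed to `vegdist(adjusted, method = "jaccard", binary = TRUE)` in place of the helper.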

Galen