
My data are heavily zero-inflated, even after filtering: I have already removed features with very low counts and kept only those that have values in more than 60% of samples.

However, the zeros still pose a problem when applying the CLR transform before correlating these relative data with some clinical metadata parameters.
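
For reference, and to make the problem concrete, the CLR of a composition $x = (x_1, \dots, x_D)$ with $D$ parts is usually written as below (the symbols $x$, $D$ and $g$ are my own notation, not taken from the linked papers):

$$\operatorname{clr}(x)_i = \ln x_i - \frac{1}{D}\sum_{j=1}^{D} \ln x_j = \ln\frac{x_i}{g(x)}, \qquad g(x) = \Big(\prod_{j=1}^{D} x_j\Big)^{1/D}$$

A single zero part makes $\ln x_j$ undefined and the geometric mean $g(x)$ equal to zero, so the zeros have to be dealt with before the transform can be applied.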

I understand you can add a small constant (or 1) to these zeros prior to a typical log transform; however, I would be interested to read any comments on what might be best practice prior to the CLR transform, or on other approaches.

EDIT:

The data I am working with are multi-omics.

EDIT 2:

Two approaches to dealing with the zeros would be either imputing them using a package such as zCompositions, or adding the value 1 to each data point.
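
To illustrate the pseudocount option, here is a minimal sketch, assuming NumPy and a made-up toy count table (the function name `clr_with_pseudocount` and the data are hypothetical, not from any of the papers). It adds a constant to every entry and then applies the CLR per sample. The imputation route via zCompositions is an R package and is not shown here.

```python
import numpy as np


def clr_with_pseudocount(counts, pseudocount=1.0):
    """CLR-transform each row (sample) after adding a pseudocount to every entry."""
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtracting the per-sample mean of the logs is the same as
    # dividing by the geometric mean before taking the log.
    return log_x - log_x.mean(axis=1, keepdims=True)


# Hypothetical toy table: 3 samples (rows) x 4 features (columns), with zeros.
toy_counts = np.array([
    [0, 10, 5, 85],
    [3, 0, 0, 97],
    [12, 7, 0, 81],
])

clr_one = clr_with_pseudocount(toy_counts, pseudocount=1.0)
clr_half = clr_with_pseudocount(toy_counts, pseudocount=0.5)

print(clr_one.round(3))
print(clr_half.round(3))
```

Each row of the output sums to zero, as expected for CLR values. Comparing `clr_one` with `clr_half` shows that the transformed values depend on the constant chosen, which is part of why I am unsure what to use.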

Here are two papers that CLR-transform both datasets, each of which can be regarded as relative in nature (in these examples, microbiome and metabolome):

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671389/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8165185/

In other examples, it is not clear how the metabolome variable was transformed:

https://www.nature.com/articles/s41467-023-36825-1#Sec12

aim6789
  • Adding any constant is arbitrary. A better alternative is the ILR: see https://stats.stackexchange.com/questions/259208. There's no getting around the zero inflation, though, because any transformation will map all the zeros to the same constant. That's of no importance in measuring correlations. The choice of transformation, though, is critical. – whuber Sep 18 '23 at 19:34
  • I think when you have to do so much to your data, you should probably be using a different method (in this case, not Spearman). If you tell us what you are actually trying to do, we might be able to help. – Peter Flom Sep 18 '23 at 20:01
  • @PeterFlom thanks for your comment, please see the edit. I'm working with omics-type data, specifically microbiome and metabolome. The scientific community as a whole is working towards developing protocols/methods for this, but correlation-based methods are one approach that has been tried. Bayesian and/or machine learning approaches are also being studied. – aim6789 Sep 19 '23 at 20:17
  • Can we have a little bit more detail, e.g. a link to a paper that uses these techniques on data where the zeros are not problematic? I can't quite figure out what you're correlating with what: you want to do a CLR transform on some set of data and then compute correlations between the transformed values and some other variables? (This seems a little tricky because the individual CLR components will be hard to interpret ...) – Ben Bolker Sep 26 '23 at 23:45
  • @BenBolker please see the edit. It's difficult to find a paper where the CLR is applied to data where zeros aren't a problem; in the first two papers they impute or add a pseudocount (e.g. 1) to the zero-inflated microbiome OTU/ASV data before the CLR transform. Examples in the literature either CLR-transform both variables before correlating them, or apply the CLR to, say, the microbiome data and then normalise and correlate this with metabolome data or a different variable that may have been transformed/scaled in a different way, e.g. log and Pareto scaling. There is no general consensus that I can find yet. – aim6789 Sep 29 '23 at 14:51
