
I've heard that one solution to analyzing compositional data (in my case five predictors that are proportions summing to 1) is to simply remove the intercept.

This seems like a much simpler solution than a log-ratio transform, and it allows all five predictors to be included.

I'd like to double-check that this is an acceptable approach. If so, can you explain why this works? The reference I was pointed to is a textbook that isn't available online.

Edit: This is the textbook I was pointed to: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118204221

Also, to give a bit more detail: my DV (dependent variable) is accuracy in response to each item, and my predictors are proportions reflecting the makeup of the items. A minimal sketch of the kind of model I mean follows below.
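
To make this concrete, here is that sketch in Python with simulated stand-in data (the Dirichlet draw, the coefficient values, and the noise level are all placeholders, not my actual dataset):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in for the real data: n items, five proportion
# predictors that sum to 1 per row, and an "accuracy" DV.
n = 200
X = rng.dirichlet(alpha=np.ones(5), size=n)    # each row sums to 1
beta = np.array([0.9, 0.7, 0.5, 0.6, 0.8])     # made-up effects
y = X @ beta + rng.normal(scale=0.05, size=n)  # noisy accuracy scores

# The approach in question: fit without an intercept (i.e. do not
# call sm.add_constant), keeping all five proportion predictors.
fit = sm.OLS(y, X).fit()
print(fit.params)  # one coefficient per proportion, all five kept
```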

Dave
  • Can you still edit your post to include what textbook that is? The claim sounds extremely fishy to me, but that may be down to what "analyze" means. Forecasting or predicting compositional data will, for instance, certainly not work by simply removing the intercept (predictions may easily be outside the unit interval and not sum to one), but perhaps "analyzing" refers to some asymptotic significance testing or similar (and even then that looks rather dubious). – Stephan Kolassa Mar 15 '23 at 17:35
  • This will become much more complicated if you have two sets of predictors in your model each summing to $1$ as you cannot remove the intercept twice – Henry Mar 15 '23 at 17:42
  • Because these are predictors and because you state the proportions sum to 1 -- not just are expected to sum to 1 or approximately sum -- then you have lost no information by removing one of them. The benefit is doubtful, though, unless you have a rationale to make a non-arbitrary choice of which one to remove. Removal is a linear transformation from the simplex in $\mathbb R^5$ to a region in $\mathbb R^4.$ An orthogonal transformation might be a better choice because it preserves lengths, angles, and distances. See https://stats.stackexchange.com/questions/259208 for details. – whuber Mar 15 '23 at 18:06
  • @StephanKolassa I've added the textbook in an edit as well as some more details of the analysis. – Dave Mar 15 '23 at 18:25
  • @whuber Thanks for the reply! The reason I was excited about the possibility of removing the intercept is that I'd like to be able to interpret effects for all five predictors, and don't have a natural reference category to remove. I've been hesitant about the ILR transform because the interpretation doesn't seem very intuitive to me. That's why I was drawn to being able to just remove the intercept and keep all five predictors. – Dave Mar 15 '23 at 18:27
  • When these are your only predictors, that approach makes sense and is used. But be aware that the interpretation is a little delicate due to the necessary correlations among the predictors. – whuber Mar 15 '23 at 18:42
  • @whuber Just so I'm clear, when you say that approach makes sense and is used, you're referring to (in my case) including all five variables and omitting the intercept? And thanks for the warning, I will be sure to check VIF for the models. – Dave Mar 15 '23 at 19:08
  • The VIF doesn't tell you much here. Pace the current answer, when you code each of the variables separately and exclude an intercept, the design matrix is usually of full rank. – whuber Mar 15 '23 at 20:36
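
Edit 2: Following up on the comments, here is a quick check I ran (my own sketch, simulated data) of whuber's point that the design matrix is full rank once the intercept is dropped, and that including an intercept is what creates the collinearity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for five proportion predictors: each row sums to 1.
X = rng.dirichlet(alpha=np.ones(5), size=200)

# Without an intercept the five columns are linearly independent,
# so the design matrix has full column rank.
print(np.linalg.matrix_rank(X))  # 5

# A column of ones equals the row sums of X, so appending an
# intercept adds a sixth column but no new information: the rank
# stays at 5 -- perfect collinearity.
X_int = np.column_stack([np.ones(X.shape[0]), X])
print(np.linalg.matrix_rank(X_int))  # still 5
```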

0 Answers