
This is an imaginary question. I am asking out of curiosity.

Say we have data on how fast a person can run 100 meters. We have also measured this person's weight, and also how cold the temperature was, or what season we were in (let's say we gathered info all year, so all seasons are represented).

It is known that both covariates have an effect on run-time. However, what if weight was caused by season? For example, what if you were much, much more likely to weigh a lot during winter, than in summer?

In that case, what model do you go for? Effects from weight and season? Or just season alone?

Marke
  • To master methods of modeling causal pathways or cause-and-effect dynamics is a years-long process that could take you into a vast literature on causality, experimental vs. observational research, regression, path analysis, structural equation modeling, and more. – rolando2 Feb 22 '17 at 19:10

2 Answers


Let $S$ be the running speed, $W$ the weight, and $T$ the temperature. In general, $S$ is not independent of $W$ given $T$. That is, $p(S|W,T)$ is not equal to $p(S|T)$.

Proof: To visualize the dependencies, here is the corresponding Bayesian network:

$$T \rightarrow W, \qquad T \rightarrow S, \qquad W \rightarrow S$$

The dependencies you described give:

$$p(S,W,T)=p(S|W,T)p(W|T)p(T)$$

$$p(S|T) = \sum_W{p(S,W|T)} = \sum_W{p(S|W,T)p(W|T)}$$

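As we can see, $p(S|T)$ is a weighted average of $p(S|W,T)$ over $W$, so in general it is not equal to $p(S|W,T)$. The two coincide only if the relationship $p(W|T)$ is deterministic, i.e., $W$ has zero entropy given $T$, or if $p(S|W,T)$ does not actually vary with $W$.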

Intuitive Example:

Let's say $T=\text{hot}$, $W=\text{heavy}$, and $p(W=\text{heavy}|T=\text{hot}) = 0.1$.

1) $p(S=\text{fast}|T=\text{hot})=\sum_W{p(S=\text{fast}|W,T=\text{hot})p(W|T=\text{hot})}$ (This assumes we do not know whether $W=\text{light}$ or $W=\text{heavy}$, so we have to consider both possibilities and take a weighted average of $p(S=\text{fast}|W=\text{heavy},T=\text{hot})$ and $p(S=\text{fast}|W=\text{light},T=\text{hot})$.)

2) $p(S=\text{fast}|W=\text{heavy},T=\text{hot})$ (Here we know that $W=\text{heavy}$.)

If you pretend that you do not know $W$, i.e., you use (1), your result is distorted by a scenario that could have occurred but did not, namely $p(S=\text{fast}|W=\text{light},T=\text{hot})$.

However, knowing that $W=\text{heavy}$, you can rule out that scenario, which (1) weights at 90%, and use (2) directly.
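To put numbers on the comparison between (1) and (2), here is a minimal sketch. Only $p(W=\text{heavy}|T=\text{hot}) = 0.1$ comes from the example. The two values of $p(S=\text{fast}|W,T=\text{hot})$ are hypothetical, chosen just to make the gap visible.

```python
# p(W | T=hot): 0.1 is from the example; 0.9 follows because W is binary.
p_w_given_hot = {"heavy": 0.1, "light": 0.9}

# ASSUMPTION: hypothetical values of p(S=fast | W, T=hot), for illustration only.
p_fast_given_w_hot = {"heavy": 0.2, "light": 0.8}

# (1) W unknown: weighted average over both possible weights.
p_fast_given_hot = sum(
    p_fast_given_w_hot[w] * p_w_given_hot[w] for w in p_w_given_hot
)

# (2) W known to be heavy: read the conditional probability directly.
p_fast_given_heavy_hot = p_fast_given_w_hot["heavy"]

print("p(S=fast | T=hot)          =", round(p_fast_given_hot, 4))
print("p(S=fast | W=heavy, T=hot) =", round(p_fast_given_heavy_hot, 4))
```

With these made-up numbers, the marginal (1) is dominated by the light scenario, so it sits far from the value (2) that applies once the runner is known to be heavy.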

Vicon

Here you just have two correlated (or otherwise related) predictors. This is not ideal, but it is very common and not a great problem. Depending on how strongly your predictors are related, you may want to include both in your model or just one of them. How you decide depends on the kind of model you are fitting, but you can google "variable selection" or "feature selection". For linear regression, you can check on-line chapter 10 of this book.
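As a rough way to gauge "how strongly related" two predictors are, one can look at their correlation and at the variance inflation factor, which for two predictors is $1/(1-r^2)$. This diagnostic is not mentioned in the answer; it is offered here as one common option. The data below are simulated under the assumption that weight goes up when temperature goes down, and every coefficient is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 365

# ASSUMPTION: simulated daily temperature and a weight that rises when it is cold.
temperature = rng.normal(12.0, 8.0, size=n)
weight = 75.0 - 0.4 * temperature + rng.normal(0.0, 2.0, size=n)

# Pearson correlation between the two predictors.
r = np.corrcoef(temperature, weight)[0, 1]

# Variance inflation factor for the two-predictor case.
vif = 1.0 / (1.0 - r**2)

print("correlation:", round(r, 3))
print("VIF:", round(vif, 2))
```

A VIF close to 1 suggests the predictors are nearly unrelated. Larger values mean the coefficient estimates become less precise when both predictors are kept, which is one input to the decision but not a rule by itself.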

I'd like to end with a couple of extreme examples:

  • We are measuring people. Our response is height and our predictors are weight in kg and weight in pounds. Our predictors are perfectly correlated and carry exactly the same information. We should drop one of them.
  • We are measuring movements of the Leaning Tower of Pisa. Our response is tilt of the tower and our predictors are downward movement measured at each side at the base of the tower. Both predictors are related (we should expect downward movements at both sides at the same time) but what causes tilt is the difference of those movements. Therefore, even if they are related, we should keep both predictors in the model.
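Both extreme cases can be sketched with simulated data. Everything below is hypothetical (sample size, noise levels, the settlement model). It is meant only to illustrate the two bullets, not to reproduce any real measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ones = np.ones(n)

# Case 1: weight in kg and in pounds carry identical information,
# so the design matrix [1, kg, lb] has only two independent columns.
kg = rng.normal(70.0, 10.0, size=n)
lb = kg * 2.20462
rank_dup = np.linalg.matrix_rank(np.column_stack([ones, kg, lb]))

# Case 2 (ASSUMPTION: toy model of the tower): both sides sink together,
# but the tilt is driven by the difference between the two sides.
common = rng.normal(0.0, 1.0, size=n)
left = common + rng.normal(0.0, 0.3, size=n)
right = common + rng.normal(0.0, 0.3, size=n)
tilt = (right - left) + rng.normal(0.0, 0.05, size=n)


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit (X must include an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()


r2_both = r_squared(np.column_stack([ones, left, right]), tilt)
r2_left = r_squared(np.column_stack([ones, left]), tilt)

print("rank with kg and lb:", rank_dup)
print("corr(left, right):", round(np.corrcoef(left, right)[0, 1], 3))
print("R^2 with both sides:", round(r2_both, 3))
print("R^2 with left side only:", round(r2_left, 3))
```

In the first case the rank falls short of the number of columns, so one predictor is redundant. In the second, the predictors are strongly correlated, yet dropping either one removes most of the explanatory power, because the response depends on their difference.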
Pere