I work at a fundraising organization that runs campaigns in different workplaces. In these workplaces our fundraising team employs a variety of tactics (prizes, events, etc.) to boost donations. After the annual campaign cycle I have a set of data for each workplace (500+) that contains the tactics used (binary - tactic present vs absent) and continuous outcome data such as participation rate for each campaign. The data might look like this:

PartRate  Tactic1 Tactic2 Tactic3 ...
0.15      1       0       1
0.20      0       0       1
0.17      1       1       1
...

In reality, however, we monitor 15+ different tactics. Also note that we might employ several tactics at the same workplace. There is medium-to-high correlation among some of the tactics.

It would be nice to be able to say which tactics have the most influence on the outcome (participation rate in this example). However, given the degree of multicollinearity between the independent variables (IVs), I wonder whether I have to give up on this project. Pairwise correlations between IVs range from 0.4 to 0.7.

Can anyone suggest an analytical approach that might provide some fruitful knowledge out of these data? Broad strokes are OK as long as it addresses the specific problems inherent in this example. FYI, I have intermediate level undergrad stats and I am fluent in R.

jtdoud

1 Answer

You could try ridge regression to deal with the multicollinearity. Ordinary least squares requires that the predictors (the independent variables) are not linearly dependent on one another. When they are strongly correlated, the matrix $X^TX$ that has to be inverted becomes singular or close to singular, and the coefficient estimates become unstable, with large variances. Ridge regression overcomes this problem by modifying the objective function so that it penalizes the size of the regression coefficients: $$\big\|\vec y - X \vec\beta\big\|^2+\alpha\big\|\vec\beta\big\|^2$$ where $\alpha \ge 0$ controls the strength of the penalty. The solution is given by

$$\vec\beta=\bigg(X^TX+\alpha I\bigg)^{-1}X^T\vec y$$
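
For completeness, here is a short sketch of where this formula comes from, assuming the penalized objective is the residual sum of squares plus $\alpha$ times the squared length of $\vec\beta$. Setting its gradient with respect to $\vec\beta$ to zero gives

$$-2X^T\big(\vec y - X\vec\beta\big)+2\alpha\vec\beta=0 \quad\Longrightarrow\quad \big(X^TX+\alpha I\big)\vec\beta=X^T\vec y$$

For any $\alpha>0$ the matrix $X^TX+\alpha I$ is positive definite, so it can be inverted even when $X^TX$ itself is singular.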

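As you can see directly from the form of the objective function, the penalty forces the estimates towards zero and so introduces bias. In exchange, it reduces their variance, which is what stabilizes the fit when the predictors are correlated.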
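
To make the shrinkage concrete, here is a minimal sketch of the closed-form solution in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated (three hypothetical binary tactics, two of them correlated at roughly 0.6), the function name `ridge_coefficients` is made up, and `alpha=50.0` is an arbitrary penalty, not a recommended value. In practice you would pick $\alpha$ by cross-validation; in R, `glmnet` with its mixing parameter set to 0 fits the same kind of model.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge estimate: solve (X^T X + alpha I) beta = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Simulated data (an assumption, not the asker's data): 500 workplaces,
# three binary tactics, where tactic2 copies tactic1 about 60% of the time.
rng = np.random.default_rng(0)
n = 500
tactic1 = rng.integers(0, 2, size=n)
copy_mask = rng.random(n) < 0.6
tactic2 = np.where(copy_mask, tactic1, rng.integers(0, 2, size=n))
tactic3 = rng.integers(0, 2, size=n)
X = np.column_stack([tactic1, tactic2, tactic3]).astype(float)

# Hypothetical participation rate: only tactic1 and tactic3 have an effect.
y = 0.10 + 0.05 * tactic1 + 0.02 * tactic3 + rng.normal(0.0, 0.03, size=n)

# Center predictors and outcome so the intercept drops out and is not penalized.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

beta_ols = ridge_coefficients(Xc, yc, alpha=0.0)     # alpha = 0 is ordinary least squares
beta_ridge = ridge_coefficients(Xc, yc, alpha=50.0)  # arbitrary illustrative penalty

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With `alpha=0.0` the function reduces to ordinary least squares, and for any positive `alpha` the coefficient vector is shorter than the least-squares one. Because the penalty depends on the scale of each column, predictors measured on different scales are usually standardized first; with all-binary tactics this matters less.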

AnilB