
I am trying to run a linear regression with a dummy variable on NBA statistics, with NBA salaries as the $y$ variable and different performance statistics as the $X$ variables. I have already run a linear regression and found that $PPG$ and $RPG$ are the only two significant predictors of player salary. However, looking at my graphs, higher $PPG$ only seems to be correlated with higher salary once a player scores over 10 $PPG$; below this there is just a large, chaotic cluster of data points.

To check whether the determinants differ before and after the 10 $PPG$ point, I used a dummy variable called $PPGDummy1$, which equals 1 when a player's $PPG$ is greater than 10 and 0 when it is less than 10. I have run this regression but have no clue how to interpret the results. Here is my code for the regression:

lm2 <- lm(log(Salary) ~ PPGDummy1 + PPG + APG + RPG + SPG + BPG + FG + THREEPG + FT + Age, data = Econ_III_Data_Set)

Here is the section of results that it produces:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15.239385   0.555064  27.455  < 2e-16 ***
PPGDummy1    0.317359   0.110765   2.865  0.00431 ** 
PPG          0.052420   0.011527   4.548 6.55e-06 ***
APG          0.038682   0.025772   1.501  0.13390    
RPG          0.084863   0.021474   3.952 8.67e-05 ***
SPG          0.143407   0.111059   1.291  0.19710    
BPG         -0.125260   0.118076  -1.061  0.28919    
FG           0.267770   0.669408   0.400  0.68929    
THREEPG     -0.029914   0.328201  -0.091  0.92741    
FT          -0.556308   0.368705  -1.509  0.13187    
Age         -0.009873   0.011326  -0.872  0.38371  

I am unsure how to interpret the $p$-value and estimate for the $PPGDummy1$ variable in the results. Thanks in advance for your help.

1 Answer


From the looks of it, you are including both PPG and PPGDummy1 in your model, which imposes a potentially unusual assumption on the relationship between PPG and expected log-salary: a line with a slope of 0.052420 and a discontinuous jump of 0.317359 at PPG = 10.

[Figure: Change in expected log-salary per increase in PPG]
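To put numbers on that line-plus-jump shape, here is a small sketch in R using the two estimates from the question's table. It assumes the dummy is coded as PPG > 10 and that the outcome is log(Salary), as in the posted lm() call; the names b_dummy, b_ppg and ppg_effect are made up for illustration.

```r
# Coefficients copied from the regression output in the question
b_dummy <- 0.317359   # PPGDummy1
b_ppg   <- 0.052420   # PPG

# Contribution of scoring to expected log-salary, other covariates held fixed.
# Assumption: the dummy switches on strictly above 10 PPG.
ppg_effect <- function(ppg) b_ppg * ppg + b_dummy * (ppg > 10)

# Because the outcome is log(Salary), exponentiating a coefficient gives an
# approximate multiplicative change in salary.
jump_ratio  <- exp(b_dummy)  # just above vs. just below 10 PPG
slope_ratio <- exp(b_ppg)    # per additional point per game

round(100 * (jump_ratio - 1), 1)   # percent jump at the threshold
round(100 * (slope_ratio - 1), 1)  # percent change per extra PPG
```

Read this way, the fitted model says salary is roughly 37% higher just above 10 PPG than just below it, and roughly 5% higher for each additional point per game, holding the other covariates fixed.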

Based on your description of the graphs, it sounds like you're seeing a non-linear relationship between PPG and expected log-salary. The p-values in your table say something similar: the significant positive coefficient on PPGDummy1 (p = 0.00431) means that players above 10 PPG have higher log-salaries than a single linear relationship with PPG can explain.

If all you are trying to say is that changes in large PPGs matter more than changes in small PPGs, perhaps this is good enough. On the other hand, you probably do not believe that the plot above, which represents your fitted model, captures the true relationship between log-salary and PPG well. So, if you are interested in mathematically characterizing the relationship rather than simply describing it, consider a more flexible model, such as one with polynomial terms in PPG or a generalized additive model.
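As a rough sketch of what those more flexible models could look like in R, the block below uses simulated stand-in data (not your data set), with a made-up relationship that is flat below 10 PPG and rising above it. It compares a straight-line fit, a quadratic in PPG, and a GAM from the mgcv package. The variable names and the simulated relationship are assumptions for illustration only.

```r
library(mgcv)  # recommended package, included with standard R installations

set.seed(1)

# Simulated stand-in data: NOT the data set from the question
n   <- 500
PPG <- runif(n, 0, 30)
RPG <- runif(n, 0, 12)

# Hypothetical truth: no PPG effect below 10, linear increase above 10
log_salary <- 14 + 0.10 * pmax(PPG - 10, 0) + 0.05 * RPG + rnorm(n, sd = 0.5)
sim <- data.frame(log_salary, PPG, RPG)

fit_linear <- lm(log_salary ~ PPG + RPG, data = sim)           # single straight line
fit_poly   <- lm(log_salary ~ poly(PPG, 2) + RPG, data = sim)  # quadratic in PPG
fit_gam    <- gam(log_salary ~ s(PPG) + RPG, data = sim)       # smooth function of PPG

# Lower AIC suggests a better trade-off between fit and complexity
AIC(fit_linear, fit_poly, fit_gam)

# Estimated smooth effect of PPG from the GAM, with a shaded confidence band
plot(fit_gam, select = 1, shade = TRUE)
```

With your data, you would presumably replace sim with Econ_III_Data_Set, use log(Salary) as the outcome, and keep the other covariates in the formula.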

psboonstra