Yesterday, I posted a question outlining the problem I am currently working on.
After receiving many interesting responses, I added a linear regression to my results, following this suggestion.
My study area consists of a river and two rows of wells running north-south, one closer to the river and one farther from it.
In my research, I want to use statistics to show that the river significantly affects the water quality of the row of wells closer to it (through a process called bank filtration), while the second row of wells is also influenced by the river, but to a lesser extent than the first. To do this, I decided to collate laboratory physicochemical data (e.g. chloride or sulphate concentrations in mg/L) from the river and the two well rows and compare them statistically.
As described in my earlier post, I had decided to use statistical tests. Following today's suggestion, I also fitted a linear regression to the results grouped by location (the river and the two rows of wells).
Below is an example output for one parameter (chloride) from the suggested method in Python:
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:               chloride   R-squared:                       0.487
Model:                            OLS   Adj. R-squared:                  0.480
Method:                 Least Squares   F-statistic:                     65.04
Date:                Tue, 22 Aug 2023   Prob (F-statistic):           1.38e-20
Time:                        20:36:58   Log-Likelihood:                -372.99
No. Observations:                 140   AIC:                             752.0
Df Residuals:                     137   BIC:                             760.8
Df Model:                           2
Covariance Type:            nonrobust
=====================================================================================
                          coef    std err          t      P>|t|     [0.025     0.975]
-------------------------------------------------------------------------------------
Intercept              14.6316      0.806     18.162      0.000     13.038     16.225
location[T.a]          -0.2420      0.900     -0.269      0.788     -2.021      1.537
location[T.b]           7.0957      0.964      7.361      0.000      5.189      9.002
==============================================================================
Omnibus:                       41.716   Durbin-Watson:                   1.416
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              121.766
Skew:                           1.122   Prob(JB):                     3.62e-27
Kurtosis:                       6.979   Cond. No.                         5.84
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
I would like to ask (a) whether these results make sense and (b) which of them are worth reporting, e.g. in a table or in the supplementary material of the paper for which I am performing these analyses.
Do I understand correctly that, in a nutshell:
R-squared is 0.487, which means that location explains about 48.7% of the variability in the chloride concentrations.
The F-statistic is 65.04 with a very low p-value (1.38e-20), which suggests that at least one of the location coefficients (not counting the intercept) is significantly different from zero.
Since the p-value for location[T.b] is small (< 0.05) and the one for location[T.a] is high (0.788), there is no significant difference in mean chloride between the river and row A (the row closer to it), whereas there is one between the river and row B.
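As a consistency check on these readings, the summary statistics can be reproduced from each other. With $n = 140$ observations and $k = 2$ location coefficients, the F-statistic and adjusted R-squared follow from R-squared, and with a single categorical predictor the coefficients are differences in group means (the last line assumes, as above, that the river is the reference level):

$$
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.487/2}{0.513/137} \approx \frac{0.2435}{0.003745} \approx 65.0
$$

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - 0.513 \times \frac{139}{137} \approx 0.480
$$

$$
\hat\beta_0 = \bar y_{\text{river}} \approx 14.63, \quad \bar y_A = \hat\beta_0 + \hat\beta_a \approx 14.63 - 0.24 = 14.39, \quad \bar y_B = \hat\beta_0 + \hat\beta_b \approx 14.63 + 7.10 = 21.73 \ \text{mg/L}
$$

Both values match the output (65.04 and 0.480) up to the rounding of R-squared. Note that the output tests each row only against the reference level; it does not directly test row A against row B.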
Did I get this right? Which other results in this output are worth noting? Keep in mind that if I keep the OLS in the paper, I would need to report the same for 17 other groups, so I need to keep the number of reported results small.
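For reference, the Jarque-Bera line in the output is a test of normality of the residuals, computed from the residual skewness $S$ and kurtosis $K$ printed next to it. Plugging in the reported values:

$$
JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right) = \frac{140}{6}\left(1.122^2 + \frac{(6.979 - 3)^2}{4}\right) \approx 23.33 \times (1.259 + 3.958) \approx 121.7
$$

Under normal residuals, $JB$ approximately follows a $\chi^2$ distribution with 2 degrees of freedom, whose upper-tail probability is $e^{-JB/2}$; here $e^{-121.766/2} \approx 3.6 \times 10^{-27}$, which is the reported Prob(JB).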
One more question, if I may:
Do you think it is worth adding plots like this in addition to the OLS results table?
I should mention that I have already prepared box plots for the river and the wells, so I worry that this might be too many figures.
