I'm trying to determine whether an unweighted or weighted regression would be more suitable for my data.

I have variables X and Y. Both are measured variables, but X has very small errors while Y has quite large errors. This is because Y is calculated from an average of 5 measurements, so the error bars include the repeatability of these values on our analytical instrument.
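When Y is the average of 5 replicate readings, each error bar could be either the standard deviation (SD) of those replicates or the standard error of their mean (SEM = SD / √5); the second is the uncertainty of the averaged value itself. A minimal sketch, assuming the replicate readings for each sample are available as a list; `summarize_replicates` is a hypothetical helper and the numbers are made up:

```python
import math
import statistics


def summarize_replicates(values):
    """Return (mean, sample SD, standard error of the mean) for one sample's replicates."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 denominator
    sem = sd / math.sqrt(n)         # uncertainty of the *average* of n readings
    return mean, sd, sem


# Hypothetical replicate readings for one sample (made-up numbers).
replicates = [10.2, 9.8, 10.5, 10.1, 9.9]
mean, sd, sem = summarize_replicates(replicates)
print(f"mean={mean:.3f}  sd={sd:.3f}  sem={sem:.3f}")
```

Because every Y here is an average of the same number of readings, switching between SD and SEM rescales all weights by the same constant, which leaves the coefficients of a weighted least-squares fit unchanged.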

I initially thought weighted regression would work because it treats data points with smaller errors as more important: it gives more weight to Y values with smaller standard deviations, on the reasoning that data points the instrument reproduced more consistently are more reliable.
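The idea of weighting by reproducibility is usually implemented as inverse-variance weighting, w_i = 1 / sd_i². A minimal pure-Python sketch of a weighted least-squares line under that assumption (it also assumes X is measured without error); `weighted_linear_fit` is a hypothetical name and the data are made up:

```python
def weighted_linear_fit(x, y, sd):
    """Fit y = a + b*x by weighted least squares with weights w_i = 1 / sd_i**2.

    Assumes x is exact and sd_i is the standard deviation (or standard error) of y_i.
    """
    w = [1.0 / s ** 2 for s in sd]
    sw = sum(w)
    # Weighted means of x and y.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products about the weighted means.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data: with all sd equal, this reduces to ordinary least squares.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sd = [0.5, 0.5, 0.5, 0.5, 0.5]
print(weighted_linear_fit(x, y, sd))
```

Setting all `sd` equal recovers the ordinary (unweighted) fit, so the two methods can only disagree through how unequal the standard deviations are.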

However, I've been warned against using weighted linear regression because it is ideal for data that has large errors in both the X and Y variables. Is this statement true?
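For reference, the textbook weighted least-squares line y = a + b x takes its weights from the uncertainty in Y alone and treats each x_i as known exactly; fits that allow for errors in both X and Y (for example Deming or orthogonal distance regression) are a different technique with a different objective. A sketch of the WLS objective, assuming σ_i is the standard deviation (or standard error) of y_i:

$$
(\hat a, \hat b) = \underset{a,\,b}{\arg\min} \sum_{i=1}^{n} w_i \left( y_i - a - b\,x_i \right)^2, \qquad w_i = \frac{1}{\sigma_i^2}
$$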

I'm also worried about choosing the regression method because the two regressions give very different p-values: < 0.001 with the weighted regression but about 0.5 with the unweighted regression, which completely changes my interpretation. I'm trying to understand what causes such a large difference in p-values between the two methods, and which regression would be best for my data.
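One way to see how the two fits can disagree this much is to compute the slope's t-statistic both ways on a toy data set; the p-value follows from that t-statistic with n - 2 degrees of freedom, so a large difference in t becomes a large difference in p. A minimal pure-Python sketch, assuming inverse-variance weights with the residual variance estimated from the data (so only relative weights matter); `slope_t_statistic` is a hypothetical name and the data are entirely made up (three very reproducible points at low X that rise steeply, plus noisy points with no trend):

```python
import math


def slope_t_statistic(x, y, w):
    """Slope, its standard error and t-statistic for a (weighted) least-squares line.

    The residual variance is estimated from the data, so multiplying every weight
    by the same constant leaves the result unchanged.
    """
    n = len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Weighted residual sum of squares, n - 2 degrees of freedom.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Made-up data: three low-x points are very reproducible (small sd) and rise
# steeply; the remaining points are noisy and show no clear trend.
x = [1.0, 1.5, 2.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.5, 2.0, 5.0, 1.0, 4.5, 0.5, 3.0]
sd = [0.05, 0.05, 0.05, 2.0, 2.0, 2.0, 2.0, 2.0]

unweighted = slope_t_statistic(x, y, [1.0] * len(x))
weighted = slope_t_statistic(x, y, [1.0 / s ** 2 for s in sd])
print("unweighted slope, se, t:", unweighted)
print("weighted   slope, se, t:", weighted)
```

On data like these the weighted t-statistic is many times larger than the unweighted one, because the three small-sd points carry almost all of the total weight and lie close to a straight line.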

Any insight would be appreciated!

StasK
Jen
  • I would not consider your Y variable to be the one with the larger error, since you repeated the measurement 5 times. – Sextus Empiricus Mar 29 '18 at 19:44
  • The weighted regression might perform 'better' (a lower p-value). However, I imagine that this fit is dominated by a few 'accurate', high-weight points that cover only part of the entire domain of X. So the result should not be generalized over the entire domain of your X measurements; doing so would be a sort of extrapolation. – Sextus Empiricus Mar 29 '18 at 19:48
  • @MartijnWeterings Initially, the idea of repeating the measurement was to test how reproducible my data were. That's why I'm wondering: if low standard deviations across 5 measurements of the same sample mean those data points are more 'accurate', would it be acceptable to give them more 'weight'? What did you mean by "generalized over the entire domain"? Sorry, I'm not super familiar with regression methods. – Jen Mar 29 '18 at 20:16
  • You could indeed use the repeated measurements to find the measurement error in the Y variable and how it varies for different values of Y, but you are comparing it with X. You say "...X has very small errors while Y has quite large errors. *This is because* Y is calculated from an average of 5 measurements ...." – Sextus Empiricus Mar 29 '18 at 21:28
  • In some cases you can think of giving points a lower weight as partly 'ignoring' them, like removing outliers but less severe. If the low-weight points all lie on the same side, mostly at very high or very low values, then you should not regard your regression line as very representative of that region, even though you measured some points there and included them (they only carry low weight). – Sextus Empiricus Mar 29 '18 at 21:35
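The comments' point about a fit being dominated by a few high-weight points confined to part of the X range can be checked directly: compute what fraction of the total weight the heaviest few points carry, and which X values they span. A minimal sketch, assuming weights of 1/sd^2 and using made-up numbers; `weight_share` and `top_k` are hypothetical names:

```python
def weight_share(x, sd, top_k):
    """Fraction of total inverse-variance weight held by the top_k highest-weight
    points, together with the (min, max) x-range those points cover."""
    w = [1.0 / s ** 2 for s in sd]
    # Indices sorted from highest to lowest weight.
    order = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    top = order[:top_k]
    share = sum(w[i] for i in top) / sum(w)
    xs = [x[i] for i in top]
    return share, (min(xs), max(xs))


# Made-up data: three precise points at low x, five noisy ones spread higher up.
x = [1.0, 1.5, 2.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sd = [0.05, 0.05, 0.05, 2.0, 2.0, 2.0, 2.0, 2.0]
share, x_range = weight_share(x, sd, top_k=3)
print(f"top 3 points carry {share:.1%} of the weight, over x in {x_range}")
```

If a handful of points hold nearly all the weight and sit in a narrow band of X, the weighted line mostly describes that band, and applying it across the full X range is closer to extrapolation.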