I am measuring the same property of 20 different objects. To measure this property, I have two methods at my disposal. Which statistical test should I use to determine whether these two methods give consistent results?

  • What "consistent" means depends on the meaning of the measurements and the subject matter background. A difference of 5% (assuming you want the same numbers, which isn't necessarily the case in such problems) may be far too much in one situation and just fine in another. – Christian Hennig Sep 28 '20 at 21:38
  • The revision still gives no clue what you mean by 'consistent'. How much different can the new method be? Can the new method be more variable? Not clear what you're measuring and why. – BruceET Sep 28 '20 at 23:15
  • I think the motivation of the question is good, but it could use some clarification. A "test" cannot have any objective meaning until you explain what you mean by "consistent." However, if you just want to see how consistent the results are, why not draw their scatterplot? You can then quantify that in various ways using correlation and regression methods. – whuber Sep 28 '20 at 23:23

1 Answer

In measuring blood samples for hemoglobin content, an assay for hemoglobin (Hgb, in g/dl) and the percent of red cells per volume (hematocrit, Hct, in %) are considered equivalent for many clinical purposes. The two have different units and different variances, and one is numerically very nearly 1/3 of the other (a typical Hgb might be 15 g/dl, a typical Hct might be 45%). A regression approach shows points tightly clustered around a straight line (correlation nearly $1$), so both measures are considered reliable, which may be what you're calling 'consistent'.
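
As a rough illustration of that regression view, here is a minimal sketch on simulated values. These are not clinical data: the means, the noise level, and the names `hct`, `hgb`, and `fit` are assumptions chosen only to mimic the "Hgb is about one third of Hct" relationship.

```r
# Simulated, illustrative data only (not real clinical measurements).
set.seed(1)
hct <- rnorm(20, mean = 45, sd = 4)             # hematocrit, in %
hgb <- hct / 3 + rnorm(20, mean = 0, sd = 0.2)  # hemoglobin, in g/dl: about Hct/3 plus noise

fit   <- lm(hct ~ hgb)               # straight-line fit of one measure on the other
slope <- unname(coef(fit)["hgb"])    # fitted slope, expected to be near 3 here
r     <- cor(hgb, hct)               # correlation, expected to be near 1 here

print(c(slope = slope, correlation = r))
```

Calling `plot(hgb, hct, pch = 20); abline(fit)` afterwards would show how tightly the points cluster around the fitted line, even though the two measures are on different scales.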

By contrast if the two methods of measuring something give normally distributed results, with the same units and the same variance, then a paired t test would suffice to show they're equivalent.

Addendum, per edit and comment:

Here are two scenarios, in which paired t tests show no difference between the two methods using $n = 20$ samples.

Scenario 1:

set.seed(928)
x1 = rnorm(20, 50, 2);  x2 = x1 + rnorm(20, 0, 5)
t.test(x1, x2, paired = TRUE)
    Paired t-test

data:  x1 and x2
t = 1.6775, df = 19, p-value = 0.1098
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4867153  4.4164390
sample estimates:
mean of the differences 
               1.964862 

So a t test shows no significant difference between the two methods. However, the differences for the 20 items are as summarized below. The methods differ on average by about 2 units, with a standard deviation of about 5 units, and the correlation is low (about 0.44).

summary(x1-x2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -6.115  -3.651   2.633   1.965   5.864  10.688 
sd(x1-x2)
[1] 5.238252

cor(x1,x2)
[1] 0.4397941

Scenario 2:

y1 = rnorm(20, 50, 2);  y2 = y1 + rnorm(20, 0, .1)
t.test(y1, y2, paired = TRUE)
    Paired t-test

data:  y1 and y2
t = -0.05817, df = 19, p-value = 0.9542
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.05053764  0.04780447
sample estimates:
mean of the differences 
           -0.001366584 

Again no significant difference, but much better agreement: the mean difference is about -0.0014 with a standard deviation of about 0.11, and the correlation is high (about 0.998).

summary(y1-y2)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.232753 -0.063022 -0.016312 -0.001367  0.061754  0.226550 
sd(y1-y2)
[1] 0.1050631

cor(y1, y2)
[1] 0.9982463

Here are scatterplots for the two scenarios:

par(mfrow=c(1,2))
 plot(x1,x2, pch=20)
 plot(y1,y2, pch=20)
par(mfrow=c(1,1))

[Figure: two scatterplots, x2 against x1 (Scenario 1, left) and y2 against y1 (Scenario 2, right)]

So again, you need to decide what you mean by 'consistent': no statistically significant difference observed among 20 items? Small standard deviation of differences for individual items? Reliable small differences below some amount of practical importance? Good correlation between the two measurements?
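
The candidate meanings listed above can be computed side by side. The following is only a sketch: `agreement_summary` is a hypothetical helper, and `tol = 1` is an arbitrary tolerance standing in for whatever amount is of practical importance in your application.

```r
# Summarise several notions of agreement between two paired measurement vectors.
# `tol` is a hypothetical tolerance; it has to come from subject-matter knowledge.
agreement_summary <- function(a, b, tol) {
  stopifnot(length(a) == length(b))
  d <- a - b
  c(mean_diff   = mean(d),             # average difference between methods
    sd_diff     = sd(d),               # spread of the item-by-item differences
    frac_within = mean(abs(d) <= tol), # share of items differing by at most tol
    correlation = cor(a, b))           # linear association between methods
}

# Same simulation as in the two scenarios above.
set.seed(928)
x1 <- rnorm(20, 50, 2); x2 <- x1 + rnorm(20, 0, 5)
y1 <- rnorm(20, 50, 2); y2 <- y1 + rnorm(20, 0, 0.1)

res <- rbind(scenario1 = agreement_summary(x1, x2, tol = 1),
             scenario2 = agreement_summary(y1, y2, tol = 1))
print(round(res, 3))
```

Which column matters most, and what value of `tol` is sensible, depends on the purpose of the measurement; the code cannot decide that for you.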

Maybe you want to test whether the mean of the differences $d_i$ is smaller than some critical amount. What are the methods measuring, and for what purpose? Are you trying to see if a manufacturing process is out of control? Are you trying to give customers a guarantee of a specific measurement with a small margin of error?
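
One common way to test whether the mean difference is smaller than a chosen amount is an equivalence test built from two one-sided t tests (often called TOST). The sketch below uses only base R's `t.test`; the helper name `tost_paired` and the margin `delta` are assumptions for illustration, and a suitable `delta` has to come from the application, not from the data.

```r
# Paired equivalence test via two one-sided t tests (TOST).
# `delta` is a hypothetical equivalence margin chosen by the analyst.
tost_paired <- function(a, b, delta) {
  stopifnot(length(a) == length(b), delta > 0)
  d <- a - b
  # H0: mean(d) <= -delta  versus  H1: mean(d) > -delta
  p_lower <- t.test(d, mu = -delta, alternative = "greater")$p.value
  # H0: mean(d) >= delta   versus  H1: mean(d) < delta
  p_upper <- t.test(d, mu = delta, alternative = "less")$p.value
  # Equivalence is supported only if BOTH one-sided nulls are rejected,
  # so the overall p-value is the larger of the two.
  max(p_lower, p_upper)
}

# Same simulation as in the two scenarios above; delta = 1 is arbitrary.
set.seed(928)
x1 <- rnorm(20, 50, 2); x2 <- x1 + rnorm(20, 0, 5)
y1 <- rnorm(20, 50, 2); y2 <- y1 + rnorm(20, 0, 0.1)

p1 <- tost_paired(x1, x2, delta = 1)
p2 <- tost_paired(y1, y2, delta = 1)
print(c(scenario1 = p1, scenario2 = p2))
```

A small p-value here supports the claim that the mean difference lies within ±`delta`. With `delta = 1`, Scenario 1 should not pass (its observed mean difference of about 2 already exceeds the margin), while Scenario 2 should. Note that this addresses only the mean difference; it says nothing about how far individual items may disagree.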

By whatever meaning of 'consistent', and without knowing your objectives, I would certainly prefer the new (2nd) method of Scenario 2.

BruceET
  • I've edited the question to point out more clearly that I am measuring the same property of those 20 objects. Let's assume it is length. Therefore the units are the same. My main question is how to show if the two methods give the same result. – Steve OB Sep 28 '20 at 22:11
  • The paired t-test cannot show they are equivalent. It tests whether the mean difference is zero. This can be the case even if the two measurements in most cases are nowhere near to each other. The thread opener will need to specify some kind of tolerance for this; also in your example people need to decide whether the points are close enough to the line. The data cannot tell you how large a correlation is large enough, and neither can any statistical test. – Christian Hennig Sep 28 '20 at 22:34
  • @Lewian. Agreed (+1). (But a t test might show methods are not equivalent.) Agree especially that OP needs to define 'consistency'. Points I made in the discussion at the end of my Answer. But OP may need to see a couple of methods compared in order to understand issues in defining what 'consistency' actually means. // If the measurement is used to decide which items are 'junk', then maybe an entirely new method that does not correlate with the current one would be more effective. – BruceET Sep 28 '20 at 23:20