
I am running a test with two independent variables (Interface and Task) and one dependent variable (task completion rate).

The data look like this:

+-------------+-------------+-------------+
|                 Task 1                  |
+-------------+-------------+-------------+
| Participant | Interface 1 | Interface 2 |
+-------------+-------------+-------------+
|           1 | Success     | Fail        |
|           2 | Fail        | Success     |
|           3 | Success     | Fail        |
+-------------+-------------+-------------+

So after the first task, the completion rate (CP rate) is 2/3 for Interface 1 and 1/3 for Interface 2.
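In code, that per-task rate is just the mean of a 0/1 success indicator. A minimal pandas sketch (the column names are hypothetical) of computing the rates from raw Success/Fail records:

```python
# Sketch: per-task, per-interface completion rates from raw records.
import pandas as pd

df = pd.DataFrame({
    "participant": [1, 2, 3, 1, 2, 3],
    "interface":   [1, 1, 1, 2, 2, 2],
    "task":        [1, 1, 1, 1, 1, 1],
    "success":     [1, 0, 1, 0, 1, 0],  # Task 1 data from the table above
})

cp = df.groupby(["task", "interface"])["success"].mean()
print(cp)  # Task 1: 2/3 for Interface 1, 1/3 for Interface 2
```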

I have ten different tasks (each participant runs through all of the tasks on both interfaces):

+------+-------------------------+-------------------------+
| Task | CP rate for Interface 1 | CP rate for Interface 2 |
+------+-------------------------+-------------------------+
|    1 |                     0.6 |                     0.4 |
|    2 |                     0.3 |                     0.5 |
|    3 |                     0.6 |                     0.2 |
|    4 |                     0.2 |                     0.5 |
|    5 |                     0.1 |                     0.8 |
|    6 |                     0.6 |                     0.4 |
|    7 |                     0.3 |                     0.5 |
|    8 |                     0.6 |                     0.2 |
|    9 |                     0.2 |                     0.5 |
|   10 |                     0.1 |                     0.8 |
+------+-------------------------+-------------------------+

In the end, I want to know whether the CP rate differs between Interfaces 1 and 2.

My question is: is it appropriate to use an independent-samples t-test in this case (assuming equal variance for both interfaces)?

Cheng
  • Why not use a generalised linear mixed model, where participant and task are treated as random effects and the interface acts as a fixed effect? Using completion rates aggregates information unnecessarily. (See the sketch after this thread.) – usεr11852 Mar 11 '19 at 02:16
  • @usεr11852 Thanks for pointing that out. I am not familiar with GLMMs; any recommendations for introductory material I can read? – Cheng Mar 11 '19 at 03:09
  • https://stats.stackexchange.com/questions/189115 and https://stats.stackexchange.com/questions/233366 should be good places to start. Also note that the proportion data in the table shown are paired (if you choose to go with that approach). – usεr11852 Mar 11 '19 at 12:02
  • @usεr11852 Your idea may work, but I feel like it is overcomplicating things for no real benefit. – David Mar 14 '19 at 10:25
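For reference, a minimal sketch of the mixed-model idea from the thread above. The long-format file `trials.csv` and its column names are hypothetical, and statsmodels' `BinomialBayesMixedGLM` (a variational approximation to a Bayesian mixed logistic regression) is just one way to fit such a model in Python; in R, lme4's `glmer` would be the more common choice.

```python
# Sketch only: assumes a hypothetical trials.csv with one row per
# (participant, task, interface) and a 0/1 "success" outcome.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("trials.csv")

# Interface as a fixed effect; participant and task as random intercepts.
model = BinomialBayesMixedGLM.from_formula(
    "success ~ C(interface)",
    vc_formulas={"participant": "0 + C(participant)",
                 "task": "0 + C(task)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```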

1 Answer


You can perform an independent z-test for each of the tasks: $\frac{\hat{p} - p}{\sigma}$ is approximately standard normal given a decent sample size, with variance $\sigma^2 = p(1-p)/n$, where $p$ is the "success rate" and $n$ is the sample size. However, beware that the chance of a false positive increases when 10 tests are performed, since $0.95^{10} \approx 0.6$. To keep a 5% family-wise false-positive rate, use a significance threshold of 0.5% per test instead ($0.995^{10} \approx 0.951$; this is the Šidák correction). A sketch of the procedure follows below.
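A minimal sketch of that per-task comparison, using a pooled two-proportion z-test. The CP rates are the ones from the question's table; the sample size of n = 10 participants per interface is an assumption (the question never states it, though the rates are consistent with it).

```python
# Sketch: per-task two-proportion z-tests with a Sidak-adjusted threshold.
import numpy as np
from scipy.stats import norm

n = 10  # assumed number of participants per interface
cp1 = np.array([0.6, 0.3, 0.6, 0.2, 0.1, 0.6, 0.3, 0.6, 0.2, 0.1])
cp2 = np.array([0.4, 0.5, 0.2, 0.5, 0.8, 0.4, 0.5, 0.2, 0.5, 0.8])

alpha_family = 0.05
alpha_per_test = 1 - (1 - alpha_family) ** (1 / len(cp1))  # Sidak: ~0.005

for task, (p1, p2) in enumerate(zip(cp1, cp2), start=1):
    pooled = (p1 * n + p2 * n) / (2 * n)           # pooled success rate
    se = np.sqrt(pooled * (1 - pooled) * (2 / n))  # SE of the difference
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))                  # two-sided p-value
    print(f"Task {task}: z = {z:+.2f}, p = {p_value:.3f}, "
          f"significant: {p_value < alpha_per_test}")
```

Pooling the two groups for the standard error is the usual form of the test under the null hypothesis of equal proportions.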

The idea of building a model with your data is interesting, but it may be overcomplicating things for little practical benefit.

David
  • Do you mean to perform the z-test on each task, or on each interface? – Cheng Mar 12 '19 at 05:05
  • According to your explanation, I think the right approach would be a z-test comparing both interfaces for each of the tasks, which means ten comparisons. – David Mar 12 '19 at 07:23
  • I see. Two questions: 1. Why not use a t-test? 2. Is there a way to "compress" the ten tasks, so that I can compare Interface 1 vs 2? – Cheng Mar 14 '19 at 09:44
  • @Cheng t-tests are used to compare means in populations that are assumed to follow a normal distribution. Here we are comparing a proportion (a value between 0 and 1). Fortunately, for big enough samples, both tests give basically the same results, so proceed as you prefer. – David Mar 14 '19 at 10:22
  • @Cheng Yes, you can work with an average (or any other metric you see fit) of the performance on the ten tasks and compare the results (see the sketch below), but since we are adjusting the p-value limit, I would be happy enough looking for significant differences on each individual task. – David Mar 14 '19 at 10:24
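For completeness, a minimal sketch of the "compress the ten tasks" idea: treat the per-task CP rates as paired observations (the pairing usεr11852 pointed out) and compare the interfaces with a paired t-test. The rates are those from the question's table; treating per-task rates as the unit of analysis is one choice among several.

```python
# Sketch: paired comparison of per-task completion rates between interfaces.
import numpy as np
from scipy.stats import ttest_rel

cp1 = np.array([0.6, 0.3, 0.6, 0.2, 0.1, 0.6, 0.3, 0.6, 0.2, 0.1])
cp2 = np.array([0.4, 0.5, 0.2, 0.5, 0.8, 0.4, 0.5, 0.2, 0.5, 0.8])

t_stat, p_value = ttest_rel(cp1, cp2)  # paired: same tasks under both interfaces
print(f"mean CP: {cp1.mean():.2f} vs {cp2.mean():.2f}, "
      f"t = {t_stat:+.2f}, p = {p_value:.3f}")
```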