
I have two sets of data, SetA and SetB. They are metrics of the same group measured at two different times, i.e. paired: SetA is the score at 10 am and SetB is the score at 5 pm for the same 30 people.

I want to check whether the mean is significantly different between the two. The scores violate normality, so I have considered three tests and would love your feedback.

The tests are giving me wrong answers. My goal is to see whether the two sets are significantly different.

First Test - Wilcoxon signed-rank test


import scipy.stats as stats

# Data

setA = [0.9995, 1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.9993, 0.9381, 0.6929, 0.7971, 0.8464, 0.0220, 0.9979, 0.8584, 0.7538, 0.8027, 0.8768, 0.0231, 0.9990, 0.8611, 0.6294, 0.7273, 0.8146, 0.0294, 0.9992, 0.8466, 0.7284, 0.7831, 0.8641, 0.0252]

setB = [0.9996, 0.9870, 0.7755, 0.8686, 0.8877, 0.0146, 0.9993, 0.9688, 0.6327, 0.7654, 0.8163, 0.0240, 0.9992, 0.8571, 0.6735, 0.7543, 0.8366, 0.0272, 0.9989, 0.7375, 0.6020, 0.6629, 0.8008, 0.0380, 0.9993, 0.8372, 0.7347, 0.7826, 0.8672, 0.0253]

# Perform Wilcoxon signed-rank test

statistic, p_value = stats.wilcoxon(setA, setB)

# Print results

print("Wilcoxon signed-rank test results:") print(f"Test statistic: {statistic}") print(f"P-value: {p_value}")

# Check significance level (e.g., alpha = 0.05)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the paired samples.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the paired samples.")

# Calculate the average mean for SetA

average_mean_setA = sum(setA) / len(setA)

# Calculate the average mean for SetB

average_mean_setB = sum(setB) / len(setB)

print(f"Average Mean for SetA: {average_mean_setA}") print(f"Average Mean for SetB: {average_mean_setB}")''''

Wilcoxon signed-rank test results:
Test statistic: 99.0
P-value: 0.01038770331258549
Reject the null hypothesis: There is a significant difference between the paired samples.
Average Mean for SetA: 0.7305133333333331
Average Mean for SetB: 0.6991266666666668
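For reference (a sketch only, not something I have validated against my results): a paired t-test on the same data, plus a Shapiro-Wilk check on the paired differences, since the paired t-test's normality assumption applies to the differences rather than to each set separately:

import numpy as np
import scipy.stats as stats

# Paired differences (setA and setB as defined above)
diffs = np.array(setB) - np.array(setA)

# Shapiro-Wilk test on the differences (the quantity the paired t-test assumes is normal)
shapiro_stat, shapiro_p = stats.shapiro(diffs)
print(f"Shapiro-Wilk on paired differences: W = {shapiro_stat:.4f}, p = {shapiro_p:.4f}")

# Paired t-test for comparison with the Wilcoxon result
t_stat, t_p = stats.ttest_rel(setA, setB)
print(f"Paired t-test: t = {t_stat:.4f}, p = {t_p:.4f}")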

Second Test - Bootstrap sampling

import numpy as np

# Calculate the observed mean difference
observed_mean_difference = np.mean(np.array(setB) - np.array(setA))

# Number of bootstrap samples

num_samples = 10000

# Initialize an array to store bootstrapped mean differences

bootstrap_mean_differences = np.zeros(num_samples)

# Perform bootstrap sampling

for i in range(num_samples):
    # Resample with replacement from the combined dataset
    combined_data = np.concatenate((setA, setB))
    resampled_data = np.random.choice(combined_data, size=len(combined_data), replace=True)

    # Calculate mean difference for this bootstrap sample
    bootstrap_mean_difference = np.mean(resampled_data[:len(setA)]) - np.mean(resampled_data[len(setA):])
    bootstrap_mean_differences[i] = bootstrap_mean_difference

# Calculate the p-value

p_value = np.sum(bootstrap_mean_differences >= observed_mean_difference) / num_samples

print("Bootstrap hypothesis test results:") print(f"Observed Mean Difference: {observed_mean_difference}") print(f"P-value: {p_value}") Bootstrap hypothesis test results: Observed Mean Difference: -0.03138666666666667 P-value: 0.6493

Even if I replace SetB with all 0s, the bootstrap still fails to reject the null (no significant difference).
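One variant I have been considering (sketch only, not part of the output above): a paired bootstrap that resamples the within-subject differences instead of pooling the two sets, and builds a confidence interval for the mean difference:

import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
diffs = np.array(setB) - np.array(setA)  # setA and setB as defined above
observed_mean_difference = diffs.mean()

num_samples = 10000
bootstrap_means = np.empty(num_samples)
for i in range(num_samples):
    # Resample the paired differences with replacement, keeping each pair intact
    resampled_diffs = rng.choice(diffs, size=len(diffs), replace=True)
    bootstrap_means[i] = resampled_diffs.mean()

# 95% percentile confidence interval for the mean paired difference;
# if it excludes 0, that points to a significant difference at alpha = 0.05
ci_low, ci_high = np.percentile(bootstrap_means, [2.5, 97.5])
print(f"Observed Mean Difference: {observed_mean_difference}")
print(f"95% bootstrap CI: [{ci_low:.5f}, {ci_high:.5f}]")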

Third Test - Permutation Test

# Calculate the observed mean difference
observed_mean_difference = np.mean(setB) - np.mean(setA)

# Number of permutation samples

num_permutations = 10000

# Initialize an array to store permutation mean differences

permutation_mean_differences = np.zeros(num_permutations)

# Perform permutation sampling

for i in range(num_permutations):
    # Combine the data
    combined_data = setA + setB

    # Permute the combined data
    permuted_data = np.random.permutation(combined_data)

    # Calculate mean difference for this permutation sample
    perm_setA = permuted_data[:len(setA)]
    perm_setB = permuted_data[len(setA):]
    permutation_mean_difference = np.mean(perm_setB) - np.mean(perm_setA)
    permutation_mean_differences[i] = permutation_mean_difference

# Calculate the p-value

p_value = np.sum(permutation_mean_differences >= observed_mean_difference) / num_permutations

print("Permutation hypothesis test results:") print(f"Observed Mean Difference: {observed_mean_difference}") print(f"P-value: {p_value}") Permutation hypothesis test results: Observed Mean Difference: -0.031386666666666674 P-value: 0.6376

  • have you looked at repeated measures? – Estimate the estimators Aug 28 '23 at 04:11
  • (1) As @Estimatetheestimators writes, you should really be looking at repeated measures models, or paired t-tests. (2) The Wilcoxon and the t-test test very different things! (3) In your second model, you resample from the full dataset, instead of bootstrapping the difference in means (or bootstrapping the group means separately). That will not give you what you are presumably looking for, i.e., a confidence interval or a p-value for the difference in means. – Stephan Kolassa Aug 28 '23 at 06:24
  • "The tests are giving me wrong answers." How can you know that? – Christian Hennig Aug 28 '23 at 13:20
  • Note that the paired t-test assumes that the differences are normally distributed. How do you know that normality is violated? Note also that generally normality is always violated, and in many situations a t-test will work well anyway. It depends on how exactly normality is violated, i.e., whether anything goes on that will mislead the inference. Many deviations from normality are rather harmless. – Christian Hennig Aug 28 '23 at 13:23
  • @ChristianHennig - I ran a Q-Q plot and a Shapiro test - the Q-Q plot shows a very poor fit. When I populate the second set with all 0s, the bootstrap was still saying it was not significant while the avg mean diff was high. – b t Aug 29 '23 at 14:28
  • @StephanKolassa - for bootstrap, I am establishing the observed mean diff (x) of the two sets, then iterating n times and storing more observed mean diffs, then (total mean diffs above and greater) (y), to get p-value y/n, which is a probability metric I am testing against ... does this make sense? – b t Aug 29 '23 at 14:30
  • I'm not quite sure I understand your explanation, sorry... to bootstrap the difference in means, you would bootstrap the observations in both groups separately, then calculate the means of the bootstrapped data within each group, then take the difference, store this difference, and iterate the procedure many times. Finally you would compare the observed difference in means to the distribution of the bootstrapped differences. – Stephan Kolassa Aug 29 '23 at 14:46
  • @StephanKolassa - agreed, that is what I am doing with the code. Not sure if there is a flaw; combined_data stores all bootstrap samples, then this calculates the mean difference: bootstrap_mean_differences[i] = bootstrap_mean_difference. After that it is a proportion. – b t Aug 29 '23 at 17:12
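For illustration, a sketch of the bootstrap procedure Stephan Kolassa describes above (resample each group separately, take the difference of the bootstrapped means, and compare the observed difference to that distribution); this is one reading of the comment, not code posted in the thread:

import numpy as np

rng = np.random.default_rng(0)
a = np.array(setA)  # setA and setB as defined in the question
b = np.array(setB)
observed_difference = b.mean() - a.mean()

num_samples = 10000
bootstrap_differences = np.empty(num_samples)
for i in range(num_samples):
    # Bootstrap each group separately, then take the difference of the bootstrapped means
    boot_a = rng.choice(a, size=len(a), replace=True)
    boot_b = rng.choice(b, size=len(b), replace=True)
    bootstrap_differences[i] = boot_b.mean() - boot_a.mean()

# Compare the observed difference to the bootstrap distribution, e.g. via a percentile CI
ci_low, ci_high = np.percentile(bootstrap_differences, [2.5, 97.5])
print(f"Observed difference in means: {observed_difference}")
print(f"95% bootstrap CI: [{ci_low:.5f}, {ci_high:.5f}]")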
