I have two sets of data, SetA and SetB. They are paired measurements of the same metric on the same group of 30 people at two different times: SetA is each person's score at 10 am and SetB is the same person's score at 5 pm.
I want to check whether the mean is significantly different between the two. The scores violate NORMALITY, so I have considered 3 tests and would love your feedback.
The tests are giving me contradictory answers. My goal is to determine whether the two sets are significantly different.
First Test - Wilcoxon signed-rank test
import scipy.stats as stats
# Data
setA = [0.9995, 1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.9993, 0.9381, 0.6929, 0.7971,
0.8464, 0.0220, 0.9979, 0.8584, 0.7538, 0.8027, 0.8768, 0.0231, 0.9990, 0.8611,
0.6294, 0.7273, 0.8146, 0.0294, 0.9992, 0.8466, 0.7284, 0.7831, 0.8641, 0.0252]
setB = [0.9996, 0.9870, 0.7755, 0.8686, 0.8877, 0.0146, 0.9993, 0.9688, 0.6327, 0.7654,
0.8163, 0.0240, 0.9992, 0.8571, 0.6735, 0.7543, 0.8366, 0.0272, 0.9989, 0.7375,
0.6020, 0.6629, 0.8008, 0.0380, 0.9993, 0.8372, 0.7347, 0.7826, 0.8672, 0.0253]
# Perform Wilcoxon signed-rank test
statistic, p_value = stats.wilcoxon(setA, setB)
# Print results
print("Wilcoxon signed-rank test results:")
print(f"Test statistic: {statistic}")
print(f"P-value: {p_value}")
# Check significance level (e.g., alpha = 0.05)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the paired samples.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the paired samples.")
# Calculate the mean for SetA
average_mean_setA = sum(setA) / len(setA)
# Calculate the mean for SetB
average_mean_setB = sum(setB) / len(setB)
print(f"Average Mean for SetA: {average_mean_setA}")
print(f"Average Mean for SetB: {average_mean_setB}")''''
Wilcoxon signed-rank test results:
Test statistic: 99.0
P-value: 0.01038770331258549
Reject the null hypothesis: There is a significant difference between the paired samples.
Average Mean for SetA: 0.7305133333333331
Average Mean for SetB: 0.6991266666666668
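Side question on the Wilcoxon choice: as I understand it, for a paired design the normality that matters is that of the per-person differences rather than of each set separately. This is a minimal sketch (my own; variable names are just illustrative) of how I would check that with scipy's Shapiro-Wilk test:

# Sketch: check normality of the paired differences (setB - setA), not the raw scores
import numpy as np
import scipy.stats as stats

differences = np.array(setB) - np.array(setA)
shapiro_stat, shapiro_p = stats.shapiro(differences)
print(f"Shapiro-Wilk on differences: statistic={shapiro_stat}, p-value={shapiro_p}")
# A small p-value suggests the differences are not normal, which is what pushed me toward Wilcoxon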
Second Test - Bootstrap sampling
import numpy as np

# Calculate the observed mean difference
observed_mean_difference = np.mean(np.array(setB) - np.array(setA))
# Number of bootstrap samples
num_samples = 10000
# Initialize an array to store bootstrapped mean differences
bootstrap_mean_differences = np.zeros(num_samples)
# Perform bootstrap sampling
for i in range(num_samples):
    # Resample with replacement from the combined dataset
    combined_data = np.concatenate((setA, setB))
    resampled_data = np.random.choice(combined_data, size=len(combined_data), replace=True)
    # Calculate mean difference for this bootstrap sample
    bootstrap_mean_difference = np.mean(resampled_data[:len(setA)]) - np.mean(resampled_data[len(setA):])
    bootstrap_mean_differences[i] = bootstrap_mean_difference
# Calculate the p-value
p_value = np.sum(bootstrap_mean_differences >= observed_mean_difference) / num_samples
print("Bootstrap hypothesis test results:")
print(f"Observed Mean Difference: {observed_mean_difference}")
print(f"P-value: {p_value}")
Bootstrap hypothesis test results:
Observed Mean Difference: -0.03138666666666667
P-value: 0.6493
Even if I replace SetB with all zeros, this test still fails to reject the null (it reports no significant difference).
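What I suspect is that pooling SetA and SetB before resampling throws away the pairing. Below is a sketch of what I think a paired bootstrap would look like: resample the per-person differences, centred to mean zero so the null holds, and use a two-sided p-value (the variable names and the fixed seed are just mine for illustration). Is this the right way to respect the pairing?

# Sketch of a paired bootstrap on the per-person differences
import numpy as np

differences = np.array(setB) - np.array(setA)
observed = differences.mean()
null_differences = differences - observed  # centre so H0 (mean difference = 0) holds

rng = np.random.default_rng(0)  # fixed seed only for reproducibility of the sketch
boot_means = np.array([
    rng.choice(null_differences, size=len(null_differences), replace=True).mean()
    for _ in range(10000)
])
# Two-sided p-value: proportion of resampled means at least as extreme as the observed one
p_value_paired = np.mean(np.abs(boot_means) >= abs(observed))
print(f"Paired bootstrap p-value (two-sided): {p_value_paired}")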
Third Test - Permutation test
# Calculate the observed mean difference
observed_mean_difference = np.mean(setB) - np.mean(setA)
# Number of permutation samples
num_permutations = 10000
# Initialize an array to store permutation mean differences
permutation_mean_differences = np.zeros(num_permutations)
# Perform permutation sampling
for i in range(num_permutations):
    # Combine the data
    combined_data = setA + setB
    # Permute the combined data
    permuted_data = np.random.permutation(combined_data)
    # Calculate mean difference for this permutation sample
    perm_setA = permuted_data[:len(setA)]
    perm_setB = permuted_data[len(setA):]
    permutation_mean_difference = np.mean(perm_setB) - np.mean(perm_setA)
    permutation_mean_differences[i] = permutation_mean_difference
# Calculate the p-value
p_value = np.sum(permutation_mean_differences >= observed_mean_difference) / num_permutations
print("Permutation hypothesis test results:")
print(f"Observed Mean Difference: {observed_mean_difference}")
print(f"P-value: {p_value}")
Permutation hypothesis test results:
Observed Mean Difference: -0.031386666666666674
P-value: 0.6376
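Similarly for the permutation test, my understanding is that the paired analogue flips the sign of each person's difference rather than shuffling the pooled scores. A sketch of that sign-flip version (again two-sided; names and seed are just illustrative):

# Sketch of a paired (sign-flip) permutation test on the differences
import numpy as np

differences = np.array(setB) - np.array(setA)
observed = differences.mean()

rng = np.random.default_rng(0)  # fixed seed only for reproducibility of the sketch
perm_means = np.array([
    (differences * rng.choice([-1, 1], size=len(differences))).mean()
    for _ in range(10000)
])
# Two-sided p-value
p_value_signflip = np.mean(np.abs(perm_means) >= abs(observed))
print(f"Sign-flip permutation p-value (two-sided): {p_value_signflip}")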