I have two groups of samples, NGHC (n=14) and NHC (n=87). The result for each sample, CO.05, is either 0 or 1. For example, the results can be
      0  1
NGHC 11  3
NHC  87  0
This is a subset of my data frame:
df <- structure(list(CLASS = c("NGHC", "NGHC", "NGHC", "NGHC", "NGHC",
"NGHC", "NGHC", "NGHC", "NGHC", "NGHC", "NGHC", "NGHC", "NGHC",
"NGHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC",
"NHC", "NHC", "NHC", "NHC", "NHC", "NHC", "NHC"), CO.05 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA,
-101L))
The cross-tabulation of df:
table(df$CLASS, df$CO.05)
      0
NGHC 14
NHC  87
When I try to calculate the chi-square statistic for this table,
summary(table(df$CLASS, df$CO.05))
it returns
Number of cases in table: 101
Number of factors: 2
Test for independence of all factors:
Chisq = 2.2539e-31, df = 0, p-value = 0
It should be a 2x2 table. Shouldn't the p-value be 1?
(May I ask an additional question here? If not, I will delete it. Since the two groups are imbalanced (14 vs 87), is the chi-square test the correct statistical method for testing whether they differ significantly?)
set.seed(2); df2 <- data.frame(class = sample(c("NGHC","NHC"), 100, T), CO.05 = sample(0:1, 100, T)) You can do table(df2) and summary(table(df2)) as well as chisq.test(table(df2), correct = FALSE) with it. – Bernhard Jun 30 '22 at 14:07
?chisq.test and read the documentation. At "Details" it states, "If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given." – whuber Jun 30 '22 at 15:25
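To illustrate the point raised in the comments, here is a minimal sketch, not a definitive answer. It rebuilds a data frame with the same shape as the one in the question (an assumption made for brevity, using rep() instead of the dput output) and shows that table() produces a single column when only one outcome occurs. Declaring both outcomes as factor levels is one way to keep the empty column; the names tab1, tab2 and gof are illustrative only.

```r
# Same shape as the question's data: 14 NGHC rows, 87 NHC rows, CO.05 always 0.
df <- data.frame(CLASS = rep(c("NGHC", "NHC"), times = c(14, 87)),
                 CO.05 = rep(0, 101))

# table() only tabulates values that actually occur, so this table is 2 x 1.
tab1 <- table(df$CLASS, df$CO.05)
dim(tab1)

# A one-column table is treated as a one-dimensional table, so chisq.test()
# runs a goodness-of-fit test of the counts 14 and 87 against equal probabilities.
gof <- chisq.test(tab1)
gof$method

# Declaring both possible outcomes as factor levels keeps the empty "1" column.
tab2 <- table(df$CLASS, factor(df$CO.05, levels = c(0, 1)))
dim(tab2)
```

Note that with an all-zero column the expected counts in that column are zero, so the Pearson statistic contains 0/0 terms and is not defined. A 2x2 layout alone therefore does not yield a p-value of 1 from the chi-square test.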