Yesterday I was helping a friend review a clinical psychometric paper for his Bachelor's thesis. The paper presents this table:
Now, the paper states that (the emphasis is mine):
"Data analysis
Data are presented as means with 95% confidence intervals or proportions (percentage) and rates (percentage per year). Preliminary tests of association used standard bivariate comparisons based on analysis of variance methods (t-score) for continuous data and contingency tables (χ2) for categorical data. We evaluated relative risk ratios for prevalence or magnitude of each tested factor for association with suicidal status, overall and for BD versus MDD. The resulting P-values were not adjusted for multiple comparisons to guide the selection of factors for multivariable logistic modelling which generated odds ratios and their confidence intervals and χ2 values. Factors supported by logistic regression modelling were included in receiver operating characteristic (ROC) analyses of Bayesian sensitivity (true positive rate) versus (100 − specificity) (false positive rate) to compute the area under the curve (AUC) as a percentage.
Results Sample and exposures times
The study sample included 3284 patients with one of the following DSM-5 major mood disorders: BD-I (n = 714), BD-II (n = 497), all BD (n = 1211) or unipolar MDD (n = 2073). The duration of illness averaged 17.0 years with BD and 11.7 years with MDD (Table 1); 64.4% were women. Participants were followed prospectively at the study site for an overall average of 2.95 (95% CI 2.74–3.16) years.
Suicidal risks
The lifetime risk (percentage of patients) of identified suicidal ideation was significantly greater among patients with BD-II (35.0%) than those with BD-I (25.2%), and among patients with BD (29.2%) than those with MDD (17.3%), as were the respective annualised rates (1.92 v. 1.55%/year and 1.72 v. 1.47%/year; Table 1). "
Moreover, the table caption says: "Lifetime prevalence in patients with bipolar disorder (BD, type I or II) or major depressive disorder (MDD) compared as relative risk ratios (with χ2). Exposure years are compared by analysis of variance (t-score) and exposure-adjusted rates (percentage/year) are compared as incidence rate ratios (IRRs, with exact P-values). BD-I, BD type I; BD-II, BD type II.
$^a$ Prevalence of violent acts is expressed as proportion of all suicidal acts (attempts + suicides)."
Now, we are having trouble replicating the p-values shown in the table. For instance, for the first row, we tried building the contingency table and performing a χ2 test, but that gave us an extremely low p-value (<0.001), not 0.007 as shown. The same happened with a t-test comparing the two percentages. Could you tell us which tests were performed to populate the table?
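For reference, here is a minimal sketch of the χ2 attempt for the first row (BD-I v. BD-II suicidal ideation). It assumes the counts can be reconstructed by rounding the reported percentages times the group sizes (25.2% of 714 and 35.0% of 497), since the paper does not give raw counts in the text. The function name `chi2_2x2` is ours, and the Yates continuity correction is switched on by default because that is what R's `chisq.test` does for 2×2 tables.

```python
import math

def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-squared test (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # In a 2x2 table every cell deviates from its expected count by |ad - bc| / n.
    diff = abs(a * d - b * c) / n
    if yates:
        # Yates continuity correction: shrink the deviation by 0.5 (not below 0).
        diff = max(0.0, diff - 0.5)
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum(diff ** 2 / e for e in expected)
    # Survival function of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed counts, reconstructed from the reported percentages and group sizes.
n_bd1, n_bd2 = 714, 497
ideation_bd1 = round(0.252 * n_bd1)  # 180
ideation_bd2 = round(0.350 * n_bd2)  # 174

stat, p_value = chi2_2x2(ideation_bd1, n_bd1 - ideation_bd1,
                         ideation_bd2, n_bd2 - ideation_bd2)
print(f"chi2 = {stat:.2f}, p = {p_value:.7f}")
```

With or without the continuity correction, the p-value stays well below 0.001, so the correction alone does not seem to explain the reported 0.007.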
Also, as a side note, we couldn't tell:
- why the percentages are expressed with a CI. If they are calculated from the sample and express the proportion of a binary variable (has suffered/hasn't suffered), they are exact values, aren't they?
- what exactly "Exposure (years)" means. As far as we could tell, it is the total time the patient has been exposed to that particular symptom, that is, for how many years they have suffered from it (as the text suggests). However, the exposure values are related to the incidence values in the lower half of the table ($\text{Prevalence (\%)} = \text{Rate (\%/year)} \cdot \text{Exposure (years)}$), suggesting it is the cumulative amount of time the study has been conducted. What are we missing?
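The relation in the last bullet can be checked against the figures quoted in the Results text by backing out the exposure each pair implies. This is only a rough sketch: it assumes the relation holds exactly and uses the rounded prevalences and annualised rates from the quote, not the table's own exposure column. The names `quoted` and `implied_exposure` are ours.

```python
# Figures quoted in the Results text:
# group -> (lifetime prevalence in %, annualised rate in %/year)
quoted = {
    "BD-II": (35.0, 1.92),
    "BD-I": (25.2, 1.55),
    "BD": (29.2, 1.72),
    "MDD": (17.3, 1.47),
}

# Exposure implied by Prevalence = Rate * Exposure, i.e. Exposure = Prevalence / Rate.
implied_exposure = {group: prev / rate for group, (prev, rate) in quoted.items()}

for group, years in implied_exposure.items():
    print(f"{group}: implied exposure of about {years:.1f} years")
```

For BD and MDD the implied values come out close to the average illness durations quoted in the text (17.0 and 11.7 years), and far from the 2.95-year average follow-up.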
Thanks!

`.252*714` yields about `180`, next `.35*497` yields about `174`, finally `chisq.test(rbind(c(180,714-180),c(174,497-174)))` calculates a chi-squared test on the contingency table and gives `p-value = 0.0002899`, which I rounded. – Stephan Kolassa Sep 18 '23 at 15:20