Compare survival between two unbalanced groups

Question

Briefly, 184 patients are included in my analysis. I have one variable that separates these 184 patients into two groups: 173 are in group 0 and 11 are in group 1. I need to compare survival between these two groups. (The variable is something like gene mutation status, where 0 represents no mutation and 1 represents mutation.)

Univariate Cox regression seems to be the natural choice. However, the highly unbalanced sample sizes of the two groups make me worried about the power of the test.

I didn't find much literature discussing this kind of situation. Two threads discussed a similar case (first one, second one), but my case is a little different from theirs.

My questions are:

Is it correct to use univariate Cox regression directly to test the survival difference between these two groups?
Might downsampling be a solution? Randomly sampling 11 patients from the 173 in group 0 gives millions of possible combinations. How can I achieve this in R?
The caret package has a downsampling function, but it does not include the Cox model. After downsampling, it is impossible to perform k-fold cross-validation using the train() function. Could anyone tell me how to create a downsampled Cox regression model?

Thank you.

score 0 · Accepted Answer · answered Mar 22 '22 at 13:05

You are correct to be worried about power, given that you have only 11 members of group 1. The power of a Cox regression is essentially determined by the number of events and you have at most 11 events in that group.

Throwing away members of group 0 by downsampling, however, throws away events in that group and thus loses power overall.* Would you do something like that with unequal group sizes for a t-test? That only hurts you. Use all your data.
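
As a minimal sketch of a two-group comparison that keeps every patient, here is a log-rank test in plain Python (standard library only). The log-rank test is closely related to the score test from a univariate Cox model with a single binary covariate. The function name `logrank_test` and the toy data are hypothetical, not taken from the analysis in the question. In R the same comparison would normally be done with `survdiff()` or `coxph()` from the survival package.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test. events: 1 = event, 0 = censored; groups: 0 or 1.

    Returns (chi-square statistic on 1 df, two-sided p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        at_risk_1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        deaths_1 = sum(
            1 for ti, e, g in zip(times, events, groups)
            if ti == t and e == 1 and g == 1
        )
        frac_1 = at_risk_1 / at_risk
        obs_minus_exp += deaths_1 - deaths * frac_1
        if at_risk > 1:
            # hypergeometric variance of the group-1 event count at time t
            variance += deaths * frac_1 * (1 - frac_1) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        raise ValueError("no time point carries information about the group difference")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical toy data: 8 patients in group 0 and 3 in group 1, nobody discarded.
times = [5, 8, 12, 14, 20, 21, 25, 30, 3, 6, 9]
events = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
groups = [0] * 8 + [1] * 3
chi2, p_value = logrank_test(times, events, groups)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Every event in the larger group contributes to the risk sets and to the variance term, which is why dropping those patients can only reduce the information available to the test.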

You should, however, evaluate whether membership in group 1 is associated with known outcome-associated covariates. You also should consider incorporating standard clinical covariates, not just membership in your groups, into your survival model.
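
One way to check such an association for a binary covariate is Fisher's exact test on the 2x2 table of group membership against the covariate, which suits a group of only 11 patients. The sketch below is plain Python. The function name `fisher_exact_two_sided` and the cell counts are hypothetical, chosen only so that the row totals match the 11 and 173 patients in the question. The covariate (low vs. high tumor stage) is likewise an assumption for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )

# Hypothetical counts (not from the question's data):
#                 low stage   high stage
# group 1 (n=11)      3           8
# group 0 (n=173)   100          73
p_value = fisher_exact_two_sided(3, 8, 100, 73)
print(f"Fisher exact p = {p_value:.3f}")
```

A small p-value here would only indicate that the groups differ in the covariate. Whether that explains the survival difference is a question for the adjusted survival model.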

*I see that an answer in one of your linked threads made an argument that unbalanced group sizes hurt power. Perhaps that led in part to this question. That argument was based on a "retrospective power" analysis that does no good after the data are in. As Russ Lenth puts it: "You've got the data, did the analysis, and did not achieve 'significance.' So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn't powerful enough -- that's why the result isn't significant. Power calculations are useful for design, not analysis." There might be advantages to having balance between groups in study design, but there is no advantage to throwing away already accumulated data to achieve balance.
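
The difference between balance at the design stage and balance by discarding data can be illustrated with Schoenfeld's approximation. In that approximation the information for a two-group comparison scales with d * p * (1 - p), where d is the total number of events and p is the fraction of patients in group 1. The sketch below treats this strictly as a design-style calculation. The hazard ratio of 2 and the optimistic assumption that every patient has an event are hypothetical and are not estimated from the data in the question. The function name `schoenfeld_power` is also an assumption.

```python
import math
from statistics import NormalDist

def schoenfeld_power(n_events, prop_group1, hazard_ratio, alpha=0.05):
    """Approximate two-sided power of a log-rank/Cox test (Schoenfeld's formula).

    Assumes proportional hazards and similar event probabilities in both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    information = n_events * prop_group1 * (1 - prop_group1)
    z_effect = math.sqrt(information) * abs(math.log(hazard_ratio))
    return NormalDist().cdf(z_effect - z_alpha)

HYPOTHETICAL_HR = 2.0  # assumed design value, not taken from any fitted model

# All 184 patients kept (11 vs 173), every patient assumed to have an event.
power_full = schoenfeld_power(184, 11 / 184, HYPOTHETICAL_HR)

# Down-sampled to 11 vs 11: perfectly balanced, but only 22 events remain.
power_downsampled = schoenfeld_power(22, 0.5, HYPOTHETICAL_HR)

# A study designed from the start with 92 vs 92 patients.
power_balanced_design = schoenfeld_power(184, 0.5, HYPOTHETICAL_HR)

print(f"full data:        {power_full:.2f}")
print(f"down-sampled:     {power_downsampled:.2f}")
print(f"balanced design:  {power_balanced_design:.2f}")
```

Under these assumptions the balanced design comes out best and the down-sampled analysis worst: balance helps for a fixed total number of events, but achieving it by discarding patients shrinks d faster than it improves p * (1 - p).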

For instance, mutation status of gene A is a covariate in the Cox model. After incorporating a known covariate, e.g. tumor stage (Surv(month, status) ~ stage + A), the Cox model found both A and stage to be significant. But that doesn't tell me whether the significant difference between the A0 and A1 groups results from the covariate stage (e.g. more low-stage patients having A0). How can I evaluate that? By testing their interaction? — tomasz, Mar 23 '22 at 14:49
Additionally, I used the R package powerSurvEpi to calculate the power of the Cox regression based on the expected number of events and the HR (as mentioned in the function powerCT.default0). But later I found that this is retrospective power, which, as you mentioned, is "for design, not analysis". I don't know whether it is meaningless to use a power larger than 75% as a threshold to filter results. — tomasz, Mar 23 '22 at 14:49
@tomasz don't base anything on "retrospective power" analysis. If both stage and A are significant in your 2-predictor model, then there's evidence that each has an association with outcome that goes beyond things like "more low stage patients have A0." The model adjusts for that type of thing, at least in part. (You still should report whether there is an association between A and stage values.) An interaction term between A and stage in the model would further evaluate whether the association of stage with outcome depends on A (and vice-versa). — EdM, Mar 23 '22 at 16:17
