I have a dataset of approximately 10,000 patients in which I am investigating the association between a specific measurement and disease risk. I model the measurement (the independent variable) with restricted cubic splines, but I am unsure about the appropriate number of knots. The literature I found suggests that for large samples such as mine, 5 knots would be appropriate; however, I am not convinced by the results (the same data analysed with 3, 4 and 5 knots are plotted below).
Intuitively, I would choose 3 knots, since the higher knot counts offer no obvious advantage. But is this really the case?
![Results of the same data analysed with 3, 4 and 5 knots](../../images/c81fb8b9c82ef886d89e7cbb42533de5.webp)
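One way to go beyond visual comparison, suggested by Harrell in *Regression Modeling Strategies*, is to compare the 3-, 4- and 5-knot fits by AIC. Because the knots move when k changes, the models are not nested, so a likelihood-ratio test between them is not appropriate. The sketch below is a minimal numpy-only illustration under several assumptions that are not stated in the question: the outcome is binary and modelled with logistic regression, knots are placed at Harrell's default percentiles, and simulated data stand in for the real cohort. The helper names `rcs_basis`, `fit_logistic` and `compare_knots` are hypothetical.

```python
import numpy as np

# Default knot percentiles for restricted cubic splines as tabulated by Harrell
# for k = 3, 4 and 5 knots (assumed placement; other schemes exist).
KNOT_PERCENTILES = {
    3: [10, 50, 90],
    4: [5, 35, 65, 95],
    5: [5, 27.5, 50, 72.5, 95],
}


def rcs_basis(x, knots):
    """Return x plus k-2 restricted cubic spline terms (linear beyond the outer knots)."""
    t = np.asarray(knots, dtype=float)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2  # normalisation so the cubic terms stay on a modest scale
    d = t[k - 1] - t[k - 2]
    cols = [x]
    for j in range(k - 2):
        term = (np.maximum(x - t[j], 0) ** 3
                - np.maximum(x - t[k - 2], 0) ** 3 * (t[k - 1] - t[j]) / d
                + np.maximum(x - t[k - 1], 0) ** 3 * (t[k - 2] - t[j]) / d)
        cols.append(term / scale)
    return np.column_stack(cols)


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit logistic regression with an added intercept by Newton-Raphson; return (beta, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])      # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # sum of y*eta - log(1 + e^eta)
    return beta, loglik


def compare_knots(x, y, knot_counts=(3, 4, 5)):
    """Fit one RCS logistic model per knot count and report log-likelihood and AIC."""
    # Standardising x does not change the fit: percentile knots move with x.
    xs = (x - x.mean()) / x.std()
    results = {}
    for k in knot_counts:
        knots = np.percentile(xs, KNOT_PERCENTILES[k])
        beta, loglik = fit_logistic(rcs_basis(xs, knots), y)
        n_params = len(beta)  # intercept + (k - 1) spline coefficients = k
        results[k] = {"loglik": loglik, "aic": 2 * n_params - 2 * loglik}
    return results


# Hypothetical simulated data standing in for the real cohort of ~10,000 patients
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(50, 10, n)
true_logit = -2.0 + 0.03 * (x - 50) + 0.001 * (x - 50) ** 2
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

for k, r in compare_knots(x, y).items():
    print(f"{k} knots: logLik = {r['loglik']:.1f}, AIC = {r['aic']:.1f}")
```

If the AIC values for 4 and 5 knots are not meaningfully lower than for 3 knots, the extra flexibility is not buying a better fit, which would support the intuition in the question. With the real data it would also help to overlay the three fitted curves with their confidence bands, since the visible differences often sit in the sparse tails.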