I am new in stratified sampling. I have a regression task to analyse the relationship between income, household size and expenditure, using python. Both income and expenditure are continuous variables, while household size is categorical. I have following questions:
I asked chatGPT, it said if it is regression task, it is better to consider 3 variables as stratification targets. I think it is too complex, should I consider 3 variables? (If so, two variables should be enough, since income and expenditure is highly correlated, am I right?)
Is converting the variables into categorical variable necessary? If so, I will like to create a new variable called expenditure levels to classify households into different groups, then start stratified sampling.