I am new to stratified sampling. I have a regression task in Python: analysing the relationship between income, household size, and expenditure. Income and expenditure are continuous variables, while household size is categorical. I have the following questions:

  1. I asked ChatGPT, and it said that for a regression task it is better to use all 3 variables as stratification targets. That seems too complex to me. Should I really stratify on all 3 variables? (I think two variables should be enough, since income and expenditure are highly correlated. Am I right?)

  2. Is it necessary to convert the continuous variables into categorical ones? If so, I would like to create a new variable called expenditure level to classify households into groups, and then do the stratified sampling.
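
For concreteness, here is a minimal sketch of the procedure described in question 2. It assumes pandas and uses synthetic data in place of the real dataset; the column names `income`, `household_size`, `expenditure`, and `expenditure_level` are hypothetical, and quartile bins with a 20% sampling fraction are just one possible choice:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real household data (assumption: 1000 rows).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=0.5, size=n),
    "household_size": rng.integers(1, 6, size=n),  # sizes 1 to 5
})
df["expenditure"] = 0.6 * df["income"] + rng.normal(0, 2000, size=n)

# Bin the continuous expenditure into quartile-based levels (the strata).
df["expenditure_level"] = pd.qcut(
    df["expenditure"], q=4, labels=["low", "mid_low", "mid_high", "high"]
)

# Draw the same fraction of rows at random from every stratum.
sample = df.groupby("expenditure_level", observed=True).sample(
    frac=0.2, random_state=0
)

# Compare stratum shares in the full data and in the sample.
print(df["expenditure_level"].value_counts(normalize=True).sort_index())
print(sample["expenditure_level"].value_counts(normalize=True).sort_index())
```

Stratifying on household size as well would amount to grouping by `["expenditure_level", "household_size"]` instead, at the cost of smaller strata.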

Lu Cas
  • Welcome to Cross Validated! Do you have this data already available? If yes, why do you need to sample? – jmarkov Feb 23 '23 at 07:13
  • Seriously, from which aspect do you consider ChatGPT as a stat guru? – User1865345 Feb 23 '23 at 07:15
  • I have the data available. I want to do stratified sampling because I am not going to use the whole dataset. I want to divide the dataset into strata based on expenditure, then randomly pick samples from each stratum for the regression task. – Lu Cas Feb 23 '23 at 07:23
  • @User1865345, I was just trying it out, and I personally don't really trust AI answers either, so I came here for help. Is ChatGPT bad? – Lu Cas Feb 23 '23 at 07:28
  • I won't consider ChatGPT to be a reliable consultant. Period. You can also check the meta discussion here to get a sense of how deceptive it can be. – User1865345 Feb 23 '23 at 07:33

0 Answers