
I'm comparing two different populations with unequal variances and non-normal distributions.

For sample #1 I'm drawing a random sample of size $n=30$ from a population of 200. For sample #2 I'm drawing a random sample of size $n=30$ from a population of 840. Since the two samples have unequal variances, I'm using Welch's t-test (the unequal-variance t-test).
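For reference, here is a minimal sketch of the Welch statistic and its Welch-Satterthwaite degrees of freedom, using only the standard library. The two populations are made-up skewed data (an assumption for illustration, not the actual data), and `welch_t` is a hypothetical helper name. In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(a), len(b)
    # Sample variances (n - 1 denominator), estimated separately per group.
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


rng = random.Random(0)
# Hypothetical right-skewed populations of sizes 200 and 840.
pop1 = [rng.expovariate(1 / 5.0) for _ in range(200)]
pop2 = [rng.expovariate(1 / 8.0) for _ in range(840)]

# Draw n = 30 from each, without replacement.
s1 = rng.sample(pop1, 30)
s2 = rng.sample(pop2, 30)

t, df = welch_t(s1, s2)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The degrees of freedom always land between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$, so with two samples of 30 they fall between 29 and 58.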

Is it a problem that my sample #1 population is only 200? Should I just use $n=20$ instead? I read that a random sample of $n=20$ should come from a population ten times that size.
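The "ten times" rule appears to be the usual 10% condition for sampling without replacement from a finite population. A minimal sketch of what it measures, assuming the goal really is inference about the finite population itself: the standard error of a sample mean is multiplied by the finite population correction $\sqrt{(N-n)/(N-1)}$, which is close to 1 when $n/N$ is small. The helper name `fpc` is hypothetical.

```python
import math


def fpc(N, n):
    """Finite population correction for a sample of n drawn without replacement from N."""
    return math.sqrt((N - n) / (N - 1))


n = 30
for N in (200, 840):
    # Sampling fraction and the factor applied to the standard error of the mean.
    print(f"N = {N}: n/N = {n / N:.3f}, FPC = {fpc(N, n):.3f}")
```

With $n=30$ and $N=200$ the sampling fraction is 15%, so the correction is noticeable but modest. Ignoring it makes the standard error slightly too large, which errs on the conservative side.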

I also read that normality of the data doesn't matter once the sample size is $n>20$ (though I have also seen this stated as $n>30$), so I'm a bit lost.

What's a good rule of thumb for population size when I'm randomly drawing samples of $n=30$? I'm doing all of this in Python.

  • 2
    Welcome to CV. Ignore what you have read--the stuff you quote is, to put it mildly, hardly even good rules of thumb. What matters most is how skew the population distributions are. What might be of greater concern is whether you truly have finite populations: the vast majority of questions we receive about finite populations turn out to be based on misconceptions. To avoid the possibility of that error, could you give us a brief explanation of what these populations are and how you have sampled them? – whuber May 08 '19 at 22:00
  • 1
    Wow, thanks for your response whuber! Essentially, I inner-joined 3 datasets (health code violations, Yelp data and financial data) and 906 restaurants matched across all three. So I have a population of 906 restaurants in Las Vegas. I parsed the reviews for the term "hole in the wall" (hitw) and divided the 906 restaurants into two groups: hitw (201 restaurants) and non-hitw (705 restaurants). I'm using bootstrapping to randomly draw, with replacement, 30 restaurants at a time from each group, and I'm taking the average of the p-values I've calculated. My hypothesis question is "are hitw restaurants really less clean?" – Khalid Rahman May 08 '19 at 22:29
  • 2
    "I'm using bootstrapping to randomly draw and replace 30 samples at a time and I'm taking the average p value that I've calculated" --- you should post a question asking whether this is a good approach (and what you could do instead) – Glen_b May 09 '19 at 02:33
  • Hey thanks Glen! I'll post that instead. I think you're right. – Khalid Rahman May 09 '19 at 15:03
  • I found this population calculator (linked below) which appears to calculate population size. Very cool. https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/ – Khalid Rahman May 09 '19 at 20:48
  • @Khalid Rahman: It doesn't (cannot) calculate pop size, that is part of its inputs ... – kjetil b halvorsen Feb 03 '20 at 12:50
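For context on the resampling discussed in the comments, here is a minimal sketch of one conventional alternative to averaging p-values over subsamples of 30: a percentile bootstrap interval for the difference in means that uses every observation in both groups. It is not a method proposed in this thread. The scores are made-up (an assumption for illustration), and `bootstrap_diff_ci` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement at its own full size.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(ra) - statistics.fmean(rb))
    diffs.sort()
    lo = diffs[int(n_boot * alpha / 2)]
    hi = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


rng = random.Random(1)
# Hypothetical violation scores for the two groups (201 and 705 restaurants).
hitw = [rng.expovariate(1 / 6.0) for _ in range(201)]
non_hitw = [rng.expovariate(1 / 5.0) for _ in range(705)]

observed = statistics.fmean(hitw) - statistics.fmean(non_hitw)
lo, hi = bootstrap_diff_ci(hitw, non_hitw)
print(f"observed difference = {observed:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

An interval that excludes 0 would suggest a difference in mean scores between the groups. Whether that is the right comparison depends on how the scores are defined.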

0 Answers