
I was working through the Harvard data science course when I came across this question:

Assume you have $M$ polls with sample sizes $n_1,\ldots,n_M$. If the polls are independent, what is the average of the variances of each poll if the true proportion is $p$?
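A minimal Python sketch of the quantity the question asks for. It assumes each poll is a simple random sample of independent yes/no answers with true proportion $p$, and that the "variance of a poll" means the variance of its sample proportion, $p(1-p)/n_i$. The function names are hypothetical, and the simulation is only a sanity check of the per-poll formula, not the course's method.

```python
import math
import random
from typing import Sequence


def poll_variance(p: float, n: int) -> float:
    """Variance of one poll's sample proportion, p(1-p)/n, for n independent yes/no answers."""
    return p * (1 - p) / n


def average_poll_variance(p: float, sizes: Sequence[int]) -> float:
    """Average of the M per-poll variances: (1/M) * sum_i p(1-p)/n_i."""
    return math.fsum(poll_variance(p, n) for n in sizes) / len(sizes)


def simulated_poll_variance(p: float, n: int, trials: int = 10_000, seed: int = 0) -> float:
    """Empirical variance of the sample proportion over many simulated polls of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each respondent independently answers "yes" with probability p.
        yes = sum(rng.random() < p for _ in range(n))
        estimates.append(yes / n)
    mean = math.fsum(estimates) / trials
    # Sample variance (n - 1 denominator) of the simulated poll estimates.
    return math.fsum((x - mean) ** 2 for x in estimates) / (trials - 1)


if __name__ == "__main__":
    p = 0.5
    print(average_poll_variance(p, [100, 400, 1000]))  # ≈ 0.001125
    print(poll_variance(p, 100), simulated_poll_variance(p, 100))
```

With $p = 0.5$ and sizes 100, 400 and 1000, the three variances are 0.0025, 0.000625 and 0.00025, so their average is about 0.001125; the simulated variance for $n = 100$ should land close to 0.0025.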


I know the general definition of variance: the mean squared deviation from the mean. But in this case, how are they defining the variance of a single poll as $\frac{p(1-p)}{n}$?
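A short derivation, assuming each poll is a simple random sample in which respondent $j$ of poll $i$ gives an independent answer $X_{ij} \in \{0, 1\}$ with $P(X_{ij} = 1) = p$, and the poll reports its sample proportion $\hat p_i$ (this notation is introduced here, not taken from the course). Applying the mean-squared-deviation definition to a single answer $X$ gives $p(1-p)$; averaging $n_i$ independent answers divides that by $n_i$; averaging over the $M$ polls gives the last line:

$$
\begin{aligned}
\operatorname{Var}(X) &= E\big[(X - p)^2\big] = p\,(1-p)^2 + (1-p)\,p^2 = p(1-p),\\
\operatorname{Var}(\hat p_i) &= \operatorname{Var}\Big(\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}\Big) = \frac{1}{n_i^2}\sum_{j=1}^{n_i}\operatorname{Var}(X_{ij}) = \frac{p(1-p)}{n_i},\\
\frac{1}{M}\sum_{i=1}^{M}\operatorname{Var}(\hat p_i) &= \frac{p(1-p)}{M}\sum_{i=1}^{M}\frac{1}{n_i}.
\end{aligned}
$$

So, under these assumptions, $p(1-p)/n$ is the variance of a poll's estimate across hypothetical repeated polls, not the spread of individual answers within one poll.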

  • This is explained in some detail by several of the answers to "Why do political polls have such large sample sizes?" – Silverfish Jan 31 '15 at 02:01
  • @Silverfish Got it!!! – Elizabeth Susan Joseph Jan 31 '15 at 03:25
  • @Silverfish Where can I read more about the statistical concepts needed to predict election polls? Your post is exactly what I was looking for. I want to strengthen my statistical concepts further. – Elizabeth Susan Joseph Jan 31 '15 at 05:00
  • @ElizabethSusanJoseph, I'd start by getting a good foundation in sample survey methods. The books by William Cochran (1971) and Leslie Kish will be instructive, but they are heavy on the mathematics of the subject, if that's what you are looking for. – StatsStudent Jan 31 '15 at 05:44
  • @StatsStudent I am looking for statistical foundations I need to become a data scientist. I am good at programming, but want to understand statistics for performing data analysis. – Elizabeth Susan Joseph Jan 31 '15 at 07:25
  • @ElizabethSusanJoseph that's a very broad topic. I've found that most "data scientists" don't really do any science, so to speak, but mostly work with the acquisition and manipulation of data through electronic means. Statisticians mostly do the science part. Getting a nice blend of both will take you far for sure. It sounds like you might have the programming part pretty well covered. Do you have much of a mathematics background? If not, I suggest starting with the basics of Calculus and Linear Algebra. From there you'd be in a position to learn about statistics. – StatsStudent Jan 31 '15 at 07:47
  • (Continued) Then, you may want to learn about the analysis of massive datasets. See for example: http://www.mmds.org/ – StatsStudent Jan 31 '15 at 07:56
  • @StatsStudent Hi, thanks for this great advice. I have enough background in mathematics, though I am not an expert in linear algebra. I have completed Prof. Ng's machine learning course, and I have also read the book Statistics in Plain English, which gave me a good understanding of statistical concepts. I also took the Harvard data science course. But I want to learn more about machine learning and statistics. How are machine learning and statistics related? – Elizabeth Susan Joseph Feb 01 '15 at 04:00
  • It is simply part of the expected variance; that is, the value you obtain at the end is an estimate of the average variability across samples. Thus, you arrive at the true estimate. – Feb 03 '15 at 15:30
