I work in biostatistics and often have the following conversation with medics: we are discussing some very interesting, but also very rare, disease or disorder (or side effect of a drug), until somebody says "the prevalence of this disease/side effect is only 5 per 1 million people".
I always ask how they can know the prevalence of such a rare disease: surely the confidence interval for a maximum likelihood estimate must be huge? And a Bayesian approach is very likely to be biased, since clinicians will probably overestimate the relevance of their extremely rare disease. Unfortunately, so far nobody has been able to give me an answer, and this is really grinding my gears. Of course, for very dangerous and contagious diseases there are national databases, and hospitals are obliged to report every case, so I am asking about genetic disorders in particular.
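To make the worry about interval width concrete, here is a minimal sketch of my own. It assumes simple random sampling and uses the Wilson score interval, which is just one of several possible choices for a binomial proportion. The function name `wilson_interval` and all sample sizes are hypothetical and only serve as illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(cases, n, conf=0.95):
    """Wilson score interval for a binomial proportion cases / n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # two-sided normal quantile
    p_hat = cases / n                          # maximum likelihood estimate
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical surveys that all observe exactly 5 cases per million screened.
for n in (1_000_000, 10_000_000, 100_000_000):
    cases = 5 * n // 1_000_000
    lo, hi = wilson_interval(cases, n)
    print(f"n = {n:>11,}: {cases:>3} cases, "
          f"95% CI = {lo * 1e6:.2f} to {hi * 1e6:.2f} per million")
```

With only a handful of observed cases, the upper limit is several times the lower one; the interval only tightens once the screened population becomes much larger, which is exactly the situation I doubt exists for rare genetic disorders.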
On the other hand, you also quite often see a disease or disorder reported with an enormously high prevalence, which seems quite unlikely to me. For example, spina bifida is said to have a prevalence of 5% in newborns (Sandler 2010, doi:10.1016/j.pcl.2010.07.009). Since neither I nor anyone else I asked, apart from medical doctors, had ever heard of this disease before, I struggled (and still struggle) to believe that.
So, my questions are:
- How do epidemiologists estimate the prevalence of a disease/disorder with a true prevalence of say 1 in 1,000,000 if there is no database containing all cases? (or are there countries with databases for every relevant disease/disorder?)
- What are common sizes of the corresponding confidence intervals? Are the estimators reliable at all?
- How can the prevalence of spina bifida be 5% when so many people have never heard of it, and how is this number determined?
Thanks a lot!