Most Popular

1500 questions
36
votes
1 answer

"Frequency" value for seconds/minutes intervals data in R

I'm using R (3.1.1) and ARIMA models for forecasting. I would like to know what the "frequency" parameter, which is assigned in the ts() function, should be if I'm using time series data which is: separated by minutes and is spread over 180 days…
Apython
  • 655
36
votes
3 answers

Generating data with a given sample covariance matrix

Given a covariance matrix $\boldsymbol \Sigma_s$, how to generate data such that it would have the sample covariance matrix $\hat{\boldsymbol \Sigma} = \boldsymbol \Sigma_s$? More generally: we are often interested in generating data from a density…
Kees Mulder
  • 1,674
36
votes
6 answers

How can I analytically prove that randomly dividing an amount results in an exponential distribution (of e.g. income and wealth)?

In this current article in SCIENCE the following is being proposed: Suppose you randomly divide 500 million in income among 10,000 people. There's only one way to give everyone an equal, 50,000 share. So if you're doling out earnings randomly,…
vonjd
  • 6,146
36
votes
8 answers

In Naive Bayes, why bother with Laplace smoothing when we have unknown words in the test set?

I was reading over Naive Bayes Classification today. I read, under the heading of Parameter Estimation with add 1 smoothing: Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word. The maximum likelihood…
36
votes
1 answer

Relation between variational Bayes and EM

I read somewhere that Variational Bayes method is a generalization of the EM algorithm. Indeed, the iterative parts of the algorithms are very similar. In order to test whether the EM algorithm is a special version of the Variational Bayes, I tried…
36
votes
3 answers

Asymptotic distribution of sample variance of non-normal sample

This is a more general treatment of the issue posed by this question. After deriving the asymptotic distribution of the sample variance, we can apply the Delta method to arrive at the corresponding distribution for the standard deviation. Let a…
35
votes
6 answers

Difference between Bayes network, neural network, decision tree and Petri nets

What is the difference between a neural network, a Bayesian network, a decision tree and Petri nets, given that they are all graphical models and visually depict cause-effect relationships?
Ria George
  • 1,465
35
votes
7 answers

Birthday paradox with a (huge) twist: Probability of sharing exact same date of birth with partner?

I share the same birthdate as my boyfriend: the same date and also the same year, with our births separated by merely 5 hours or so. I know that the chances of meeting someone who was born on the same date as me are fairly high, and I know a few people with…
curious
  • 517
35
votes
4 answers

Intuitive reasoning behind biased maximum likelihood estimators

I am confused about biased maximum likelihood (ML) estimators. The mathematics of the whole concept is pretty clear to me, but I cannot figure out the intuitive reasoning behind it. Given a certain dataset which has samples from a distribution,…
ssah
  • 451
35
votes
4 answers

Can ANOVA be significant when none of the pairwise t-tests is?

Is it possible for one-way (with $N>2$ groups, or "levels") ANOVA to report a significant difference when none of the $N(N-1)/2$ pairwise t-tests does? In this answer @whuber wrote: It is well known that a global ANOVA F test can detect a…
amoeba
  • 104,745
35
votes
6 answers

Algorithm to dynamically monitor quantiles

I want to estimate the quantile of some data. The data are so huge that they cannot be held in memory, and they are not static: new data keep coming. Does anyone know any algorithm to monitor the quantiles of the data observed so far…
35
votes
4 answers

Why are Jeffreys priors considered noninformative?

Consider a Jeffreys prior where $p(\theta) \propto \sqrt{|i(\theta)|}$, where $i$ is the Fisher information. I keep seeing this prior being mentioned as an uninformative prior, but I never saw an argument why it is uninformative. After all, it is not…
bayesian
  • 869
35
votes
4 answers

Why not report the mean of a bootstrap distribution?

When one bootstraps a parameter to get the standard error we get a distribution of the parameter. Why don't we use the mean of that distribution as a result or estimate for the parameter we are trying to get? Shouldn't the distribution approximate…
35
votes
4 answers

Generating random variables from a mixture of Normal distributions

How can I sample from a mixture distribution, and in particular a mixture of Normal distributions in R? For example, if I wanted to sample from: $$ 0.3\!\times\mathcal{N}(0,1)\; + \;0.5\!\times\mathcal{N}(10,1)\; +…
user30490
35
votes
5 answers

Data "exploration" vs data "snooping"/"torturing"?

Many times I have come across informal warnings against "data snooping" (here's one amusing example), and I think I have an intuitive idea of roughly what that means, and why it may be a problem. On the other hand, "exploratory data analysis" seems…
kjo
  • 1,967