Most Popular

1500 questions
55
votes
1 answer

Alternatives to one-way ANOVA for heteroskedastic data

I have data from 3 groups of algae biomass ($A$, $B$, $C$) with unequal sample sizes ($n_A=15$, $n_B=13$, $n_C=12$), and I would like to test whether these groups come from the same population. One-way ANOVA would definitely be the way to go,…
Rick L.
  • 551
55
votes
3 answers

Online vs offline learning?

What is the difference between offline and online learning? Is it just a matter of learning over the entire dataset (offline) vs. learning incrementally (one instance at a time)? What are examples of algorithms used in both?
griffin
  • 905
55
votes
4 answers

Does the sign of scores or of loadings in PCA or FA have a meaning? May I reverse the sign?

I performed principal component analysis (PCA) with R using two different functions (prcomp and princomp) and observed that the PCA scores differed in sign. How can it be? Consider this: set.seed(999) prcomp(data.frame(1:10,rnorm(10)))$x …
user1320502
  • 1,007
55
votes
5 answers

Correct spelling (capitalization, italicization, hyphenation) of "p-value"?

I realize this is pedantic and trite, but as a researcher in a field outside of statistics, with limited formal education in statistics, I always wonder if I'm writing "p-value" correctly. Specifically: Is the "p" supposed to be capitalized? Is the…
gotgenes
  • 943
55
votes
5 answers

Using deep learning for time series prediction

I'm new to the area of deep learning, and my first step was to read interesting articles from the deeplearning.net site. In papers about deep learning, Hinton and others mostly talk about applying it to image problems. Can someone tell me whether it…
Vedran
  • 651
55
votes
8 answers

Why is Entropy maximised when the probability distribution is uniform?

I know that entropy is a measure of the randomness of a process or variable, and that it can be defined as follows: for a random variable $X$ taking values in a set $A$, $H(X) = -\sum_{x_i \in A} p(x_i) \log p(x_i)$. In the book on Entropy and Information Theory by…
user76170
  • 789
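
The definition quoted in this question can be checked numerically. The following is a minimal sketch, not part of the original post: the three example distributions are made up for illustration, and the natural logarithm is an assumption (the excerpt does not fix the base). It only illustrates that, among these examples, the uniform distribution gives the largest value of $H(X)$.

```python
import math

def entropy(p):
    """Shannon entropy H = -sum p_i log p_i (natural log, an assumption here).

    Terms with p_i = 0 are skipped, following the convention 0 * log 0 = 0.
    """
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical distributions over a four-element set A.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

h_uniform = entropy(uniform)  # equals log(4), the maximum for four outcomes
h_skewed = entropy(skewed)    # strictly smaller than log(4)
h_certain = entropy(certain)  # zero: no randomness at all

print(h_uniform, h_skewed, h_certain)
```

A comparison of three hand-picked distributions is not a proof; it only makes the claim in the title concrete.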
55
votes
4 answers

Difference between forecast and prediction?

I was wondering what the difference and the relation are between forecast and prediction, especially in time series and regression. For example, am I correct that: In time series, forecasting seems to mean estimating future values given past values of a…
Tim
  • 19,445
55
votes
3 answers

How can I calculate $\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw$

Suppose $\phi(\cdot)$ and $\Phi(\cdot)$ are the density function and the distribution function of the standard normal distribution. How can one calculate the integral: $$\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw$$
hadisanji
  • 885
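
As a sanity check on this integral, here is a small numerical sketch that is not part of the original post. It assumes $b > 0$ and compares a trapezoid-rule approximation with the closed form $\Phi\left(-a/\sqrt{1+b^2}\right)$, which follows from reading the integrand as $P(bZ - W \le -a)$ for independent standard normal $Z$ and $W$. The values of `a` and `b`, the integration limits, and the function names are all illustrative choices.

```python
import math

def phi(w):
    """Standard normal density."""
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integral_numeric(a, b, lo=-10.0, hi=10.0, n=20000):
    """Trapezoid-rule approximation of the integral of Phi((w - a) / b) * phi(w).

    The range [-10, 10] stands in for the whole real line; the density
    is negligible outside it.
    """
    def f(w):
        return Phi((w - a) / b) * phi(w)

    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def integral_closed_form(a, b):
    """Closed form under the assumption b > 0."""
    return Phi(-a / math.sqrt(1.0 + b * b))

a, b = 0.5, 2.0  # hypothetical example values
numeric = integral_numeric(a, b)
closed = integral_closed_form(a, b)
print(numeric, closed)
```

The two values should agree to several decimal places; for $b < 0$ the inequality flips and the closed form above no longer applies as written.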
55
votes
4 answers

What is difference-in-differences?

Difference-in-differences has long been popular as a non-experimental tool, especially in economics. Can somebody please provide a clear and non-technical answer to the following questions about difference-in-differences? What is a…
55
votes
4 answers

How do we decide when a small sample is statistically significant or not?

Sorry if the title isn't clear, I'm not a statistician, and am not sure how to phrase this. I was looking at the global coronavirus statistics on worldometers, and sorted the table by cases per million population to get an idea of how different…
55
votes
4 answers

Fast linear regression robust to outliers

I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the influence of these points. So far what I did is…
Matteo Fasiolo
  • 3,254
55
votes
3 answers

What is a latent space?

In the context of machine learning, I often hear the term latent space, sometimes qualified with the word "high dimensional" or "low dimensional" latent space. I am a bit puzzled by this term (as it is almost never defined rigorously). Can someone…
Fraïssé
  • 1,540
55
votes
8 answers

Is sampling relevant in the time of 'big data'?

Or, more to the point, will it be? Big data makes statistics and related knowledge all the more important, but seems to underplay sampling theory. I've seen the hype around 'big data' and can't help wondering why I would want to analyze everything?…
PhD
  • 14,627
55
votes
2 answers

What is meant by the $\sigma$-algebra generated by a random variable?

Often, in the course of my (self-)study of statistics, I've met the terminology "$\sigma$-algebra generated by a random variable". I don't understand the definition on Wikipedia, but most importantly I don't get the intuition behind it. Why/when do…
DeltaIV
  • 17,954
55
votes
5 answers

What is the difference between “in-sample” and “out-of-sample” forecasts?

I don't understand what exactly the difference is between "in-sample" and "out-of-sample" prediction. An in-sample forecast utilizes a subset of the available data to forecast values outside of the estimation period. An out-of-sample forecast…