Most Popular

1500 questions
50
votes
4 answers

Taking the expectation of Taylor series (especially the remainder)

My question concerns trying to justify a widely used method, namely taking the expected value of a Taylor series. Assume we have a random variable $X$ with positive mean $\mu$ and variance $\sigma^2$. Additionally, we have a function, say,…
agronskiy
  • 685
50
votes
5 answers

Normality of dependent variable = normality of residuals?

This issue seems to rear its ugly head all the time, and I'm trying to decapitate it for my own understanding of statistics (and sanity!). The assumptions of general linear models (t-test, ANOVA, regression etc.) include the "assumption of…
DeanP
  • 871
50
votes
3 answers

Intuitive difference between hidden Markov models and conditional random fields

I understand that HMMs (Hidden Markov Models) are generative models, and CRFs are discriminative models. I also understand how CRFs (Conditional Random Fields) are designed and used. What I do not understand is how they are different from HMMs. I…
user1343318
  • 1,341
50
votes
3 answers

PCA and the train/test split

I have a dataset for which I have multiple sets of binary labels. For each set of labels, I train a classifier, evaluating it by cross-validation. I want to reduce dimensionality using principal component analysis (PCA). My question is: Is it…
Bitwise
  • 6,619
50
votes
2 answers

Logistic regression model does not converge

I've got some data about airline flights (in a data frame called flights) and I would like to see if the flight time has any effect on the probability of a significantly delayed arrival (meaning 10 or more minutes). I figured I'd use logistic…
50
votes
7 answers

Features for time series classification

I consider the problem of (multiclass) classification based on time series of variable length $T$, that is, to find a function $$f(X_T) = y \in [1..K]\\ \text{for } X_T = (x_1, \dots, x_T)\\ \text{with } x_t \in \mathbb{R}^d ~,$$ via a global…
Emile
  • 3,460
50
votes
2 answers

Dealing with singular fit in mixed models

Let's say we have a model

    mod <- Y ~ X*Condition + (X*Condition|subject)
    # Y = logit variable
    # X = continuous variable
    # Condition = values A and B, dummy coded; the design is repeated
    # so all participants go through both…
User33268
  • 1,722
50
votes
4 answers

Where does $\sqrt{n}$ come from in central limit theorem (CLT)?

A very simple version of the central limit theorem is given below: $$ \sqrt{n}\bigg(\bigg(\frac{1}{n}\sum_{i=1}^n X_i\bigg) - \mu\bigg)\ \xrightarrow{d}\ \mathcal{N}(0,\;\sigma^2) $$ This is the Lindeberg–Lévy CLT. I do not understand why there is a $\sqrt{n}$…
Flying pig
  • 6,239
50
votes
7 answers

Is Amazon's "average rating" misleading?

If I understand correctly, book ratings on a 1-5 scale are Likert scores. That is, a 3 for me may not necessarily be a 3 for someone else. It's an ordinal scale IMO. One shouldn't really average ordinal scales but can definitely take the mode,…
PhD
  • 14,627
50
votes
2 answers

Can somebody explain to me NUTS in English?

My understanding of the algorithm is the following: No U-Turn Sampler (NUTS) is a Hamiltonian Monte Carlo Method. This means that it is not a Markov Chain method and thus, this algorithm avoids the random walk part, which is often deemed as…
50
votes
2 answers

What is model identifiability?

I know that with a model that is not identifiable the data can be said to be generated by multiple different assignments to the model parameters. I know that sometimes it's possible to constrain parameters so that all are identifiable, as in the…
Jack Tanner
  • 4,842
50
votes
2 answers

Poisson regression to estimate relative risk for binary outcomes

Brief Summary: Why is it more common for logistic regression (with odds ratios) to be used in cohort studies with binary outcomes, as opposed to Poisson regression (with relative risks)? Background: Undergraduate and graduate statistics and…
jthetzel
  • 2,437
50
votes
1 answer

Difference between GradientDescentOptimizer and AdamOptimizer (TensorFlow)?

I've written a simple MLP in TensorFlow which models an XOR gate. So for: input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] it should produce the following: output_data = [[0.], [1.], [1.], [0.]] The network has an input layer, a hidden…
daniel451
  • 2,915
50
votes
8 answers

What are the cons of Bayesian analysis?

What are some practical objections to the use of Bayesian statistical methods in any context? No, I don't mean the usual carping about choice of prior. I'll be delighted if this gets no answers.
user6666
50
votes
6 answers

What can we say about population mean from a sample size of 1?

I am wondering what we can say, if anything, about the population mean, $\mu$ when all I have is one measurement, $y_1$ (sample size of 1). Obviously, we'd love to have more measurements, but we can't get them. It seems to me that since the sample…
thedu
  • 525