Highest Voted Questions - Statistical Analysis Stack Exchange

54

votes

3 answers

Different ways to write interaction terms in lm?

I have a question about which is the best way to specify an interaction in a regression model. Consider the following data: d <- structure(list(r = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),…

asked Dec 02 '11 at 20:23

Manuel Ramón

2,115

54

votes

2 answers

Hierarchical clustering with mixed type data - what distance/similarity to use?

In my dataset we have both continuous and naturally discrete variables. I want to know whether we can do hierarchical clustering using both types of variables. And if yes, what distance measure is appropriate?

asked Sep 07 '11 at 16:18

Beta

6,334

54

votes

5 answers

Dynamic Time Warping Clustering

What would be the approach to use Dynamic Time Warping (DTW) to perform clustering of time series? I have read about DTW as a way to find similarity between two time series that may be shifted in time. Can I use this method as a similarity…

asked Jan 05 '15 at 15:34

Kobe-Wan Kenobi

2,857
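For the Dynamic Time Warping question above, one common approach is to compute a pairwise DTW distance matrix and pass it to any clustering method that accepts precomputed dissimilarities (hierarchical clustering or k-medoids, for example). The following is a minimal sketch under stated assumptions, not the asker's method or any library's API: the function names `dtw_distance` and `pairwise_dtw` are hypothetical, the local cost is the absolute difference, and no warping-window constraint is applied. Note that DTW is not a true metric, since it can violate the triangle inequality.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two 1-D sequences.

    Assumption: the local cost is the absolute difference |a_i - b_j|.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances, usable as a precomputed dissimilarity."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist
```

A series and a time-shifted copy of it can have DTW distance 0 even though their pointwise distance is not 0, which is why the measure suits series that are shifted in time.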

54

votes

4 answers

Class imbalance in Supervised Machine Learning

This is a general question, not specific to any method or data set. How do we deal with a class imbalance problem in supervised machine learning, where around 90% of the labels in the dataset are 0 and around 10% are 1? How do we optimally…

asked Jan 05 '15 at 12:14

NG_21

1,556
4
17
25

54

votes

3 answers

Why do we care so much about normally distributed error terms (and homoskedasticity) in linear regression when we don't have to?

I suppose I get frustrated every time I hear someone say that non-normality of residuals and/or heteroskedasticity violates OLS assumptions. To estimate parameters in an OLS model neither of these assumptions is necessary by the Gauss-Markov…

asked Dec 30 '14 at 22:22

Zachary Blumenfeld

3,974

54

votes

3 answers

Do we have a problem of "pity upvotes"?

I know, this may sound like it is off-topic, but hear me out. At Stack Overflow and here we get votes on posts; this is all stored in tabular form. E.g.: post id voter id vote type datetime ------- -------- --------- …

asked Jun 01 '11 at 01:57

Sam Saffron

619

53

votes

4 answers

Approximate order statistics for normal random variables

Are there well-known formulas for the order statistics of certain random distributions? Particularly the first and last order statistics of a normal random variable, but a more general answer would also be appreciated. Edit: To clarify, I am…

asked Mar 31 '11 at 10:14

Chris Taylor

3,682

53

votes

2 answers

Why is a Bayesian not allowed to look at the residuals?

In the article "Discussion: Should Ecologists Become Bayesians?" Brian Dennis gives a surprisingly balanced and positive view of Bayesian statistics when his aim seems to be to warn people about it. However, in one paragraph, without any citations…

asked Feb 06 '14 at 08:53

Mankka

633

53

votes

4 answers

Why does the correlation coefficient between X and X-Y random variables tend to be 0.7?

Taken from Practical Statistics for Medical Research where Douglas Altman writes on page 285: ...for any two quantities X and Y, X will be correlated with X-Y. Indeed, even if X and Y are samples of random numbers we would expect the…

asked Mar 06 '13 at 10:58

nostock

1,507
4
17
23
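The value 0.7 in the question above can be recovered with a short derivation. This is a sketch under two assumptions that the excerpt does not state: $X$ and $Y$ are independent, and they share a common variance $\sigma^2$.

$$
\operatorname{Cov}(X, X-Y) = \operatorname{Var}(X) - \operatorname{Cov}(X, Y) = \sigma^2,
\qquad
\operatorname{Var}(X-Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) = 2\sigma^2
$$

$$
\operatorname{Corr}(X, X-Y) = \frac{\sigma^2}{\sigma \cdot \sqrt{2}\,\sigma} = \frac{1}{\sqrt{2}} \approx 0.707
$$

If the variances differ or $X$ and $Y$ are correlated, the result changes accordingly.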

53

votes

3 answers

Are splines overfitting the data?

My problem: I recently met a statistician who informed me that splines are only useful for exploring data and are subject to overfitting, and thus not useful in prediction. He preferred exploring with simple polynomials ... As I’m a big fan of…

asked Feb 01 '13 at 09:36

Max Gordon

5,926
8
34
52

53

votes

1 answer

Regression: Transforming Variables

When transforming variables, do you have to use the same transformation for all of them? For example, can I pick and choose differently transformed variables, as in: Let $x_1,x_2,x_3$ be age, length of employment, length of residence, and income. Y =…

asked Nov 23 '10 at 17:41

Brandon Bertelsen

7,232
9
41
48

53

votes

4 answers

Is it possible to give variable sized images as input to a convolutional neural network?

Can we give images with variable size as input to a convolutional neural network for object detection? If possible, how can we do that? But if we try to crop the image, we will be losing some portion of the image and if we try to resize, then, the…

asked Jan 24 '19 at 04:03

Ashna Eldho

631

53

votes

8 answers

Excel as a statistics workbench

It seems that lots of people (including me) like to do exploratory data analysis in Excel. Some limitations, such as the number of rows allowed in a spreadsheet, are a pain but in most cases don't make it impossible to use Excel to play around with…

asked Oct 07 '10 at 17:44

Carlos Accioly

5,025
4
28
29

53

votes

1 answer

Rank in R - descending order

I am looking to rank data so that, in some cases, the largest value has rank 1. I am relatively new to R, but I don't see how I can adjust this setting in the rank function. x <- c(23,45,12,67,34,89) rank(x) generates: [1] 2 4 1 5 3 6 when I…

r

asked Oct 04 '10 at 21:57

Btibert3

1,334
2
15
24

53

votes

6 answers

Why is softmax output not a good uncertainty measure for Deep Learning models?

I've been working with Convolutional Neural Networks (CNNs) for some time now, mostly on image data for semantic segmentation/instance segmentation. I've often visualized the softmax of the network output as a "heat map" to see how high per pixel…

asked Oct 24 '17 at 12:58

Honeybear

659
