Highest Voted Questions - Statistical Analysis Stack Exchange

76

votes

2 answers

Practical questions on tuning Random Forests

My questions are about Random Forests. The concept of this beautiful classifier is clear to me, but there are still a lot of practical usage questions. Unfortunately, I failed to find any practical guide to RF (I've been searching for something like…

asked Mar 25 '13 at 15:53

lithuak

1,013

76

votes

4 answers

Is standardization needed before fitting logistic regression?

My question is whether we need to standardize the data set so that all variables have the same scale, [0,1], before fitting logistic regression. The formula is: $$\frac{x_i-\min(x_i)}{\max(x_i)-\min(x_i)}$$ My data set has 2 variables,…

asked Jan 23 '13 at 16:33

user1946504

1,337
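
The rescaling formula quoted in the excerpt above can be illustrated with a minimal Python sketch. The function name `min_max_scale` and the sample values are hypothetical, chosen only for illustration; the sketch shows what the formula computes and does not address whether scaling is needed for logistic regression.

```python
def min_max_scale(values):
    """Rescale numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The denominator would be zero, so the formula is undefined here.
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical sample values for one variable.
scaled = min_max_scale([2.0, 4.0, 10.0])
print(scaled)
```

The smallest value maps to 0 and the largest to 1, so a single extreme observation can compress all the other values into a narrow band.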

76

votes

5 answers

How small a quantity should be added to x to avoid taking the log of zero?

I have analysed my data as they are. Now I want to look at my analyses after taking the log of all variables. Many variables contain many zeros. Therefore I add a small quantity to avoid taking the log of zero. So far I've added 10^-10, without any…

asked Jun 19 '12 at 09:47

miura

3,684
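
To make the excerpt above concrete, here is a small Python sketch of what adding a constant before taking the log does to the zeros. The helper `log_with_offset` and the sample data are assumptions made for illustration; only the offset 10^-10 comes from the question, and the offset of 1 is included purely as a contrast, not as a recommendation.

```python
import math


def log_with_offset(values, offset):
    """Return log(x + offset) for each x; offset must make every argument positive."""
    return [math.log(x + offset) for x in values]


# Hypothetical data containing a zero.
data = [0.0, 1.0, 10.0]

tiny = log_with_offset(data, 1e-10)  # offset taken from the question
unit = log_with_offset(data, 1.0)    # contrasting offset, for comparison only

print(tiny)
print(unit)
```

With the tiny offset, the zero is mapped to about -23 on the log scale, far from the other transformed values, which is why the choice of constant can influence later analyses.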

76

votes

8 answers

Clustering with a distance matrix

I have a (symmetric) matrix M that represents the distance between each pair of nodes. For example,

    A   B   C   D   E   F   G   H   I   J   K   L
A   0  20  20  20  40  60  60  60 100 120 120 120
B  20   0  20  20  60  80  80  80 120 140 140…

asked Sep 16 '10 at 11:47

yassin

863

76

votes

11 answers

Is there any mathematical basis for the Bayesian vs frequentist debate?

It says on Wikipedia that: the mathematics [of probability] is largely independent of any interpretation of probability. Question: Then if we want to be mathematically correct, shouldn't we disallow any interpretation of probability? I.e., are…

asked Aug 18 '16 at 03:40

Chill2Macht

6,249

75

votes

1 answer

How to split the dataset for cross validation, learning curve, and final evaluation?

What is an appropriate strategy for splitting the dataset? I ask for feedback on the following approach (not on the individual parameters like test_size or n_iter, but on whether I used X, y, X_train, y_train, X_test, and y_test appropriately and whether the…

asked Apr 30 '14 at 10:44

tobip

1,570

75

votes

7 answers

Rule of thumb for number of bootstrap samples

I wonder if someone knows any general rules of thumb regarding the number of bootstrap samples one should use, based on characteristics of the data (number of observations, etc.) and/or the variables included?

asked Feb 10 '14 at 08:33

hoyem

1,161

75

votes

6 answers

Do the predictions of a Random Forest model have a prediction interval?

If I run a randomForest model, I can then make predictions based on the model. Is there a way to get a prediction interval for each of the predictions, so that I know how "sure" the model is of its answer? If this is possible, is it simply based on…

asked Apr 22 '13 at 22:07

Dean MacGregor

1,138

75

votes

4 answers

Linear model with log-transformed response vs. generalized linear model with log link

In this paper titled "CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA" the authors write: In a generalized linear model, the mean is transformed, by the link function, instead of transforming the response itself. The two methods …

asked Jan 16 '13 at 10:01

miura

3,684

75

votes

6 answers

Model for predicting number of Youtube views of Gangnam Style

PSY's music video "Gangnam style" is popular; after a little more than 2 months it has about 540 million viewers. I learned this from my preteen children at dinner last week, and soon the discussion turned to whether it was possible to do…

asked Oct 27 '12 at 05:49

FredrikD

853

75

votes

4 answers

Standard error for the mean of a sample of binomial random variables

Suppose I'm running an experiment that can have 2 outcomes, and I'm assuming that the underlying "true" distribution of the 2 outcomes is a binomial distribution with parameters $n$ and $p$: ${\rm Binomial}(n, p)$. I can compute the standard error,…

asked Jun 01 '12 at 16:18

Frank

1,686

75

votes

7 answers

Do all interaction terms need their individual terms in a regression model?

I am actually reviewing a manuscript where the authors compare 5-6 logit regression models with AIC. However, some of the models have interaction terms without including the individual covariate terms. Does it ever make sense to do this? For example…

asked May 04 '12 at 02:10

djhocking

1,931

75

votes

6 answers

Criticism of Pearl's theory of causality

In the year 2000, Judea Pearl published Causality. What controversies surround this work? What are its major criticisms?

causality

asked Apr 13 '12 at 23:08

Neil G

15,219

75

votes

3 answers

What's the difference between feed-forward and recurrent neural networks?

What is the difference between a feed-forward and recurrent neural network? Why would you use one over the other? Do other network topologies exist?

asked Aug 30 '10 at 15:33

Shane

12,461

75

votes

7 answers

Which activation function for output layer?

While the choice of activation functions for the hidden layer is quite clear (mostly sigmoid or tanh), I wonder how to decide on the activation function for the output layer. Common choices are linear functions, sigmoid functions and softmax…

neural-networks

asked Jun 12 '16 at 14:42

Funkwecker

3,082
