Most Popular
1500 questions
38
votes
7 answers
Is there a good browser/viewer to see an R dataset (.rda file)
I want to browse a .rda file (R dataset). I know about the View(datasetname) command. The default R.app that comes for Mac does not have a very good browser for data (it opens a window in X11). I like the RStudio data browser that opens with the…
Curious2learn
- 705
38
votes
1 answer
Why does glmer not achieve the maximum likelihood (as verified by applying further generic optimization)?
Numerically deriving the MLEs of GLMM is difficult and, in practice, I know, we should not use brute force optimization (e.g., using optim in a simple way). But for my own educational purpose, I want to try it to make sure I correctly understand the…
quibble
- 1,436
37
votes
4 answers
What is one class SVM and how does it work?
I was using one class SVM, implemented in scikit-learn, for my research work. But I have no good understanding of this.
Can anyone please give a simple, good explanation of one class SVM?
37
votes
5 answers
Free data set for very high dimensional classification
What are the freely available data sets for classification with more than 1000 features (or sample points if it contains curves)?
There is already a community wiki about free data sets:
Locating freely available data samples
But here, it would be…
robin girard
- 6,705
37
votes
1 answer
How do decision tree learning algorithms deal with missing values (under the hood)
What are the methods that decision tree learning algorithms use to deal with missing values?
Do they simply fill the slot in using a value called missing?
user1172468
- 2,035
37
votes
8 answers
Help me calculate how many people will come to my wedding! Can I attribute a percentage to each person and add them?
I am planning my wedding. I wish to estimate how many people will come to my wedding. I have created a list of people and the chance that they will attend in percentage. For example
Dad 100%
Mom 100%
Bob 50%
Marc 10%
Jacob 25%
Joseph 30%
I…
Behacad
- 5,064
- 8
- 35
- 49
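The arithmetic the wedding question asks about can be sketched as follows. This is a minimal illustration, not an answer taken from the thread: it assumes each guest decides independently (an assumption the excerpt does not state), and the function names `expected_attendance` and `attendance_sd` are hypothetical. Summing the probabilities gives the expected head count by linearity of expectation; the spread additionally relies on the independence assumption.

```python
from math import sqrt

# Attendance probabilities copied from the question excerpt.
guests = {
    "Dad": 1.00,
    "Mom": 1.00,
    "Bob": 0.50,
    "Marc": 0.10,
    "Jacob": 0.25,
    "Joseph": 0.30,
}


def expected_attendance(probs):
    """Expected number of attendees: the sum of the individual probabilities."""
    return sum(probs)


def attendance_sd(probs):
    """Standard deviation of the head count, ASSUMING guests decide independently.

    Each guest is a Bernoulli(p) variable with variance p * (1 - p);
    under independence the variances add.
    """
    return sqrt(sum(p * (1 - p) for p in probs))


expected = expected_attendance(guests.values())
sd = attendance_sd(guests.values())
print(f"expected guests: {expected:.2f}, standard deviation: {sd:.2f}")
```

For the six people listed, the expected count is about 3.15 with a standard deviation of about 0.86. If guests attend in groups (couples, families), the independence assumption fails and the spread would typically be larger, although the expected count stays the same.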
37
votes
5 answers
Expected prediction error - derivation
I am struggling to understand the derivation of the expected prediction error per below (ESL), especially on the derivation of 2.11 and 2.12 (conditioning, the step towards point-wise minimum). Any pointers or links much appreciated.
Below I am…
user1885116
- 2,318
37
votes
3 answers
Why is AUC higher for a classifier that is less accurate than for one that is more accurate?
I have two classifiers
A: naive Bayesian network
B: tree (singly-connected) Bayesian network
In terms of accuracy and other measures, A performs comparatively worse than B. However, when I use the R packages ROCR and AUC to perform ROC analysis,…
Jane Wayne
- 1,399
37
votes
5 answers
What are the relative merits of Winsorizing vs. Trimming data?
Winsorizing data means replacing the extreme values of a data set with a certain percentile value from each end, while trimming (or truncating) involves removing those extreme values.
I always see both methods discussed as viable options to lessen…
Brian
- 631
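The difference between the two operations defined in the question above can be sketched in a few lines. This is only an illustration under stated assumptions: the hypothetical helpers `trim` and `winsorize` take a count `k` of values per tail rather than a percentile, and the sample data are made up.

```python
def _check(values, k):
    # Both operations need at least one value left between the two tails.
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")


def trim(values, k):
    """Remove the k smallest and k largest values (returns a sorted list)."""
    _check(values, k)
    s = sorted(values)
    return s[k:len(s) - k]


def winsorize(values, k):
    """Replace the k smallest and k largest values with the nearest retained value.

    The sample size is unchanged; only the extremes are pulled inward.
    """
    _check(values, k)
    s = sorted(values)
    low, high = s[k], s[len(s) - k - 1]
    return [min(max(v, low), high) for v in s]


data = [1, 2, 3, 4, 100]  # made-up sample with one large outlier
print(trim(data, 1))       # [2, 3, 4]
print(winsorize(data, 1))  # [2, 2, 3, 4, 4]
```

Trimming shrinks the sample, whereas Winsorizing keeps its size and replaces the extremes with the boundary values.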
37
votes
4 answers
Intuition behind standard deviation
I'm trying to gain a better intuitive understanding of standard deviation.
From what I understand it is representative of the average of the differences of a set of observations in a data set from the mean of that data set. However it is NOT…
sonicboom
- 930
37
votes
3 answers
Does a sample version of the one-sided Chebyshev inequality exist?
I am interested in the following one-sided (Cantelli) version of the Chebyshev inequality:
$$
\mathbb P(X - \mathbb E (X) \geq t) \leq \frac{\mathrm{Var}(X)}{\mathrm{Var}(X) + t^2} \,.
$$
Basically, if you know the population mean and variance, you…
casandra
- 623
- 6
- 10
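As a quick sanity check of the population inequality quoted in the question above, the bound can be compared with an exact tail probability. This sketch does not address the sample version the question asks about; it assumes an Exp(1) distribution (mean 1, variance 1) purely as an example, and the function names are hypothetical.

```python
import math


def cantelli_bound(variance, t):
    """Upper bound Var(X) / (Var(X) + t^2) on P(X - E[X] >= t), for t > 0."""
    if variance < 0 or t <= 0:
        raise ValueError("variance must be non-negative and t must be positive")
    return variance / (variance + t * t)


def exp1_upper_tail(t):
    """Exact P(X - 1 >= t) for X ~ Exp(1), which has mean 1 and variance 1."""
    return math.exp(-(1.0 + t))


for t in (0.5, 1.0, 2.0):
    exact = exp1_upper_tail(t)
    bound = cantelli_bound(1.0, t)
    print(f"t={t}: exact tail {exact:.4f} <= Cantelli bound {bound:.4f}")
```

The bound is loose for this distribution, as expected for an inequality that uses only the mean and variance.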
37
votes
4 answers
Checking assumptions lmer/lme mixed models in R
I ran a repeated design whereby I tested 30 males and 30 females across three different tasks. I want to understand how the behaviour of males and females is different and how that depends on the task. I used both the lmer and lme4 package to…
crazjo
- 838
- 1
- 11
- 19
37
votes
7 answers
Statistical methods to more efficiently plot data when millions of points are present?
I find R can take a long time to generate plots when millions of points are present - unsurprising given that points are plotted individually. Furthermore, such plots are often too cluttered and dense to be useful. Many of the points overlap and…
Alex Stoddard
- 473
37
votes
8 answers
What is Bayes' theorem all about?
What are the main ideas, that is, concepts related to Bayes' theorem?
I am not asking for any derivations or complex mathematical notation.
user333
- 7,211
37
votes
1 answer
How are the standard errors computed for the fitted values from a logistic regression?
When you predict a fitted value from a logistic regression model, how are standard errors computed? I mean for the fitted values, not for the coefficients (which involves Fisher's information matrix).
I only found out how to get the numbers with R…
user2457873
- 373