Highest Voted Questions - Data Science Stack Exchange

9

votes

2 answers

XGBoost and Random Forest: ntrees vs. number of boosting rounds vs. n_estimators

So I understand the main difference between Random Forests and GB methods. Random Forests grow trees in parallel, and GB methods grow one tree per iteration. However, I am confused about the vocabulary used with scikit's RF regressor and xgboost's…

asked Apr 22 '20 at 15:06

Jack Armstrong

243
2
6

9

votes

3 answers

How to prevent vanishing gradient or exploding gradient?

What causes vanishing or exploding gradients, and what measures can be taken to prevent them?

asked Apr 15 '20 at 05:00

yashdk

129
1
1
3

9

votes

2 answers

Optimising for Brier objective function directly gives worse Brier score than optimising with custom objective - what does it tell me?

I am training an XGBoost model and, as I care most about the resulting probabilities rather than the classification itself, I have chosen the Brier score as the metric for my model, so that the probabilities are well calibrated. I tuned my hyperparameters using…

asked Apr 06 '20 at 07:27

Xaume

192
3
12

9

votes

2 answers

How to normalize data without knowing the min and max values?

I have a Lending Club dataset from Kaggle; it contains many different columns, for example dummy variables, years, the amount of the loan, etc. I want to normalize the data in the training and test set but I have to use the Min and Max of the…

asked Mar 25 '20 at 21:10

Ghassen Ben Hamida

93
1
4

9

votes

1 answer

Train a GAN on "before and after" images of dental surgeries

I want to train a GAN on "before and after" images of dental surgeries, so that it can generate "after" pictures for new patients. Input images are like these:…

asked Mar 21 '20 at 05:21

Lakshay Dulani

265
2
6

9

votes

1 answer

How fbprophet cross validation works

I am having trouble understanding how the cross_validation function works in the fbprophet package. I have a time series of 68 days (business days only) grouped by 15 min, with a certain metric: 00:00 5, 00:15 2, 00:30 10, etc., 23:45 26. And I really…

asked Mar 06 '20 at 14:04

Katy

93
1
1
4

9

votes

1 answer

Understanding dropout and gradient descent

I am looking at how to implement dropout in deep neural networks and found something counterintuitive. In the forward phase, dropout masks activations with a random tensor of 1s and 0s to force the net to learn the average of the weights. This helps the…

asked Aug 27 '15 at 19:36

emanuele

415
1
4
8

9

votes

5 answers

Any idea about application of deep dream?

Recently Google publicized its interesting Deep Dream. Besides art generation such as http://deepdreamgenerator.com/, do you see any potential applications of Deep Dream in computer vision or machine learning?

asked Aug 12 '15 at 16:17

rudky martin

9

votes

1 answer

sklearn - overfitting problem

I'm looking for recommendations as to the best way forward for my current machine learning problem. The outline of the problem and what I've done is as follows: I have 900+ trials of EEG data, where each trial is 1 second long. The ground truth is…

asked Aug 11 '15 at 22:21

Simon

1,071
2
10
28

9

votes

2 answers

Can I use LSTM models to evaluate multiple, independent time series?

Let's say that I would like to predict the temperature tomorrow. I could use the approach whereby I train a model based on a time-series dataset collected from a single location (for example, see this excellent…

asked Jan 28 '20 at 21:26

CharismaticChromoFauna

101
1
6

9

votes

5 answers

How can we extract fields from images?

I am making a document parser which extracts data fields from documents and stores them in a structured way. Each field in my dataset is horizontal, which is easy to extract. But the model fails on the following type of example - Is there any way…

asked Jan 16 '20 at 12:35

hR 312

91
1
8

9

votes

2 answers

Why continuous features are more important than categorical features in decision tree models?

I have both categorical and continuous features in my prediction model and want to select (and rank) the most important features. I have converted all categorical variables into dummy variables using one-hot encoding (for better interpretation in my…

asked Jan 15 '20 at 14:55

Shahab Kazemi

103
4

9

votes

1 answer

Using a GAN discriminator as a standalone classifier

The goal of the discriminator in a GAN is to distinguish between real inputs and inputs synthesized by the generator. Suppose I train a GAN until the generator is good enough to fool the discriminator much of the time. Could I then use the…

asked Jan 09 '20 at 20:34

rgov

193
3

9

votes

1 answer

Feature selection for Support Vector Machines

My question is three-fold. In the context of "kernelized" support vector machines: Is variable/feature selection desirable - especially since we regularize the parameter C to prevent overfitting and the main motive behind introducing kernels to an SVM…

asked Jul 26 '15 at 12:17

Nitin Srivastava

93
7

9

votes

3 answers

Is (nearly) all data separable?

Suppose I have some data set with two classes. I could draw a decision boundary around each data point belonging to one of these classes, and hence, separate the data, like so: Where the red lines are the decision boundaries around the data points…

asked Jan 01 '20 at 17:03

Data

467
3
11
