Highest Voted Questions - Data Science Stack Exchange

31

votes

6 answers

What is the reason behind taking log transformation of a few continuous variables?

I have been working on a classification problem and have read many people's code and tutorials. One thing I've noticed is that many people take np.log or log of continuous variables such as loan_amount or applicant_income. I just want to understand…

asked Oct 23 '18 at 13:08

Sai Kumar

611
1
8
14
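
As a rough illustration of the log-transformation question above, the sketch below compares the skewness of a right-skewed variable before and after `np.log`. The income values are made up for illustration, and `skewness` is a hypothetical helper, not something taken from the question or its answers.

```python
import numpy as np


def skewness(x):
    """Sample skewness: the mean of the cubed z-scores."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())


# Hypothetical, strictly positive incomes with one very large value.
applicant_income = np.array([1500, 2000, 2500, 3000, 4000, 6000, 12000, 81000])

# np.log requires positive inputs; np.log1p is a common choice when zeros occur.
log_income = np.log(applicant_income)

raw_skew = skewness(applicant_income)
log_skew = skewness(log_income)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

The log compresses large values more than small ones, so the long right tail is pulled in. Whether that helps a given classifier depends on the model and the data.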

31

votes

2 answers

When should one use L1, L2 regularization instead of dropout layer, given that both serve the same purpose of reducing overfitting?

In Keras, there are two methods to reduce overfitting: L1/L2 regularization or a dropout layer. What are some situations in which to use L1/L2 regularization instead of a dropout layer? What are some situations in which a dropout layer is better?

asked Aug 23 '18 at 15:46

user781486

1,385
2
16
19

31

votes

3 answers

Keras Callback example for saving a model after every epoch?

Can someone please post a straightforward example of Keras using a callback to save a model after every epoch? I can find examples of saving weights, but I want to be able to save a completely functioning model after every training epoch.

asked Feb 22 '18 at 21:32

I_Play_With_Data

2,089
3
16
40

31

votes

6 answers

L2 loss vs. mean squared loss

I see some literature treat L2 loss (least squared error) and mean squared error loss as two different kinds of loss functions. However, it seems to me these two loss functions essentially compute the same thing (with a 1/n factor…

loss-function

asked Jan 01 '18 at 01:58

Edamame

2,745
5
24
33
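
A minimal sketch of the relationship raised in the L2 loss question above, assuming "L2 loss" means the plain sum of squared errors. Some texts define it with an extra factor of 1/2 or as a square root, so this is one convention rather than the definition. The sample values are made up.

```python
def l2_loss(y_true, y_pred):
    """Sum of squared errors (one common convention for 'L2 loss')."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mse_loss(y_true, y_pred):
    """Mean squared error: the same sum divided by the number of samples."""
    return l2_loss(y_true, y_pred) / len(y_true)


# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

l2 = l2_loss(y_true, y_pred)
mse = mse_loss(y_true, y_pred)
print(l2, mse)
```

Under this convention the two differ only by the constant factor 1/n, so they share the same minimizer. The factor does change the gradient scale, which can matter when choosing a learning rate.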

31

votes

1 answer

How is a splitting point chosen for continuous variables in decision trees?

I have two questions related to decision trees: If we have a continuous attribute, how do we choose the splitting value? Example: Age=(20,29,50,40....) Imagine that we have a continuous attribute $f$ that has values in $R$. How can I write an…

asked Nov 03 '17 at 21:45

WALID BELRHALMIA

421
1
4
5
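
One common CART-style approach to the splitting question above is to sort the distinct values, try the midpoint between each adjacent pair as a threshold, and keep the threshold with the lowest weighted Gini impurity. The sketch below assumes that approach. The ages come from the question's example, while the class labels and the helper names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) for the best binary split."""
    distinct = sorted(set(values))
    best_threshold, best_impurity = None, float("inf")
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


ages = [20, 29, 50, 40]   # values from the question's example
labels = [0, 0, 1, 1]     # hypothetical class labels, one per age

threshold, impurity = best_split(ages, labels)
print(threshold, impurity)
```

Other impurity measures, such as entropy, slot into the same loop by replacing `gini`.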

31

votes

4 answers

Is pandas now faster than data.table?

Here is the GitHub link to the most recent data.table benchmark. The data.table benchmarks have not been updated since 2014. I heard somewhere that Pandas is now faster than data.table. Is this true? Has anyone done any benchmarks? I have never used…

asked Oct 25 '17 at 02:43

xiaodai

630
1
5
13

31

votes

1 answer

What is an LB score in machine learning?

I was going through an article on the Kaggle blog. Repeatedly, the author mentions 'LB score' and 'LB fit' as a metric for the effectiveness of machine learning (along with cross validation (CV) score). With a research for the meaning of 'LB' I spent…

asked May 08 '17 at 05:13

user345394

505
1
4
8

31

votes

6 answers

How to fill missing value based on other columns in Pandas dataframe?

Suppose I have a 5*3 data frame in which the third column contains missing values:

1 2 3
4 5 NaN
7 8 9
3 2 NaN
5 6 NaN

I hope to fill each missing value using the rule first column times second column:

1 2 3
4 5 20 <-- 4*5
7 8 9
3 2 6 <-- 3*2
5 6 30…

pandas

asked Mar 22 '17 at 12:57

KyL

429
1
4
5
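
For the missing-value question above, one possible approach is `Series.fillna` with a Series argument, which fills only the missing positions and aligns on the index. This is a sketch, not necessarily what the answers recommend. The data matches the question's example, and the column names `a`, `b`, `c` are assumed for illustration.

```python
import numpy as np
import pandas as pd

# The 5x3 frame from the question; column names are hypothetical.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, np.nan], [7, 8, 9], [3, 2, np.nan], [5, 6, np.nan]],
    columns=["a", "b", "c"],
)

# Fill missing entries in "c" with the product of "a" and "b".
# Existing values in "c" are left untouched.
df["c"] = df["c"].fillna(df["a"] * df["b"])
print(df)
```

Because the column held NaN, it stays a float column after filling.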

31

votes

3 answers

Neural Network for Multiple Output Regression

I have a dataset containing 34 input columns and 8 output columns. One way to solve the problem is to take the 34 inputs and build an individual regression model for each output column. I am wondering if this problem can be solved using just one model…

asked Feb 10 '17 at 23:17

sjishan

411
1
5
6

31

votes

8 answers

How to count the number of missing values in each row in Pandas dataframe?

How can I get the number of missing values in each row of a Pandas dataframe? I would like to split the dataframe into different dataframes whose rows have the same number of missing values. Any suggestions?

asked Jul 07 '16 at 10:26

Kaggle

2,877
5
14
8
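
A minimal sketch for the row-wise missing-value question above: `DataFrame.isna().sum(axis=1)` gives the count per row, and grouping by that count splits the frame. The sample data and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a different number of gaps per row.
df = pd.DataFrame(
    {
        "x": [1.0, np.nan, 3.0, np.nan],
        "y": [np.nan, np.nan, 6.0, 8.0],
        "z": [7.0, 8.0, 9.0, np.nan],
    }
)

# Number of missing values in each row.
missing_per_row = df.isna().sum(axis=1)

# One sub-frame per distinct missing count, keyed by that count.
groups = {int(count): part for count, part in df.groupby(missing_per_row)}

print(missing_per_row.tolist())
for count, part in groups.items():
    print(count, part.index.tolist())
```

Each value in `groups` keeps the original row index, so rows can be traced back to the source frame.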

30

votes

3 answers

What is the difference between text classification and topic models?

I know the difference between clustering and classification in machine learning, but I don't understand the difference between text classification and topic modeling for documents. Can I use topic modeling over documents to identify a topic? Can I…

asked Aug 12 '14 at 03:50

Ali

361
2
4
6

30

votes

8 answers

Purpose of visualizing high dimensional data?

There are many techniques for visualizing high-dimensional datasets, such as t-SNE, Isomap, PCA, supervised PCA, etc. And we go through the motions of projecting the data down to a 2D or 3D space, so we have "pretty pictures". Some of these…

asked Nov 26 '15 at 04:28

hlin117

685
1
8
11

30

votes

3 answers

What is a better input for Word2Vec?

This is more of a general NLP question. What is the appropriate input to train a word embedding, namely Word2Vec? Should all sentences belonging to an article be a separate document in a corpus? Or should each article be a document in said…

asked Nov 08 '15 at 04:17

wacax

3,390
4
23
45

30

votes

4 answers

macro average and weighted average meaning in classification_report

I use "classification_report" (from sklearn.metrics import classification_report) in order to evaluate an imbalanced binary classification. Classification Report : precision recall f1-score support 0 1.00…

asked Jan 04 '20 at 10:38

user10296606

1,834
5
17
31
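
To illustrate the two averages named in the question above: in scikit-learn's report, the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class score by its support (the number of true samples of that class). The per-class values below are made up, not taken from the question's truncated report.

```python
# Hypothetical per-class precision and support for an imbalanced binary task.
per_class_precision = [1.0, 0.5]   # class 0, class 1
support = [90, 10]                 # true samples per class

# Macro average: every class counts equally, regardless of size.
macro_avg = sum(per_class_precision) / len(per_class_precision)

# Weighted average: each class contributes in proportion to its support.
weighted_avg = sum(p * s for p, s in zip(per_class_precision, support)) / sum(support)

print(f"macro avg: {macro_avg:.2f}, weighted avg: {weighted_avg:.2f}")
```

With imbalanced classes the weighted average is dominated by the majority class, so a weak minority class shows up more clearly in the macro average.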

30

votes

2 answers

How to interpret classification report of scikit-learn?

As you can see, it is about a binary classification with LinearSVC. Class 1 has a higher precision than class 0 (+7%), but class 0 has a higher recall than class 1 (+11%). How would you interpret this? And two other questions: what does…

asked Dec 08 '19 at 23:17

user77241
