Highest Voted Questions - Data Science Stack Exchange

26

votes

2 answers

Why do we need to discard one dummy variable?

I have learned that, when building a regression model, we have to handle categorical variables by converting them into dummy variables. For example, if our data set has a variable like location: Location…

asked Feb 18 '18 at 17:43

Mithun Sarker Shuvro

373
1
3
7
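On the dummy variable question above, here is a minimal sketch of the usual argument, assuming the regression includes an intercept: when every level gets its own dummy column, each row sums to 1, which duplicates the intercept column. The location names and the helper `one_hot` are hypothetical, since the excerpt's own example is cut off.

```python
# Hypothetical category values; the excerpt's own example is truncated.
locations = ["north", "south", "west", "north"]

def one_hot(values, drop_first=False):
    """Encode each value as a list of 0/1 dummies, one per category level."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the dropped level becomes the baseline
    return levels, [[int(v == level) for level in levels] for v in values]

full_levels, full = one_hot(locations)
kept_levels, reduced = one_hot(locations, drop_first=True)

# With all levels kept, every row sums to 1, the same as an intercept
# column, so the design matrix is perfectly collinear.
row_sums_full = [sum(row) for row in full]

# With one level dropped, baseline rows are all zeros, so the row sums
# are no longer constant and the collinearity with the intercept is gone.
row_sums_reduced = [sum(row) for row in reduced]

print(row_sums_full, row_sums_reduced)
```

The dropped level is not lost: it is the case where all remaining dummies are 0, and the other coefficients are read relative to it.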

26

votes

1 answer

Backpropagation in a CNN

I have the following CNN: I start with an input image of size 5x5. Then I apply a convolution using a 2x2 kernel with stride = 1, which produces a feature map of size 4x4. Then I apply 2x2 max-pooling with stride = 2, which reduces the feature map to size 2x2.…

asked Feb 06 '18 at 05:38

koryakinp

436
1
5
14
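The sizes quoted in the CNN question above can be checked with a short sketch. It assumes no padding in either step, which the excerpt does not state but which is what the 5x5 to 4x4 to 2x2 figures imply; `output_size` is an illustrative helper, not part of any library.

```python
def output_size(input_size, window, stride):
    """Side length after an unpadded convolution or pooling step."""
    return (input_size - window) // stride + 1

image = 5
feature_map = output_size(image, window=2, stride=1)   # 2x2 convolution
pooled = output_size(feature_map, window=2, stride=2)  # 2x2 max-pooling

print(feature_map, pooled)
```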

26

votes

6 answers

make seaborn heatmap bigger

I create a corr() df out of an original df. The corr() df came out 70 X 70 and it is impossible to visualize the heatmap... sns.heatmap(df). If I try to display the corr = df.corr(), the table doesn't fit the screen and I can see all the…

asked Mar 12 '17 at 18:32

redeemefy

631
1
6
9

26

votes

7 answers

Sharing Jupyter notebooks within a team

I would like to set up a server which could support a data science team in the following way: be a central point for storing, versioning, sharing and possibly also executing Jupyter notebooks. Some desired properties: Different users can access the…

software-recommendation

asked Nov 08 '16 at 11:00

Dror Atariah

383
1
4
10

26

votes

4 answers

Is Data Science the Same as Data Mining?

I am sure data science as will be discussed in this forum has several synonyms or at least related fields where large data is analyzed. My particular question is in regards to Data Mining. I took a graduate class in Data Mining a few years back. …

asked May 14 '14 at 01:25

demongolem

413
5
10

26

votes

1 answer

RandomForestClassifier OOB scoring method

Does the random forest implementation in scikit-learn use mean accuracy as its scoring method to estimate generalization error with out-of-bag samples? This is not mentioned in the documentation, but the score() method reports the mean accuracy. I…

asked Aug 02 '16 at 15:47

darXider

613
1
5
12

26

votes

2 answers

How to fit pairwise ranking models in XGBoost?

As far as I know, to train learning to rank models, you need to have three things in the dataset: (1) label or relevance, (2) group or query id, (3) feature vector. For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…

asked Feb 10 '16 at 16:40

tokestermw

418
1
4
8
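For the ranking question above, the three ingredients it lists (label, query id, feature vector) can be pulled out of a text row as sketched below. The rows are made up, and the "label qid:<id> <index>:<value>" layout is an assumption about the format the excerpt refers to. The last step computes per-query group sizes, which is the form XGBoost's Python `DMatrix.set_group` expects as far as I know.

```python
from itertools import groupby

# Made-up rows in the assumed "label qid:<id> <index>:<value> ..." layout.
rows = [
    "2 qid:1 1:0.7 2:0.1",
    "0 qid:1 1:0.2 2:0.4",
    "1 qid:2 1:0.5 2:0.9",
    "0 qid:2 1:0.1 2:0.3",
    "1 qid:2 1:0.6 2:0.2",
]

def parse_row(row):
    """Split one row into (label, query id, {feature index: value})."""
    label, qid, *pairs = row.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return int(label), int(qid.split(":")[1]), features

parsed = [parse_row(r) for r in rows]
labels = [label for label, _, _ in parsed]

# Rows of one query must be contiguous; the group sizes then say how many
# consecutive rows belong to each query.
group_sizes = [len(list(g)) for _, g in groupby(parsed, key=lambda p: p[1])]

print(labels, group_sizes)
```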

25

votes

2 answers

What kinds of learning problems are suitable for Support Vector Machines?

What are the hallmarks or properties that indicate that a certain learning problem can be tackled using support vector machines? In other words, what is it that, when you see a learning problem, makes you go "oh I should definitely use SVMs for…

asked Jan 11 '16 at 07:16

Ragnar

511
1
5
16

25

votes

4 answers

How to predict probabilities in xgboost using R?

The predict function below returns negative values as well, so they cannot be probabilities. param <- list(max.depth = 5, eta = 0.01, objective="binary:logistic",subsample=0.9) bst <- xgboost(param, data = x_mat, label = y_mat,nround = 3000) pred_s <-…

asked Sep 08 '15 at 03:14

GeorgeOfTheRF

2,028
5
17
20

25

votes

2 answers

Can you explain the difference between SVC and LinearSVC in scikit-learn?

I've recently started learning to work with sklearn and have just come across this peculiar result. I used the digits dataset available in sklearn to try different models and estimation methods. When I tested a Support Vector Machine model on the…

asked Sep 02 '15 at 14:49

metjush

536
1
5
7

25

votes

3 answers

K-means incoherent behaviour choosing K with Elbow method, BIC, variance explained and silhouette

I'm trying to cluster some vectors with 90 features with K-means. Since this algorithm requires the number of clusters as input, I want to validate my choice with some nice math. I expect to have from 8 to 10 clusters. The features are Z-score scaled. Elbow…

asked Jul 20 '15 at 08:03

marcodena

1,667
4
14
17

25

votes

3 answers

How do you manage expectations at work?

With all the hoopla around Data Science and Machine Learning, and all the success stories around, there are many expectations, both justified and overinflated, of Data Scientists and their predictive models. My question to practicing…

asked Jun 14 '15 at 14:27

neuron

664
1
6
9

25

votes

4 answers

What does the output of model.predict function from Keras mean?

I have built an LSTM model to predict duplicate questions on the Quora official dataset. The test labels are 0 or 1, where 1 indicates that the question pair is a duplicate. After building the model using model.fit, I test the model using model.predict on the…

asked Jul 31 '18 at 03:48

Dookoto_Sea

361
1
3
3
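For the model.predict question above, a hedged sketch of one common reading: if the network ends in a single sigmoid unit (an assumption, since the excerpt does not show the architecture), each output is a score between 0 and 1 that can be thresholded to recover the 0/1 labels. The scores and the helper `to_labels` below are hypothetical.

```python
# Hypothetical scores shaped like an (n, 1) prediction array.
predictions = [[0.91], [0.08], [0.50], [0.37]]

def to_labels(scores, threshold=0.5):
    """Map each score to 1 (duplicate) if it reaches the threshold, else 0."""
    return [int(row[0] >= threshold) for row in scores]

labels = to_labels(predictions)
print(labels)
```

The 0.5 cut-off is only a default; a different threshold trades precision against recall.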

25

votes

3 answers

How do you apply SMOTE on text classification?

Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique used for imbalanced datasets. So far I have an idea of how to apply it to generic, structured data. But is it possible to apply it to a text classification problem?…

asked Feb 10 '18 at 11:18

catris25

369
1
3
5
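On the SMOTE question above, a minimal sketch of the interpolation step at the heart of the technique. It assumes the texts have already been turned into numeric vectors (for example TF-IDF weights), which is an assumption on my part, and it takes the neighbor as given, whereas full SMOTE picks it among the k nearest minority-class neighbors. The vectors and the helper `smote_sample` are hypothetical.

```python
import random

def smote_sample(point, neighbor, rng):
    """Create one synthetic point on the segment between point and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]

rng = random.Random(0)  # seeded so the sketch is repeatable

# Two hypothetical minority-class documents as 3-term weight vectors.
minority_a = [0.0, 0.4, 0.0]
minority_b = [0.2, 0.0, 0.6]

synthetic = smote_sample(minority_a, minority_b, rng)
print(synthetic)
```

The synthetic vector lies between the two originals in feature space; it does not correspond to any readable text.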

25

votes

3 answers

Should I use GPU or CPU for inference?

I'm running a deep learning neural network that has been trained on a GPU. I now want to deploy this to multiple hosts for inference. The question is: what conditions determine whether I should use GPUs or CPUs for inference? Adding more…

asked Sep 26 '17 at 22:13

Dan

361
1
3
6
