Highest Voted Questions - Data Science Stack Exchange

12

votes

4 answers

How to process natural language queries?

I'm curious about natural language querying. Stanford has what looks to be a strong set of software for processing natural language. I've also seen the Apache OpenNLP library, and the General Architecture for Text Engineering. There are an…

nlp

asked Jun 14 '14 at 20:32

Steve Kallestad

3,128
4
21
39

12

votes

1 answer

How to deal with TypeError: ufunc 'isnan' not supported for the input types

I have dealt with all the NaN values in the features dataframe, so why am I still getting this error? sns.heatmap(features, annot=True, annot_kws={"size": 7}) sns.plt.show() TypeError Traceback (most recent…

asked Jun 12 '18 at 04:11

Jodh Singh

264
1
3
10
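
One common cause of this TypeError, which may or may not be the one in the question above, is that the data is stored with object dtype, so np.isnan has no matching implementation even when no NaN values remain. The sketch below reproduces the error on a small hypothetical dataframe (the column names and values are invented for illustration) and shows one possible fix: coercing the columns to numeric types.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "features" dataframe: object dtype, no NaNs.
features = pd.DataFrame({"a": ["1.0", "2.5"], "b": [3, 4]}, dtype=object)

# np.isnan has no loop for object arrays, so this raises TypeError.
try:
    np.isnan(features.to_numpy())
    raised = False
except TypeError:
    raised = True

# Possible fix: convert every column to a numeric dtype.
# errors="coerce" turns unparseable entries into NaN instead of raising.
numeric = features.apply(pd.to_numeric, errors="coerce")
has_nan = bool(np.isnan(numeric.to_numpy()).any())
```

Checking features.dtypes before plotting is a quick way to see whether this is the situation at hand.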

12

votes

4 answers

Export pandas to dictionary by combining multiple row values

I have a pandas dataframe df that looks like this name value1 value2 A 123 1 B 345 5 C 712 4 B 768 2 A 318 9 C 178 6 A 321 3 I want to convert…

asked May 29 '18 at 15:48

sfactor

223
1
2
6
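
The excerpt above is cut off before the desired output is shown, so the following is only a sketch under an assumption: that the goal is a dictionary mapping each name to the list of its (value1, value2) pairs. The dataframe reproduces the rows quoted in the question; the target shape is a guess.

```python
import pandas as pd

# Rows as quoted in the question excerpt.
df = pd.DataFrame({
    "name": ["A", "B", "C", "B", "A", "C", "A"],
    "value1": [123, 345, 712, 768, 318, 178, 321],
    "value2": [1, 5, 4, 2, 9, 6, 3],
})

# Assumed target shape: {name: [(value1, value2), ...]}.
# groupby keeps the original row order inside each group.
result = {
    name: list(zip(group["value1"].tolist(), group["value2"].tolist()))
    for name, group in df.groupby("name")
}
```

If a different shape is wanted, for example separate lists per column, only the expression inside the comprehension needs to change.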

12

votes

2 answers

The differences between SVM and Logistic Regression

I am reading about SVMs and have come to the point that non-kernelized SVMs are nothing more than linear separators. Therefore, is the only difference between an SVM and logistic regression the criterion used to choose the boundary? Apparently, SVM…

asked May 09 '18 at 15:34

David Masip

6,051
2
24
61

12

votes

6 answers

Is it possible to cluster data according to a target?

I was wondering whether there exist techniques to cluster data according to a target. For example, suppose we want to find groups of customers likely to churn: Target is churn. We want to find clusters exhibiting the same behaviour according to the…

asked Apr 26 '18 at 09:21

Tanguy

270
2
10

12

votes

5 answers

Unsupervised image segmentation

I am trying to implement an algorithm that, given an image with several objects on a plane table, outputs a segmentation mask for each object. Unlike with CNNs, the objective here is to detect objects in an unfamiliar environment.…

asked Apr 23 '18 at 15:29

MuhsinFatih

221
2
5

12

votes

3 answers

Does Amazon RedShift replace Hadoop for ~1XTB data?

There is plenty of hype surrounding Hadoop and its eco-system. However, in practice, where many data sets are in the terabyte range, is it not more reasonable to use Amazon RedShift for querying large data sets, rather than spending time and effort…

asked Jun 11 '14 at 04:24

trienism

253
2
9

12

votes

1 answer

Are the raw probabilities obtained from XGBoost representative of the true underlying probabilities?

1) Is it feasible to use the raw probabilities obtained from XGBoost, e.g. probabilities obtained within the range of 0.4-0.5, as a true representation of approximately 40%-50% chance of an event occurring? (assuming we have an accurate model) 2)…

asked Mar 08 '18 at 12:42

Gale

403
1
4
14

12

votes

1 answer

Using a pre-trained CNN classifier and applying it to a different image dataset

How would you optimize a pre-trained neural network to apply it to a separate problem? Would you just add more layers to the pre-trained model and test it on your data set? For example, if the task was to use a CNN to classify wallpaper groups, I'm…

asked Feb 27 '18 at 23:10

Sid

677
1
5
14

12

votes

4 answers

Neural networks - Find most similar images

I am working with Python, scikit-learn and keras. I have 3000 thousands images of front-faced watches like the following ones: Watch_1, Watch_2, Watch_3. I want to write a program which receives as an input a photo of a real watch which may be taken…

asked Feb 14 '18 at 12:33

Outcast

1,057
2
12
29

12

votes

2 answers

Is a 100% model accuracy on out-of-sample data overfitting?

I have just completed the machine learning for R course on cognitiveclass.ai and have begun experimenting with random forests. I have made a model by using the "randomForest" library in R. The model classifies into two classes, good and bad. I know…

asked Feb 08 '18 at 09:13

Milan van Dijck

123
1
6

12

votes

2 answers

Catboost Categorical Features Handling Options (CTR settings)?

I am working with a dataset with a large number of categorical features (>80%) predicting a continuous target variable (i.e. Regression). I have been reading quite a bit about ways to handle categorical features. And learned that one-hot encoding I…

asked Jan 24 '18 at 15:50

TwinPenguins

4,249
3
19
53

12

votes

1 answer

What is the difference between topic modeling and clustering?

I know that topic modeling and clustering are related, but not similar, techniques. Can anyone explain what the main differences are?

asked Jan 18 '18 at 06:20

sara

481
7
15

12

votes

9 answers

What are some easy to learn machine-learning applications?

Being new to machine-learning in general, I'd like to start playing around and see what the possibilities are. I'm curious as to what applications you might recommend that would offer the fastest time from installation to producing a meaningful…

machine-learning

asked Jun 10 '14 at 11:05

Steve Kallestad

3,128
4
21
39

12

votes

4 answers

Can I use cosine similarity as a distance metric in a KNN algorithm?

Most discussions of KNN mention Euclidean, Manhattan and Hamming distances, but they don't mention the cosine similarity metric. Is there a reason for this?

asked Jan 09 '18 at 16:05

Victor

611
3
8
19
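
As a rough illustration of the idea in the question above, the sketch below runs a brute-force nearest-neighbour vote using cosine distance, defined here as 1 minus cosine similarity. The helper names and the toy data are hypothetical. Note that cosine distance does not satisfy the triangle inequality, so it is not a true metric and tree-based neighbour searches that rely on that property may not apply; a brute-force search like this one does not need it.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_predict(X_train, y_train, x, k=1):
    """Majority vote among the k training rows closest to x in cosine distance."""
    dists = np.array([cosine_distance(row, x) for row in X_train])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: class 0 points along the x-axis, class 1 along the y-axis.
X_train = np.array([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]])
y_train = np.array([0, 0, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([10.0, 1.0]))
pred_b = knn_predict(X_train, y_train, np.array([0.5, 8.0]))
```

Because cosine distance ignores vector length, the query [10.0, 1.0] is matched by direction rather than by magnitude.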
