Most Popular
1500 questions
12 votes, 4 answers
How to process natural language queries?
I'm curious about natural language querying. Stanford has what looks to be a strong set of software for processing natural language. I've also seen the Apache OpenNLP library, and the General Architecture for Text Engineering.
There are an…
Steve Kallestad
12 votes, 1 answer
How to deal with TypeError: ufunc 'isnan' not supported for the input types
I have dealt with all the NaN values in the features dataframe, so why am I still getting this error?
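import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(features, annot=True, annot_kws={"size": 7})
plt.show()  # recent seaborn versions do not expose sns.plt; call matplotlib directly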
TypeError Traceback (most recent…
Jodh Singh
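Removing NaNs does not help when the error comes from a column with object dtype, such as numbers stored as strings. NumPy's isnan has no implementation for object arrays. The sketch below assumes that is the cause here. It reproduces the same TypeError with np.isnan directly on a hypothetical stand-in for features, then shows two ways to get an all-numeric frame: coercing every column, or keeping only the numeric ones.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `features` DataFrame: no NaNs,
# but one column was read in as strings, so its dtype is `object`.
features = pd.DataFrame({
    "age": [25, 32, 47],
    "income": ["40000", "52000", "61000"],  # numbers stored as text
    "score": [0.5, 0.7, 0.9],
})

def isnan_fails(frame: pd.DataFrame) -> bool:
    """Return True if np.isnan raises TypeError on the frame's underlying array."""
    try:
        np.isnan(frame.values)
        return False
    except TypeError:
        return True

# Option 1: coerce every column to a numeric dtype (unparseable values become NaN).
numeric = features.apply(pd.to_numeric, errors="coerce")

# Option 2: keep only the columns that are already numeric.
numeric_only = features.select_dtypes(include="number")

print(features.dtypes)
print(isnan_fails(features), isnan_fails(numeric), isnan_fails(numeric_only))
```

Checking features.dtypes before plotting is usually the quickest way to find the offending column. Coercion keeps every column but can silently introduce new NaNs. Use it only after you have confirmed that the text really is numeric.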
12 votes, 4 answers
Export pandas to dictionary by combining multiple row values
I have a pandas dataframe df that looks like this
name value1 value2
A 123 1
B 345 5
C 712 4
B 768 2
A 318 9
C 178 6
A 321 3
I want to convert…
sfactor
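The desired output is cut off in the snippet ("I want to convert…"). The sketch below therefore assumes one plausible target: a dictionary that maps each name to the list of its [value1, value2] rows, in the order they appear in df. It groups by name and converts each group to a nested list.

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({
    "name":   ["A", "B", "C", "B", "A", "C", "A"],
    "value1": [123, 345, 712, 768, 318, 178, 321],
    "value2": [1, 5, 4, 2, 9, 6, 3],
})

# Assumed target shape: name -> list of [value1, value2] rows, in their original order.
combined = {
    name: group[["value1", "value2"]].values.tolist()
    for name, group in df.groupby("name")
}
print(combined)
# {'A': [[123, 1], [318, 9], [321, 3]], 'B': [[345, 5], [768, 2]], 'C': [[712, 4], [178, 6]]}
```

If only one column is needed, df.groupby("name")["value1"].apply(list).to_dict() gives the same kind of mapping with flat lists.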
12 votes, 2 answers
The differences between SVM and Logistic Regression
I am reading about SVMs and have come to the point that non-kernelized SVMs are nothing more than linear separators. Is the only difference between an SVM and logistic regression, then, the criterion used to choose the boundary?
Apparently, SVM…
David Masip
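Both a linear SVM and logistic regression learn a boundary w·x + b = 0. The main difference is the loss they minimize on the margin y·(w·x + b), with labels in {-1, +1}. The SVM uses the hinge loss, which becomes exactly zero once a point is beyond the margin. Logistic regression uses the log loss, which is never zero. The small sketch below evaluates both losses at a few margins to make that concrete.

```python
import numpy as np

def hinge_loss(margin):
    """Linear SVM loss: zero once a point is correctly classified with margin >= 1."""
    return np.maximum(0.0, 1.0 - margin)

def logistic_loss(margin):
    """Logistic regression loss log(1 + exp(-margin)), computed stably via logaddexp."""
    return np.logaddexp(0.0, -margin)

# margin = y * (w . x + b): positive means the point is on the correct side.
margins = np.array([-2.0, 0.0, 1.0, 3.0])
for m, h, l in zip(margins, hinge_loss(margins), logistic_loss(margins)):
    print(f"margin={m:+.1f}  hinge={h:.3f}  logistic={l:.3f}")
```

Because the hinge loss is flat beyond the margin, only points on or inside the margin (the support vectors) shape the SVM boundary. Every point still contributes a little to the logistic regression fit. The logistic loss also corresponds to a probability model, which is why logistic regression outputs probabilities and a plain SVM outputs only scores.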
12 votes, 6 answers
Is it possible to cluster data according to a target?
I was wondering whether there exist techniques to cluster data according to a target. For example, suppose we want to find groups of customers who are likely to churn:
Target is churn.
We want to find clusters exhibiting the same behaviour according to the…
Tanguy
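One way to let the target drive the grouping is to fit a shallow decision tree on churn and treat each leaf as a segment. This is supervised segmentation rather than clustering in the k-means sense, so it is a swapped-in technique, not something the question proposes. The sketch uses scikit-learn and synthetic data. The tenure and support-call features are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
# Synthetic, hypothetical customer features: tenure in months and monthly support calls.
tenure = rng.integers(1, 60, size=n)
calls = rng.poisson(2, size=n)
X = np.column_stack([tenure, calls])
# Churn is made more likely for short-tenure customers who call support often.
p_churn = np.where((tenure < 12) & (calls > 2), 0.7, 0.1)
y = rng.random(n) < p_churn

# A shallow tree splits customers into a few groups chosen to separate churners.
tree = DecisionTreeClassifier(max_leaf_nodes=4, min_samples_leaf=50, random_state=0)
tree.fit(X, y)
segment = tree.apply(X)  # leaf index per customer acts as a target-aware group label

for leaf in np.unique(segment):
    members = segment == leaf
    print(f"segment {leaf}: {members.sum()} customers, churn rate {y[members].mean():.2f}")
```

max_leaf_nodes caps the number of groups, and min_samples_leaf keeps each group large enough to be meaningful. Each leaf is also described by a readable chain of splits, which makes the segments easy to explain.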
12 votes, 5 answers
Unsupervised image segmentation
I am trying to implement an algorithm that, given an image of several objects on a flat table, outputs a segmentation mask for each object. Unlike with CNNs, the objective here is to detect objects in an unfamiliar environment.…
MuhsinFatih
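A simple training-free baseline is to cluster pixels by colour with k-means and read each cluster off as a mask. The sketch below implements plain k-means in NumPy with a farthest-point initialisation. It runs on a synthetic scene of a grey table with two coloured squares. The scene, k, and iteration count are illustrative assumptions. Real images usually need extra features such as pixel position or texture.

```python
import numpy as np

def kmeans_segment(image, k, iters=20, seed=0):
    """Cluster pixels by colour with plain k-means and return an (H, W) label mask."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from a random pixel, then repeatedly
    # add the pixel farthest from all centres chosen so far.
    centers = [pixels[rng.integers(len(pixels))]]
    for _ in range(1, k):
        d = np.min([((pixels - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(pixels[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each pixel to its nearest centre (squared Euclidean distance in colour space).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its pixels; an empty cluster keeps its old centre.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(h, w)

# Synthetic test scene: a grey table with a red and a blue square object on it.
img = np.full((20, 20, 3), 100, dtype=np.uint8)
img[5:10, 5:10] = (200, 30, 30)
img[12:18, 12:18] = (30, 30, 200)
mask = kmeans_segment(img, k=3)
```

Colour clustering groups pixels by colour, not by object, so two objects of the same colour share a label. Running scipy.ndimage.label on mask == j splits such a cluster into connected components, one per object.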
12 votes, 3 answers
Does Amazon RedShift replace Hadoop for ~1XTB data?
There is plenty of hype surrounding Hadoop and its ecosystem. However, in practice, where many data sets are in the terabyte range, is it not more reasonable to use Amazon RedShift for querying large data sets, rather than spending time and effort…
trienism
12 votes, 1 answer
Are the raw probabilities obtained from XGBoost representative of the true underlying probabilities?
1) Is it feasible to use the raw probabilities obtained from XGBoost, e.g. probabilities obtained within the range of 0.4-0.5, as a true representation of approximately 40%-50% chance of an event occurring? (assuming we have an accurate model)
2)…
Gale
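Whether a score of 0.4-0.5 really means a 40%-50% event rate is a calibration question. Boosted trees are not guaranteed to be calibrated, so it has to be checked on held-out data. The sketch below builds a reliability table: in each equal-width probability bin it compares the mean predicted probability with the observed event rate. The data is synthetic. One set of outcomes is calibrated by construction, and the other is drawn so that the scores are systematically too high.

```python
import numpy as np

def reliability_table(y_true, y_prob, n_bins=10):
    """Per equal-width bin: (low, high, mean predicted, observed rate, count)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Bin index 0..n_bins-1; a probability of exactly 1.0 goes into the last bin.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            rows.append((b / n_bins, (b + 1) / n_bins,
                         y_prob[in_bin].mean(), y_true[in_bin].mean(), int(in_bin.sum())))
    return rows

# Synthetic check: outcomes drawn with exactly the predicted probability are calibrated;
# drawing outcomes with probability p**2 makes the scores p systematically too high.
rng = np.random.default_rng(0)
p = rng.random(100_000)
calibrated = reliability_table(rng.random(p.size) < p, p)
too_high = reliability_table(rng.random(p.size) < p ** 2, p)
for (lo, hi, pred, obs, n), (_, _, _, obs2, _) in zip(calibrated, too_high):
    print(f"[{lo:.1f}, {hi:.1f}) predicted {pred:.3f}  observed {obs:.3f}  vs {obs2:.3f}")
```

scikit-learn provides the same check as sklearn.calibration.calibration_curve. If the table shows a systematic gap, CalibratedClassifierCV (sigmoid or isotonic) can recalibrate the model's scores.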
12 votes, 1 answer
Using a pre-trained CNN classifier and applying it to a different image dataset
How would you optimize a pre-trained neural network to apply it to a separate problem? Would you just add more layers to the pre-trained model and test it on your data set?
For example, if the task was to use a CNN to classify wallpaper groups, I'm…
Sid
12 votes, 4 answers
Neural networks - Find most similar images
I am working with Python, scikit-learn and keras. I have 3000 thousands images of front-faced watches like the following ones:
Watch_1, Watch_2, Watch_3.
I want to write a program which receives as an input a photo of a real watch which may be taken…
Outcast
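A common approach is to map every image to a feature vector and return the gallery images whose vectors are closest to the query's, for example by cosine similarity. A typical source of such vectors is the penultimate layer of a pre-trained CNN. The sketch below covers only the retrieval step. The embeddings are random stand-ins, and their size (3000 × 128) is an assumption.

```python
import numpy as np

def most_similar(query_vec, gallery, top_k=3):
    """Return (indices, scores) of the top_k gallery rows by cosine similarity to query_vec."""
    gallery = np.asarray(gallery, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # L2-normalise so that a dot product equals cosine similarity.
    gallery_n = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q_n = q / np.linalg.norm(q)
    sims = gallery_n @ q_n
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

# Hypothetical stand-ins: 3000 watch embeddings of length 128 and a query that is a
# slightly perturbed copy of watch 42 (as if photographed under different conditions).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(3000, 128))
query = gallery[42] + 0.05 * rng.normal(size=128)
idx, scores = most_similar(query, gallery)
print(idx, scores)
```

Result quality depends almost entirely on the embeddings. Features that ignore lighting, angle, and background matter more than the choice of similarity measure.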
12 votes, 2 answers
Is a 100% model accuracy on out-of-sample data overfitting?
I have just completed the machine learning for R course on cognitiveclass.ai and have begun experimenting with random forests.
I have made a model using the "randomForest" library in R. The model classifies into two classes: good and bad.
I know…
Milan van Dijck
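Perfect held-out accuracy often points to leakage rather than a perfect model. A frequent form is that "out-of-sample" rows also appear, exactly, in the training data. The question uses R, but the sketch below is in Python with pandas. It lists test rows whose feature values also occur in the training set. The column names x1, x2 and label are hypothetical.

```python
import pandas as pd

def overlapping_rows(train, test, feature_cols):
    """Return the test rows whose feature values also occur, exactly, in the training set."""
    train_keys = train[feature_cols].drop_duplicates()
    return test.merge(train_keys, on=feature_cols, how="inner")

# Hypothetical example: the first test row duplicates a training row.
train = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [0, 1, 0, 1],
                      "label": ["good", "bad", "good", "bad"]})
test = pd.DataFrame({"x1": [2, 5], "x2": [1, 0], "label": ["bad", "good"]})
dupes = overlapping_rows(train, test, ["x1", "x2"])
print(dupes)
```

If the overlap is empty, the next thing to check is whether any single feature is derived from the label, for example a status field filled in after the outcome was known.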
12 votes, 2 answers
Catboost Categorical Features Handling Options (CTR settings)?
I am working with a dataset with a large number of categorical features (>80%), predicting a continuous target variable (i.e. regression). I have been reading quite a bit about ways to handle categorical features, and learned that one-hot encoding I…
TwinPenguins
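CatBoost's CTR features rest on ordered target statistics. Each row's category is encoded from the targets of rows that come before it in some ordering, shrunk towards a prior, so a row never sees its own target. The sketch below is a simplified illustration of that idea only, not CatBoost's implementation. CatBoost uses several random permutations and offers several CTR types, and prior_weight and the other parameter names here are hypothetical. In practice you would simply pass the categorical columns to CatBoost via cat_features.

```python
import numpy as np

def ordered_target_stats(categories, target, prior, prior_weight=1.0, order=None, seed=0):
    """Encode each row's category using only the targets of rows earlier in `order`."""
    target = np.asarray(target, dtype=float)
    if order is None:
        order = np.random.default_rng(seed).permutation(len(target))
    sums, counts = {}, {}
    encoded = np.empty(len(target))
    for i in order:
        cat = categories[i]
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        # Smoothed mean of the preceding rows in this category, shrunk towards the prior.
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        sums[cat] = s + target[i]
        counts[cat] = n + 1
    return encoded

cats = ["red", "blue", "red", "red", "blue"]
y = [10.0, 3.0, 14.0, 12.0, 5.0]
enc = ordered_target_stats(cats, y, prior=np.mean(y), order=[0, 1, 2, 3, 4])
print(enc)
```

The first occurrence of each category gets exactly the prior, and later rows move towards the category's running mean. That running estimate is what limits the target leakage a plain per-category mean encoding would cause.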
12 votes, 1 answer
What is the difference between topic modeling and clustering?
I know that topic modeling and clustering are related but not identical techniques. Can anyone explain the main differences?
sara
12 votes, 9 answers
What are some easy-to-learn machine learning applications?
Being new to machine learning in general, I'd like to start playing around and see what the possibilities are.
I'm curious as to what applications you might recommend that would offer the fastest time from installation to producing a meaningful…
Steve Kallestad
12 votes, 4 answers
Can I use cosine similarity as a distance metric in a KNN algorithm?
Most discussions of KNN mention Euclidean, Manhattan and Hamming distances, but they don't mention the cosine similarity metric. Is there a reason for this?
Victor
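Cosine similarity works in KNN once it is turned into a dissimilarity, usually 1 - cos. That value is not a true metric because it can violate the triangle inequality. Tree-based neighbour indexes rely on that property, but brute-force search does not. For L2-normalised vectors, ||a - b||² = 2(1 - cos), so Euclidean KNN on normalised vectors ranks neighbours identically. The brute-force sketch below uses a small hypothetical dataset whose two classes differ in direction rather than magnitude.

```python
import numpy as np
from collections import Counter

def knn_predict_cosine(X_train, y_train, x, k=3):
    """Majority vote among the k training points with the smallest cosine distance to x."""
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)
    cos_sim = (X_train @ x) / (np.linalg.norm(X_train, axis=1) * np.linalg.norm(x))
    cos_dist = 1.0 - cos_sim  # 0 for the same direction, 2 for opposite directions
    nearest = np.argsort(cos_dist)[:k]
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]

# Two classes that differ in direction, not magnitude.
X_train = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
y_train = ["x", "x", "y", "y"]
print(knn_predict_cosine(X_train, y_train, [10.0, 0.5]))  # x
print(knn_predict_cosine(X_train, y_train, [0.2, 5.0]))   # y
```

Cosine distance suits data where direction carries the signal and overall magnitude does not, as with TF-IDF text vectors. In scikit-learn, KNeighborsClassifier accepts metric="cosine" and uses brute-force search for it.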