Most Popular

1500 questions
9 votes, 2 answers

How to model user's buying behavior on Amazon?

For our final course project in Data Science, we proposed the following: given the Amazon Reviews Dataset, we plan to come up with an algorithm (roughly based on Personalized PageRank) that determines a strategic position for placing ads on…
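As background for the excerpt above, Personalized PageRank can be sketched as a power iteration in which the random walk teleports back to a chosen set of seed nodes rather than to all nodes uniformly. This is only a minimal illustration, not the asker's algorithm: the function name, the toy product graph, and the choice of seed are all hypothetical.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=100):
    """Power iteration for Personalized PageRank on a directed graph.

    graph: dict mapping every node to the list of nodes it links to.
    seeds: non-empty set of nodes the random walk teleports back to.
    """
    nodes = list(graph)
    # Teleport distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node, out_links in graph.items():
            if out_links:
                # Split this node's damped rank evenly over its out-links.
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: hand its mass back to the teleport set.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        rank = new_rank
    return rank


# Hypothetical toy graph: an edge means "reviewed by the same user".
toy_graph = {
    "camera": ["tripod", "sd_card"],
    "tripod": ["camera"],
    "sd_card": ["camera", "laptop"],
    "laptop": ["sd_card"],
}
scores = personalized_pagerank(toy_graph, seeds={"camera"})
```

The scores form a probability distribution biased toward nodes that are close to the seed, which is what makes the method a candidate for ranking items relative to a particular user or product.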
9 votes, 2 answers

High accuracy on test-set, what could go wrong?

You are given a pre-trained binary ML classification model with 99% accuracy on the test-set (assume the customer required 95% and that the test-set is balanced). We would like to deploy our model in production. What could go wrong? How would you…
CodeHoarder
9 votes, 2 answers

Effect of Stop-Word Removal on Transformers for Text Classification

The domain here is essentially topic classification, so not necessarily a problem where stop-words have an impact on the analysis (as opposed to, say, sentiment analysis where structure can affect meaning). With respect to the positional encoding…
Andy
9 votes, 3 answers

Sentiment Analysis Tutorial

I am trying to understand sentiment analysis and how to apply it using any language (R, Python, etc.). I would like to know if there is a good place on the internet for a tutorial that I can follow. I googled, but I wasn't very satisfied because they…
KurioZ7
9 votes, 3 answers

Why is 10000 used as the denominator in Positional Encodings in the Transformer Model?

I was working through the "Attention Is All You Need" paper, and while the motivation for positional encodings makes sense and the other Stack Exchange answers filled me in on the reasoning behind their structure, I still don't understand why…
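For reference, the sinusoidal encoding the question refers to can be sketched as below. The formula follows the paper (sine on even dimensions, cosine on odd ones, with the angle divided by 10000^(2i/d_model)); the function name and the `base` parameter are illustrative choices that make the constant in question easy to vary.

```python
import math


def positional_encoding(position, d_model, base=10000.0):
    """Sinusoidal encoding of a single position.

    `base` is the constant the question asks about; the paper uses 10000.
    """
    encoding = []
    for i in range(0, d_model, 2):
        # Dimensions i and i+1 share one frequency: base ** (-i / d_model).
        angle = position / (base ** (i / d_model))
        encoding.append(math.sin(angle))      # even dimension
        if i + 1 < d_model:
            encoding.append(math.cos(angle))  # odd dimension
    return encoding


pe_first = positional_encoding(0, d_model=4)
pe_second = positional_encoding(1, d_model=4)
```

A larger `base` stretches the longest wavelengths, so low-frequency dimensions change more slowly with position; trying a few values of `base` is one quick way to get a feel for the trade-off.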
9 votes, 1 answer

Why is cosine distance used to measure the similarity between word embeddings?

When computing the similarity between words, cosine similarity or distance is computed on the word vectors. Why aren't other distance metrics, such as Euclidean distance, suitable for this task? Let us consider two vectors a and b, where a = [-1,2,-3]…
Ashwin Geet D'Sa
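A small sketch of the contrast the question raises: cosine similarity depends only on the angle between two vectors, while Euclidean distance also reflects their lengths. The vectors below are hypothetical and are not the ones from the (truncated) question.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (ignores vector length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v (sensitive to length)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical vectors: v points the same way as u but is 10x longer.
u = [1.0, 2.0, 3.0]
v = [10.0, 20.0, 30.0]

cos_uv = cosine_similarity(u, v)    # close to 1: same direction
dist_uv = euclidean_distance(u, v)  # large: very different lengths
```

Two vectors with the same direction but different magnitudes come out as maximally similar under cosine similarity, yet far apart under Euclidean distance.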
9 votes, 2 answers

Does "feature importance" depend on the model type?

I was working on a small classification problem (breast cancer data set from sklearn), and trying to decide which features were most important to predict the labels. I understand that there are several ways to define "important feature" here…
Frank
9 votes, 1 answer

Original Meaning of "Intelligence" in "Business Intelligence"

What does the term "Intelligence" originally stand for in "Business Intelligence"? Is it meant in the sense of "Artificial Intelligence" or in the sense of "Intelligence Agency"? In other words, does "Business Intelligence" mean: "Acting smart &…
9 votes, 2 answers

image_dataset_from_directory vs. flow_from_directory

What is the main difference between flow_from_directory and image_dataset_from_directory in Keras? Which one should I use?
Bala venkatesh
9 votes, 1 answer

Is it possible to have stratified train-test split of a set based on two columns?

Consider a dataframe that contains two columns, text and label. I can very easily create a stratified train-test split using sklearn.model_selection.train_test_split. The only thing I have to do is to set the column I want to use for the…
Aventinus
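One common workaround, sketched here under assumptions, is to join the two columns into a single combined key and pass that key as the `stratify` argument of `train_test_split`. The second column (`sources`) and all of the data below are hypothetical, since the excerpt does not say what the second column is.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical data: 40 rows, 2 labels x 2 sources, 10 rows per combination.
texts = [f"doc {i}" for i in range(40)]
labels = ["pos", "neg"] * 20
sources = ["web"] * 20 + ["print"] * 20  # assumed second column

# Combine both columns into one key so each (label, source) pair is a stratum.
combined = [f"{label}_{source}" for label, source in zip(labels, sources)]

row_ids = list(range(len(texts)))
train_ids, test_ids = train_test_split(
    row_ids, test_size=0.2, stratify=combined, random_state=0
)

# Count how many test rows fall into each (label, source) stratum.
test_counts = Counter(combined[i] for i in test_ids)
```

Each combination needs at least two rows for this to work; very rare combinations may have to be merged or handled separately before splitting.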
9 votes, 3 answers

Multivariate Time series analysis: When is a CNN vs. LSTM appropriate?

I have multiple features in a time series and want to predict the values of the same features for the next time step. I have already trained an LSTM which is working okay, but it takes rather long to train. So now my question: is it reasonable to use a…
drops
9 votes, 3 answers

How to set up and run Conda on Google Colab

I am interested in using Google Colab for data modeling. How do I install conda, create an environment and run Python in a notebook? I did some searching and found some helpful hints, but had several issues with this. I can only get a partially…
Donald S
9 votes, 2 answers

Why is leaky ReLU not so common in practice?

Since leaky ReLU does not map any value to 0, training always continues, and I can't think of any disadvantages it has. Yet leaky ReLU is less popular than ReLU in practice. Can someone explain why?
Prashant Gupta
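For readers comparing the two activations, a minimal sketch of both functions and their gradients follows. The `negative_slope` default of 0.01 is a commonly used value, chosen here as an assumption rather than taken from the question.

```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to 0."""
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled instead of zeroed."""
    return x if x > 0 else negative_slope * x


def relu_grad(x):
    """Gradient of ReLU: exactly 0 for negative inputs."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, negative_slope=0.01):
    """Gradient of leaky ReLU: small but non-zero for negative inputs."""
    return 1.0 if x > 0 else negative_slope


# For a negative pre-activation, only leaky ReLU passes a gradient back.
relu_out, relu_g = relu(-2.0), relu_grad(-2.0)
leaky_out, leaky_g = leaky_relu(-2.0), leaky_relu_grad(-2.0)
```

The non-zero gradient on the negative side is the property the question describes as "training always continues".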
9 votes, 2 answers

Is BERT a language model?

Is BERT a language model in the sense of a function that gets a sentence and returns a probability? I know its main usage is sentence embedding, but can it also provide this functionality?
Amit Keinan
9 votes, 2 answers

What is the meaning of a quadratic relation when r = 0?

A website (on page 4) says: The correlation coefficient is a measure of linear relationship and thus a value of r = 0 does not imply there is no relationship between the variables. For example in the following scatterplot which implies no…
Subhash C. Davar
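The quoted statement can be checked with a small example: for y = x² sampled symmetrically around zero, the Pearson correlation coefficient is 0 even though y is fully determined by x. The data points below are hypothetical and are not taken from the website's scatterplot.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: a perfect quadratic relation, symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)  # the positive and negative halves cancel out
```

The covariance vanishes because each point on the left branch is offset by its mirror image on the right branch, so r measures no linear trend despite the exact functional relationship.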