Most Popular · 1500 questions

13 votes · 1 answer

So what's the catch with LSTM?

I am expanding my knowledge of the Keras package and I have been tooling with some of the available models. I have an NLP binary classification problem that I'm trying to solve and have been applying different models. After working with some…
I_Play_With_Data · 2,089 rep
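Since the excerpt is cut off, here is only a generic sketch of the kind of model the question is about: a small Keras Embedding + LSTM binary text classifier on made-up data (vocab_size, max_len, and the arrays are placeholder assumptions, not the asker's setup).

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, max_len = 10000, 100                          # assumed vocabulary size / padded length
X = np.random.randint(1, vocab_size, size=(64, max_len))  # toy integer-encoded sentences
y = np.random.randint(0, 2, size=(64,))                   # toy binary labels

model = Sequential([
    Embedding(vocab_size, 64),        # learn token embeddings
    LSTM(32),                         # summarize the whole sequence into one vector
    Dense(1, activation="sigmoid"),   # binary decision
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```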
13 votes · 2 answers

CNN - How does backpropagation with weight-sharing work exactly?

Consider a Convolutional Neural Network (CNN) for image classification. In order to detect local features, weight-sharing is used among units in the same convolutional layer. In such a network, the kernel weights are updated via the backpropagation…
Andy R · 413 rep
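For context on what "backpropagation with weight-sharing" means formally (a standard result, not quoted from the question): for a convolutional output $y_{i,j} = \sum_{m,n} w_{m,n}\, x_{i+m,\,j+n} + b$, the same kernel weight $w_{m,n}$ contributes at every output position, so its gradient is the sum of the per-position contributions,

$$\frac{\partial L}{\partial w_{m,n}} = \sum_{i,j} \frac{\partial L}{\partial y_{i,j}}\; x_{i+m,\,j+n},$$

which is itself a cross-correlation of the input with the upstream gradient; the single shared weight is then updated once with this summed gradient.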
13 votes · 2 answers

When to use Stateful LSTM?

I'm trying to use an LSTM on time-series data in order to generate future sequences that look like the original sequences in terms of values and progression direction. My approach is: train an RNN to predict a value based on the 25 past values, then use the…
Hastu · 418 rep
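A minimal sketch of the stateful variant being asked about (shapes and data are assumed placeholders; the point is only where the hidden state is carried over and reset, using the tf.keras 2.x API):

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

batch_size, timesteps, features = 1, 25, 1        # e.g. predict the next value from 25 past values
X = np.random.rand(200, timesteps, features)      # toy sliding windows over one series
y = np.random.rand(200, 1)                        # toy next-value targets

inputs = Input(shape=(timesteps, features), batch_size=batch_size)
outputs = Dense(1)(LSTM(32, stateful=True)(inputs))   # stateful: hidden state survives across batches
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

for epoch in range(3):
    # consecutive batches must be consecutive windows of the same series
    model.fit(X, y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()                              # start the next pass with a fresh state
```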
13 votes · 2 answers

cosine_similarity returns matrix instead of single value

I am using the code below to compute the cosine similarity between two vectors. It returns a matrix instead of the single value 0.8660254:

[[ 1.         0.8660254]
 [ 0.8660254  1.       ]]

from sklearn.metrics.pairwise import cosine_similarity
vec1 =…
Olivia Brown · 233 rep
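A short sketch of what is happening (the asker's actual vectors are truncated above, so the ones below are made up to reproduce 0.8660254): cosine_similarity compares every row of its first argument with every row of its second, so passing both vectors in a single array returns the full 2x2 pairwise matrix, while passing them separately and indexing gives the scalar.

```python
from sklearn.metrics.pairwise import cosine_similarity

vec1 = [1, 1, 1, 0]   # hypothetical vectors whose cosine similarity is 0.8660254
vec2 = [1, 1, 1, 1]

print(cosine_similarity([vec1, vec2]))           # 2x2 matrix: every pair of rows
print(cosine_similarity([vec1], [vec2])[0, 0])   # just the one value: 0.8660254...
```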
13 votes · 2 answers

Ethically and Cost-effectively Scaling Data Scrapes

Few things in life give me pleasure like scraping structured and unstructured data from the Internet and making use of it in my models. For instance, the Data Science Toolkit (or RDSTK for R programmers) allows me to pull lots of good…
Hack-R · 1,919 rep
13 votes · 3 answers

How can autoencoders be used for clustering?

Suppose I have a set of time-domain signals with absolutely no labels. I want to cluster them into 2 or 3 classes. Autoencoders are unsupervised networks that learn to compress the inputs. So given an input $x^{(i)}$, weights $W_1$ and $W_2$, biases…
Tendero · 243 rep
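One common recipe (only a sketch on synthetic data; the layer sizes and signal length are assumed): train a plain autoencoder, then cluster its bottleneck codes with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras import Input, Model, layers

X = np.random.rand(500, 128)                     # toy stand-in for unlabeled time-domain signals

inp = Input(shape=(128,))
h = layers.Dense(32, activation="relu")(inp)     # encoder ($W_1$)
code = layers.Dense(8, activation="relu")(h)     # compressed representation
h2 = layers.Dense(32, activation="relu")(code)   # decoder ($W_2$)
out = layers.Dense(128)(h2)                      # reconstruction

autoencoder = Model(inp, out)
encoder = Model(inp, code)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)    # learn to reconstruct the inputs

codes = encoder.predict(X, verbose=0)                         # low-dimensional codes
labels = KMeans(n_clusters=3, n_init=10).fit_predict(codes)   # cluster into 2 or 3 classes
```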
13 votes · 4 answers

What is the difference between outlier detection and anomaly detection?

I would like to know the difference in terms of applications (e.g. which one is credit card fraud detection?) and in terms of used techniques. Example papers which define the task would be welcome.
Martin Thoma · 18,880 rep
13 votes · 1 answer

How to do stepwise regression using sklearn?

I could not find a way to do stepwise regression in scikit-learn. I have checked all the other posts on Stack Exchange on this topic. Answers to all of them suggest using f_regression. But f_regression does not do stepwise regression; it only gives the F-score…
nlahri · 131 rep
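scikit-learn has no classic p-value-driven stepwise regression, but its SequentialFeatureSelector (available since version 0.24) does greedy forward or backward feature selection scored by cross-validation, which is the closest built-in substitute. A minimal sketch on toy data:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,    # how many features to keep
    direction="forward",       # "backward" drops features instead
    cv=5,                      # selection is scored by cross-validation, not F-tests
)
sfs.fit(X, y)
print(sfs.get_support())       # boolean mask over the original features
```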
13 votes · 2 answers

Why use L1 regularization over L2?

When fitting a linear regression model by minimizing a loss function, why should I use $L_1$ instead of $L_2$ regularization? Is it better at preventing overfitting? Is it deterministic (i.e., does it always have a unique solution)? Is it better at feature selection (because…
astudentofmaths · 273 rep
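For reference, the two penalized objectives being compared are (standard notation, not quoted from the question)

$$\min_w \;\|y - Xw\|_2^2 + \lambda \|w\|_1 \quad (\text{lasso, } L_1), \qquad \min_w \;\|y - Xw\|_2^2 + \lambda \|w\|_2^2 \quad (\text{ridge, } L_2).$$

The $L_1$ penalty is non-differentiable at zero, which tends to push some coefficients exactly to zero (built-in feature selection) but can admit non-unique solutions; the $L_2$ penalty shrinks all coefficients smoothly and, for $\lambda > 0$, has a unique closed-form solution.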
13 votes · 5 answers

Clustering with cosine similarity

I have a large data set and a matrix of cosine similarities between its items. I would like to cluster them using cosine similarity in a way that puts similar objects together, without needing to specify beforehand the number of clusters I expect. I read the sklearn…
Smith Volka · 665 rep
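A sketch of one way to do this (toy data; the 0.3 threshold is an assumed cut-off): hierarchical clustering on a precomputed cosine-distance matrix, with a distance threshold instead of a preset number of clusters. In scikit-learn versions before 1.2 the keyword is affinity= rather than metric=.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

X = np.random.rand(100, 20)                                  # toy feature vectors
distance = np.clip(1.0 - cosine_similarity(X), 0.0, None)    # cosine distance matrix

clustering = AgglomerativeClustering(
    n_clusters=None,          # no fixed number of clusters...
    distance_threshold=0.3,   # ...merge until links exceed this cosine distance
    metric="precomputed",
    linkage="average",        # ward would require raw Euclidean features
)
labels = clustering.fit_predict(distance)
print(len(set(labels)), "clusters found")
```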
13 votes · 1 answer

Does nearest neighbour make any sense with t-SNE?

Answers here have stated that the dimensions in t-SNE are meaningless, and that the distances between points are not a measure of similarity. However, can we say anything about a point based on its nearest neighbours in t-SNE space? This answer…
geometrikal · 533 rep
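One way to probe this empirically (just a sketch on random data, so the numbers themselves are meaningless): compare each point's k nearest neighbours in the original space with its neighbours in the 2-D t-SNE embedding and measure the overlap.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(300, 50)                                  # toy high-dimensional data
emb = TSNE(n_components=2, random_state=0).fit_transform(X)  # 2-D embedding

k = 10  # drop column 0 of each result below, since a point is its own nearest neighbour
idx_orig = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
idx_emb = NearestNeighbors(n_neighbors=k + 1).fit(emb).kneighbors(emb, return_distance=False)[:, 1:]

overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(idx_orig, idx_emb)])
print(f"mean {k}-NN overlap between original and t-SNE space: {overlap:.2f}")
```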
13 votes · 2 answers

Sort numbers using only 2 hidden layers

I'm reading the cornerstone paper Sequence to Sequence Learning with Neural Networks by Ilya Sutskever and Quoc Le. On the first page, it briefly mentions that: A surprising example of the power of DNNs is their ability to sort N N-bit numbers…
aerin · 907 rep
13 votes · 3 answers

An Artificial Neural Network (ANN) with an arbitrary number of inputs and outputs

I would like to use ANNs for my problem, but the issue is that the numbers of input and output nodes are not fixed. I did some Google searches before asking my question and found that an RNN may help with my problem. But all the examples I've found…
Vadim · 303 rep
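A minimal sketch (assumed feature size and toy batches, not a full solution) of why an RNN helps here: with a time dimension of None and a per-step output head, neither the number of input steps nor the number of outputs has to be fixed when the network is built.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

features = 4
inp = Input(shape=(None, features))                  # None: any number of input steps
h = layers.LSTM(16, return_sequences=True)(inp)      # one hidden state per step
out = layers.TimeDistributed(layers.Dense(1))(h)     # one output per input step
model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")

# the same network accepts batches with different sequence lengths:
model.train_on_batch(np.random.rand(8, 5, features), np.random.rand(8, 5, 1))
model.train_on_batch(np.random.rand(8, 9, features), np.random.rand(8, 9, 1))
```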
13 votes · 3 answers

Does Batch Normalization make sense for a ReLU activation function?

Batch Normalization is described in this paper as a normalization of the input to an activation function with scale and shift variables $\gamma$ and $\beta$. This paper mainly describes using the sigmoid activation function, which makes sense.…
bnorm · 533 rep
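For concreteness, the two orderings the answers usually debate, written as small Keras models (the layer sizes are arbitrary): batch norm applied to the pre-activation, as in the original paper, versus batch norm applied after the ReLU.

```python
from tensorflow.keras import Input, Sequential, layers

pre_activation_bn = Sequential([
    Input(shape=(32,)),
    layers.Dense(64),
    layers.BatchNormalization(),   # normalize (then scale/shift with gamma, beta) before the nonlinearity
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])

post_activation_bn = Sequential([
    Input(shape=(32,)),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),   # normalize the non-negative ReLU outputs instead
    layers.Dense(10, activation="softmax"),
])
```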
13 votes · 4 answers

Algorithm for generating classification rules

So we have potential for a machine learning application that fits fairly neatly into the traditional problem domain solved by classifiers, i.e., we have a set of attributes describing an item and a "bucket" that they end up in. However, rather than…
super_seabass · 233 rep
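One concrete baseline worth knowing about (a sketch on the iris data, not necessarily the best fit for the asker's domain): fit a shallow decision tree and read if/then classification rules straight off it with export_text; dedicated rule learners such as RIPPER live outside scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# prints nested "feature <= threshold" rules, each branch ending in a predicted class
print(export_text(tree, feature_names=list(data.feature_names)))
```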