Most Popular · 1500 questions

13 votes · 1 answer

So what's the catch with LSTM?

I am expanding my knowledge of the Keras package and I have been tooling with some of the available models. I have an NLP binary classification problem that I'm trying to solve and have been applying different models. After working with some…
I_Play_With_Data · 2,089 rep
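Since the excerpt is cut off, here is only a generic sketch of the kind of model the question is about: a small Keras Embedding + LSTM binary text classifier on made-up data (vocab_size, max_len, and the arrays are placeholder assumptions, not the asker's setup).

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, max_len = 10000, 100                          # assumed vocabulary size / padded length
X = np.random.randint(1, vocab_size, size=(64, max_len))  # toy integer-encoded sentences
y = np.random.randint(0, 2, size=(64,))                   # toy binary labels

model = Sequential([
    Embedding(vocab_size, 64),        # learn token embeddings
    LSTM(32),                         # summarize the whole sequence into one vector
    Dense(1, activation="sigmoid"),   # binary decision
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```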
13 votes · 2 answers

CNN - How does backpropagation with weight-sharing work exactly?

Consider a Convolutional Neural Network (CNN) for image classification. In order to detect local features, weight-sharing is used among units in the same convolutional layer. In such a network, the kernel weights are updated via the backpropagation…
Andy R · 413 rep
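For context on what "backpropagation with weight-sharing" means formally (a standard result, not quoted from the question): for a convolutional output $y_{i,j} = \sum_{m,n} w_{m,n}\, x_{i+m,\,j+n} + b$, the same kernel weight $w_{m,n}$ contributes at every output position, so its gradient is the sum of the per-position contributions,

$$\frac{\partial L}{\partial w_{m,n}} = \sum_{i,j} \frac{\partial L}{\partial y_{i,j}}\; x_{i+m,\,j+n},$$

which is itself a cross-correlation of the input with the upstream gradient; the single shared weight is then updated once with this summed gradient.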
13 votes · 2 answers

When to use Stateful LSTM?

I'm trying to use an LSTM on time-series data in order to generate future sequences that look like the original sequences in terms of values and progression direction. My approach is: train an RNN to predict a value based on the 25 past values, then use the…
Hastu · 418 rep
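A minimal sketch of the stateful variant being asked about (shapes and data are assumed placeholders; the point is only where the hidden state is carried over and reset, using the tf.keras 2.x API):

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

batch_size, timesteps, features = 1, 25, 1        # e.g. predict the next value from 25 past values
X = np.random.rand(200, timesteps, features)      # toy sliding windows over one series
y = np.random.rand(200, 1)                        # toy next-value targets

inputs = Input(shape=(timesteps, features), batch_size=batch_size)
outputs = Dense(1)(LSTM(32, stateful=True)(inputs))   # stateful: hidden state survives across batches
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

for epoch in range(3):
    # consecutive batches must be consecutive windows of the same series
    model.fit(X, y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()                              # start the next pass with a fresh state
```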
13 votes · 2 answers

cosine_similarity returns matrix instead of single value

I am using the code below to compute the cosine similarity between two vectors. It returns a matrix instead of the single value 0.8660254:

[[ 1.         0.8660254]
 [ 0.8660254  1.       ]]

from sklearn.metrics.pairwise import cosine_similarity
vec1 =…
Olivia Brown · 233 rep
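A short sketch of what is happening (the asker's actual vectors are truncated above, so the ones below are made up to reproduce 0.8660254): cosine_similarity compares every row of its first argument with every row of its second, so passing both vectors in a single array returns the full 2x2 pairwise matrix, while passing them separately and indexing gives the scalar.

```python
from sklearn.metrics.pairwise import cosine_similarity

vec1 = [1, 1, 1, 0]   # hypothetical vectors whose cosine similarity is 0.8660254
vec2 = [1, 1, 1, 1]

print(cosine_similarity([vec1, vec2]))           # 2x2 matrix: every pair of rows
print(cosine_similarity([vec1], [vec2])[0, 0])   # just the one value: 0.8660254...
```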
13 votes · 2 answers

Ethically and Cost-effectively Scaling Data Scrapes

Few things in life give me pleasure like scraping structured and unstructured data from the Internet and making use of it in my models. For instance, the Data Science Toolkit (or RDSTK for R programmers) allows me to pull lots of good…
Hack-R · 1,919 rep
13 votes · 3 answers

How can autoencoders be used for clustering?

Suppose I have a set of time-domain signals with absolutely no labels. I want to cluster them into 2 or 3 classes. Autoencoders are unsupervised networks that learn to compress the inputs. So given an input $x^{(i)}$, weights $W_1$ and $W_2$, biases…
Tendero · 243 rep
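One common recipe (only a sketch on synthetic data; the layer sizes and signal length are assumed): train a plain autoencoder, then cluster its bottleneck codes with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras import Input, Model, layers

X = np.random.rand(500, 128)                     # toy stand-in for unlabeled time-domain signals

inp = Input(shape=(128,))
h = layers.Dense(32, activation="relu")(inp)     # encoder ($W_1$)
code = layers.Dense(8, activation="relu")(h)     # compressed representation
h2 = layers.Dense(32, activation="relu")(code)   # decoder ($W_2$)
out = layers.Dense(128)(h2)                      # reconstruction

autoencoder = Model(inp, out)
encoder = Model(inp, code)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)    # learn to reconstruct the inputs

codes = encoder.predict(X, verbose=0)                         # low-dimensional codes
labels = KMeans(n_clusters=3, n_init=10).fit_predict(codes)   # cluster into 2 or 3 classes
```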
13 votes · 4 answers

What is the difference between outlier detection and anomaly detection?

I would like to know the difference in terms of applications (e.g. which one is credit card fraud detection?) and in terms of used techniques. Example papers which define the task would be welcome.
Martin Thoma · 18,880 rep
13 votes · 1 answer

How to do stepwise regression using sklearn?

I could not find a way to do stepwise regression in scikit-learn. I have checked all the other posts on Stack Exchange on this topic. Answers to all of them suggest using f_regression. But f_regression does not do stepwise regression; it only gives the F-score…
nlahri · 131 rep
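scikit-learn has no classic p-value-driven stepwise regression, but its SequentialFeatureSelector (available since version 0.24) does greedy forward or backward feature selection scored by cross-validation, which is the closest built-in substitute. A minimal sketch on toy data:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,    # how many features to keep
    direction="forward",       # "backward" drops features instead
    cv=5,                      # selection is scored by cross-validation, not F-tests
)
sfs.fit(X, y)
print(sfs.get_support())       # boolean mask over the original features
```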
13 votes · 2 answers

Why use L1 regularization over L2?

When fitting a linear regression model by minimizing a loss function, why should I use $L_1$ instead of $L_2$ regularization? Is it better at preventing overfitting? Is it deterministic (i.e., does it always have a unique solution)? Is it better at feature selection (because…
astudentofmaths · 273 rep
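For reference, the two penalized objectives being compared are (standard notation, not quoted from the question)

$$\min_w \;\|y - Xw\|_2^2 + \lambda \|w\|_1 \quad (\text{lasso, } L_1), \qquad \min_w \;\|y - Xw\|_2^2 + \lambda \|w\|_2^2 \quad (\text{ridge, } L_2).$$

The $L_1$ penalty is non-differentiable at zero, which tends to push some coefficients exactly to zero (built-in feature selection) but can admit non-unique solutions; the $L_2$ penalty shrinks all coefficients smoothly and, for $\lambda > 0$, has a unique closed-form solution.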
13 votes · 5 answers

Clustering with cosine similarity

I have a large data set and a matrix of cosine similarities between its items. I would like to cluster them using cosine similarity in a way that puts similar objects together, without needing to specify beforehand the number of clusters I expect. I read the sklearn…
Smith Volka · 665 rep
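A sketch of one way to do this (toy data; the 0.3 threshold is an assumed cut-off): hierarchical clustering on a precomputed cosine-distance matrix, with a distance threshold instead of a preset number of clusters. In scikit-learn versions before 1.2 the keyword is affinity= rather than metric=.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

X = np.random.rand(100, 20)                                  # toy feature vectors
distance = np.clip(1.0 - cosine_similarity(X), 0.0, None)    # cosine distance matrix

clustering = AgglomerativeClustering(
    n_clusters=None,          # no fixed number of clusters...
    distance_threshold=0.3,   # ...merge until links exceed this cosine distance
    metric="precomputed",
    linkage="average",        # ward would require raw Euclidean features
)
labels = clustering.fit_predict(distance)
print(len(set(labels)), "clusters found")
```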
13 votes · 1 answer

Does nearest neighbour make any sense with t-SNE?

Answers here have stated that the dimensions in t-SNE are meaningless, and that the distances between points are not a measure of similarity. However, can we say anything about a point based on its nearest neighbours in t-SNE space? This answer…
geometrikal · 533 rep
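One way to probe this empirically (just a sketch on random data, so the numbers themselves are meaningless): compare each point's k nearest neighbours in the original space with its neighbours in the 2-D t-SNE embedding and measure the overlap.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(300, 50)                                  # toy high-dimensional data
emb = TSNE(n_components=2, random_state=0).fit_transform(X)  # 2-D embedding

k = 10  # drop column 0 of each result below, since a point is its own nearest neighbour
idx_orig = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
idx_emb = NearestNeighbors(n_neighbors=k + 1).fit(emb).kneighbors(emb, return_distance=False)[:, 1:]

overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(idx_orig, idx_emb)])
print(f"mean {k}-NN overlap between original and t-SNE space: {overlap:.2f}")
```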
13 votes · 2 answers

Sort numbers using only 2 hidden layers

I'm reading the cornerstone paper Sequence to Sequence Learning with Neural Networks by Ilya Sutskever and Quoc Le. On the first page, it briefly mentions that: A surprising example of the power of DNNs is their ability to sort N N-bit numbers…
aerin · 907 rep
13 votes · 3 answers

An Artificial Neural Network (ANN) with an arbitrary number of inputs and outputs

I would like to use ANNs for my problem, but the issue is that the numbers of input and output nodes are not fixed. I did some Google searches before asking my question and found that an RNN may help with my problem. But all the examples I've found…
Vadim · 303 rep
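A minimal sketch (assumed feature size and toy batches, not a full solution) of why an RNN helps here: with a time dimension of None and a per-step output head, neither the number of input steps nor the number of outputs has to be fixed when the network is built.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

features = 4
inp = Input(shape=(None, features))                  # None: any number of input steps
h = layers.LSTM(16, return_sequences=True)(inp)      # one hidden state per step
out = layers.TimeDistributed(layers.Dense(1))(h)     # one output per input step
model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")

# the same network accepts batches with different sequence lengths:
model.train_on_batch(np.random.rand(8, 5, features), np.random.rand(8, 5, 1))
model.train_on_batch(np.random.rand(8, 9, features), np.random.rand(8, 9, 1))
```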
13 votes · 3 answers

Does Batch Normalization make sense for a ReLU activation function?

Batch Normalization is described in this paper as a normalization of the input to an activation function with scale and shift variables $\gamma$ and $\beta$. This paper mainly describes using the sigmoid activation function, which makes sense.…
bnorm · 533 rep
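For concreteness, the two orderings the answers usually debate, written as small Keras models (the layer sizes are arbitrary): batch norm applied to the pre-activation, as in the original paper, versus batch norm applied after the ReLU.

```python
from tensorflow.keras import Input, Sequential, layers

pre_activation_bn = Sequential([
    Input(shape=(32,)),
    layers.Dense(64),
    layers.BatchNormalization(),   # normalize (then scale/shift with gamma, beta) before the nonlinearity
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])

post_activation_bn = Sequential([
    Input(shape=(32,)),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),   # normalize the non-negative ReLU outputs instead
    layers.Dense(10, activation="softmax"),
])
```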
13 votes · 4 answers

Algorithm for generating classification rules

So we have potential for a machine learning application that fits fairly neatly into the traditional problem domain solved by classifiers, i.e., we have a set of attributes describing an item and a "bucket" that they end up in. However, rather than…
super_seabass · 233 rep
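One concrete baseline worth knowing about (a sketch on the iris data, not necessarily the best fit for the asker's domain): fit a shallow decision tree and read if/then classification rules straight off it with export_text; dedicated rule learners such as RIPPER live outside scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# prints nested "feature <= threshold" rules, each branch ending in a predicted class
print(export_text(tree, feature_names=list(data.feature_names)))
```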