Most Popular
1500 questions
11
votes
1 answer
Keras LSTM with 1D time series
I'm learning how to use Keras and I've had reasonable success with my labelled dataset using the examples in Chollet's Deep Learning with Python. The dataset consists of ~1000 time series, each of length 3125, with 3 potential classes.
I'd like to go beyond the…
user1147964
- 165
- 1
- 3
- 8
11
votes
2 answers
How do attention mechanisms in RNNs learn weights for a variable length input
Attention mechanisms in RNNs are reasonably common in sequence-to-sequence models.
I understand that the decoder learns a weight vector $\alpha$ which is applied as a weighted sum of the output vectors from the encoder network. This is used to…
davidparks21
- 423
- 4
- 17
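A minimal sketch of why attention copes with variable-length input, assuming dot-product scoring (the question does not say which scoring function is used, and the names `softmax` and `attend` are hypothetical). The weights $\alpha$ are not a fixed-size learned vector; they are recomputed for each input by scoring every encoder output against the decoder state and normalising with a softmax, so the same code handles any sequence length.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return (weights, context) for one decoder step.

    decoder_state: list of floats (assumed query vector)
    encoder_outputs: list of vectors, one per input time step
    """
    # One score per encoder step: dot product with the decoder state.
    scores = [
        sum(q * h for q, h in zip(decoder_state, h_vec))
        for h_vec in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    context = [
        sum(w * h_vec[d] for w, h_vec in zip(weights, encoder_outputs))
        for d in range(dim)
    ]
    return weights, context


query = [1.0, 0.0]
short_seq = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
long_seq = short_seq + [[0.5, 0.5], [0.0, 0.0]]

weights_short, context_short = attend(query, short_seq)
weights_long, context_long = attend(query, long_seq)
```

Both calls use the same function with no length-specific parameters; only the scoring function would carry learned weights in a trained model.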
11
votes
3 answers
What is a policy in machine learning?
While I was reading the paper "Grounded Action Transformation for Robot Learning in Simulation", I came across the term "policy". Could someone explain to me what that actually is (in general and in the particular context of the paper)?
Ramya Raj
- 111
- 1
- 4
11
votes
2 answers
Trying to use TensorFlow to predict financial time series data
I'm new to ML and TensorFlow (I started a few hours ago), and I'm trying to use it to predict the next few data points in a time series. I'm taking my input and doing this with it:
/----------- x…
Isvara
- 211
- 2
- 5
11
votes
1 answer
Data preprocessing: Should we normalise images pixel-wise?
Let me present a toy example and my reasoning about image normalisation:
Suppose we have a CNN architecture to classify NxN grayscale images in two categories. Pixel values range from 0 (black) to 255 (white).
Class 0:
Images that…
lucasrodesg
- 235
- 2
- 7
11
votes
2 answers
Difference between RMSProp with momentum and Adam Optimizers
According to this scintillating blog post, Adam is very similar to RMSProp with momentum.
From the TensorFlow documentation, we see that tf.train.RMSPropOptimizer has the following parameters:
__init__(
learning_rate,
decay=0.9,
momentum=0.0,
…
hans
- 253
- 1
- 3
- 9
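A rough sketch of the two update rules on a single scalar parameter, in a textbook-style formulation. The function names, the zero-initialised state, and the default hyperparameters below are assumptions for illustration and do not necessarily match how any particular library initialises or orders these steps. The two visible differences are that Adam averages the raw gradient and bias-corrects both averages, while RMSProp with momentum applies momentum to the already rescaled gradient and uses no bias correction.

```python
import math


def rmsprop_momentum_step(theta, grad, state, lr=0.01, decay=0.9,
                          momentum=0.9, eps=1e-10):
    """One RMSProp-with-momentum update (assumed zero-initialised state)."""
    # Running average of squared gradients, no bias correction.
    state["ms"] = decay * state["ms"] + (1 - decay) * grad ** 2
    # Momentum accumulates the rescaled gradient.
    state["mom"] = momentum * state["mom"] + lr * grad / math.sqrt(state["ms"] + eps)
    return theta - state["mom"]


def adam_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    # Running averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialisation.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


grad = 4.0  # arbitrary gradient value for the first step
theta_rmsprop = rmsprop_momentum_step(1.0, grad, {"ms": 0.0, "mom": 0.0})
theta_adam = adam_step(1.0, grad, {"t": 0, "m": 0.0, "v": 0.0})
```

Under these assumptions the first Adam step has magnitude close to `lr` regardless of the gradient's scale, whereas the first RMSProp step is larger by about `1 / sqrt(1 - decay)` because its squared-gradient average starts at zero and is never corrected.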
11
votes
1 answer
What are "VGG54" and "VGG22" derived from the VGG19 CNN?
In the paper Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network by Christian Ledig et al., the distance between images (used in the loss function) is calculated from feature maps extracted from the VGG19 network.…
Lafayette
- 604
- 5
- 19
11
votes
2 answers
Find optimal P(X|Y) given I have a model that has good performance when trained on P(Y|X)
Input Data:
$X$ -> features of t shirt (colour,logo,etc)
$Y$ -> profit margin
I have trained a random forest on the above $X$ and $Y$ and have achieved reasonable accuracy on test data.
So, I have
$P(Y|X)$.
Now, I would like to find $P(X|Y)$ i.e…
claudius
- 153
- 8
11
votes
1 answer
Difference between interpolate() and fillna() in pandas
Both interpolate and fillna do the job of filling NA values. What is the basic difference between the two, and what is the significance of having two different methods? Can anyone explain it to me in layman's terms? I already visited…
Debuggerrr
- 215
- 1
- 2
- 8
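A small illustration of the distinction, assuming pandas is installed and using a made-up four-element series. `fillna` substitutes a value you supply, `ffill` carries the last valid observation forward, and `interpolate` (linear by default) estimates each missing value from its neighbours.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# fillna: every gap gets the same constant you pass in.
filled_constant = s.fillna(0.0).tolist()

# ffill: every gap copies the last valid value before it.
filled_forward = s.ffill().tolist()

# interpolate: gaps are estimated on a straight line between neighbours.
interpolated = s.interpolate().tolist()

print(filled_constant)  # [1.0, 0.0, 0.0, 4.0]
print(filled_forward)   # [1.0, 1.0, 1.0, 4.0]
print(interpolated)     # [1.0, 2.0, 3.0, 4.0]
```

Which one is appropriate depends on the data: interpolation suits ordered numeric data such as time series, while a constant or forward fill can be used on any column.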
11
votes
2 answers
Consequence of Feature Scaling
I am currently using SVM and scaling my training features to the range of [0,1].
I first fit/transform my training set and then apply the same transformation to my testing set. For example:
### Configure transformation and apply to training set
…
mike1886
- 933
- 9
- 17
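The fit-on-train, apply-to-test pattern described in this question can be sketched in plain Python as follows. This is not the asker's elided code; the helper names `fit_min_max` and `apply_min_max` and the sample values are hypothetical. It shows one consequence of the approach: test values outside the training range land outside [0, 1].

```python
def fit_min_max(column):
    """Learn the scaling parameters from the training data only."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Rescale values using previously learned parameters."""
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical single feature.
train = [2.0, 4.0, 6.0, 10.0]
test = [1.0, 6.0, 12.0]

lo, hi = fit_min_max(train)               # parameters come from train only
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)  # same transformation, not refit

print(train_scaled)  # [0.0, 0.25, 0.5, 1.0]
print(test_scaled)   # [-0.125, 0.5, 1.25]
```

Refitting on the test set would force it back into [0, 1], but it would also leak test information and make the two sets inconsistent, which is why the parameters are reused here.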
11
votes
3 answers
What regression to use to calculate the result of election in a multiparty system?
I want to predict the result of the parliamentary elections. My output will be the % each party receives. There are more than 2 parties, so logistic regression is not a viable option. I could make a separate regression for each party but…
Viktor
- 850
- 1
- 6
- 17
11
votes
2 answers
When do we say that the dataset is not classifiable?
I have often analysed a dataset on which I could not really do any sort of classification. To see whether I can get a classifier, I have usually used the following steps:
Generate box plots of label against numerical values.
Reduce the…
vc_dim
- 188
- 9
11
votes
4 answers
Using Clustering in text processing
Hi, this is my first question on the Data Science Stack Exchange. I want to create an algorithm for text classification. Suppose I have a large set of texts and articles, say around 5000 plain texts. I first use a simple function to determine the…
Rashid
- 213
- 1
- 4
11
votes
3 answers
Relationship between KS, AUROC, and Gini
Common model validation statistics like the Kolmogorov–Smirnov test (KS), AUROC, and Gini coefficient are all functionally related. However, my question has to do with proving how these are all related. I am curious if anyone can help me prove these…
Steven
- 111
- 1
- 1
- 3
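As a numerical illustration rather than a proof, the sketch below computes all three statistics from the same ROC curve for a binary classifier. It assumes the usual credit-scoring definitions: Gini = 2 × AUROC − 1, and KS as the largest vertical gap between the true-positive and false-positive rate curves. The function names and the toy labels and scores are made up for the example.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        j = i
        # Group tied scores so they share a single threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def ks_statistic(points):
    """Largest gap between TPR and FPR over all thresholds."""
    return max(abs(tpr - fpr) for fpr, tpr in points)


# Toy data: 1 = positive class, higher score = more likely positive.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc_value = auroc(points)
gini_value = 2 * auc_value - 1
ks_value = ks_statistic(points)
```

On this toy data the AUROC is 8/9, so the Gini coefficient is 7/9, and the KS statistic is 2/3. Gini is an exact function of AUROC, whereas KS depends on a single point of the curve and so is related to the other two only through the curve's shape.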
11
votes
3 answers
Can GPS coordinates (latitude and longitude) be used as features in a linear model?
I have data sets that contain, among many features, GPS coordinates (latitude and longitude). I'd like to use these data sets to explore problems such as: (1) computing ETA to drive between start and end points; and (2) estimating the amount of…
stackoverflowuser2010
- 261
- 1
- 2
- 7