Most Popular
1500 questions
11 votes, 2 answers
Why are large weights prohibited in neural networks?
Why do weights with large values cause neural networks to overfit, and why do we consequently use approaches like regularization to penalize large weights?
Green Falcon (14,058)
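A minimal sketch of the idea behind this question (variable names and the penalty strength are illustrative, not taken from any posted answer): an L2 penalty adds a term to the loss whose gradient grows linearly with each weight, so large weights are pushed toward zero harder than small ones.

```python
import numpy as np

def l2_regularized_loss(w, data_loss, lam=0.1):
    """Data loss plus an L2 penalty that grows quadratically with weight size."""
    return data_loss + lam * np.sum(w ** 2)

# The gradient of the penalty term is 2*lam*w: large weights receive a
# proportionally larger pull toward zero ("weight decay").
w_small = np.array([0.1, -0.2])
w_large = np.array([10.0, -20.0])

penalty_grad_small = 2 * 0.1 * w_small
penalty_grad_large = 2 * 0.1 * w_large
```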
11 votes, 3 answers
Field-Aware Factorization Machines
Can anyone explain how field-aware factorization machines (FFM) compare to standard factorization machines (FM)?
Standard: http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
"Field…
B_Miner (702)
11 votes, 3 answers
Is Dynamic Time Warping outdated?
At http://www.speech.zone/exercises/dtw-in-python/ it says:
"Although it's not really used anymore, Dynamic Time Warping (DTW) is a nice introduction to the key concept of Dynamic Programming."
I am using DTW for signal processing and am a little…
Make42 (752)
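For reference, the dynamic-programming recurrence that the linked exercise introduces can be sketched in a few lines (this is a generic textbook DTW, not code from the question or its answers):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Best of match, insertion, deletion -- the DP recurrence.
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    return cost[n, m]
```

Because the warping path can stretch time, a sequence compared against a time-stretched copy of itself has distance zero.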
11 votes, 2 answers
How to perform Logistic Regression with a large number of features?
I have a dataset with 330 samples and 27 features per sample, with a binary class problem for Logistic Regression.
According to the "rule of ten" I need at least 10 events for each feature to be included. However, I have an imbalanced dataset,…
LucasRamos (111)
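A common remedy when samples are scarce relative to features is penalized logistic regression. A minimal numpy sketch of gradient descent with an L2 (ridge) penalty, using the question's dimensions but otherwise invented toy data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_logreg_step(X, y, w, lr=0.1, lam=1.0):
    """One gradient-descent step on the L2-penalized logistic loss."""
    p = sigmoid(X @ w)                       # predicted probabilities
    grad = X.T @ (p - y) / len(y) + lam * w  # data gradient + ridge penalty
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(330, 27))   # mimic the question: 330 samples, 27 features
y = (X[:, 0] > 0).astype(float)  # toy labels driven by feature 0 only
w = np.zeros(27)
for _ in range(200):
    w = l2_logreg_step(X, y, w)
```

The penalty keeps the 27 coefficients small and finite even though the sample is modest, and the informative feature still dominates the fit.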
11 votes, 6 answers
Is Excel sufficient for data science?
I'm in the process of preparing to teach an introductory course on data science using the R programming language. My audience is undergraduate students majoring in business subjects. A typical business undergrad does not have any computer…
I Like to Code (267)
11 votes, 1 answer
GPU-Accelerated Data Processing for R on Windows
I'm currently taking a paper on Big Data which has us utilising R heavily for data analysis. I happen to have a GTX 1070 in my PC for gaming reasons. Thus, I thought it would be really cool if I could use that to speed up some of the processing for…
Jesse Maher (113)
11 votes, 1 answer
Lazy vs Eager Learning
I wish to better understand the difference between lazy and eager learning. I am having difficulty conceptualising what the "abstraction" refers to between the two.
According to the textbook I am reading, "The distinction between eager…
TheGoat (271)
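The distinction can be made concrete with a toy contrast (both classes here are illustrative): an eager learner builds an abstraction of the data at training time (here, a fitted line) and can then discard the data, while a lazy learner such as 1-nearest-neighbour stores the raw data and defers all work to query time.

```python
import numpy as np

class EagerLinear:
    """Eager: fits a model (the 'abstraction') up front, keeps no raw data."""
    def fit(self, x, y):
        self.slope, self.intercept = np.polyfit(x, y, deg=1)
    def predict(self, q):
        return self.slope * q + self.intercept

class LazyOneNN:
    """Lazy: stores training data verbatim; generalization happens per query."""
    def fit(self, x, y):
        self.x, self.y = np.asarray(x), np.asarray(y)
    def predict(self, q):
        return self.y[np.argmin(np.abs(self.x - q))]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                      # exactly linear: y = 2x + 1

eager = EagerLinear(); eager.fit(x, y)
lazy = LazyOneNN();    lazy.fit(x, y)
```

At a query between training points, the eager model interpolates with its abstraction, while the lazy model simply echoes the nearest stored label.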
11 votes, 2 answers
R, keras: How to get the output of a hidden layer?
I am using the Keras package in R to build a neural network. How can I extract the output of a hidden layer? I found an example in Python, but I have no idea how to do that in R.
user7117436 (298)
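Whatever the framework, the underlying idea is the same: run the forward pass and read off the intermediate activations rather than only the final output. A framework-agnostic numpy sketch (the two-layer network and all names here are purely illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2, return_hidden=False):
    """Two-layer network; optionally also return the hidden-layer output."""
    hidden = relu(x @ W1 + b1)   # this is the hidden layer's output
    out = hidden @ W2 + b2
    return (out, hidden) if return_hidden else out

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

out, hidden = forward(x, W1, b1, W2, b2, return_hidden=True)
```

In Keras the common Python approach is analogous: build a second model whose output is the hidden layer's output tensor and call it on the same input.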
11 votes, 3 answers
ReLU vs sigmoid in MNIST example
PLEASE NOTE: I am not trying to improve on the following example. I know you can get over 99% accuracy. The whole code is in the question. When I tried this simple code I got around 95% accuracy; if I simply change the activation function from…
user (1,993)
11 votes, 4 answers
Why not train the final model on the entire data after doing hyper-parameter tuning on the validation data and model selection on the test data?
By entire data I mean train + test + validation.
Once I have fixed my hyperparameters using the validation data and chosen the model using the test data, won't it be better to have a model trained on the entire data so that the parameters are better…
Apoorva Abhishekh (195)
11 votes, 4 answers
Will cross-validation performance be an accurate indication of the true performance on an independent data set?
I feel that this question is related to the theory behind cross-validation. I present my empirical findings here and have written a question related to the theory of cross-validation there.
I have two models, M1 and M2. I use the same data set to train…
KevinKim (635)
11 votes, 3 answers
Recurrent (CNN) model on EEG data
I'm wondering how to interpret a recurrent architecture in an EEG context. Specifically, I'm thinking of this as a recurrent CNN (as opposed to architectures like LSTM), but maybe it applies to other types of recurrent networks as well.
When I read…
Simon (1,071)
11 votes, 1 answer
How does sigmoid saturate with large weights?
In the cs231n course, it is mentioned that:
"If the initial weights are too large then most neurons would become saturated and the network will barely learn."
How do the neurons get saturated? Large weights may lead to a z (the output of the weighted sum) which…
MysticForce (213)
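The saturation effect is easy to see numerically: the sigmoid's gradient s(z)(1 - s(z)) peaks at 0.25 when z = 0 and collapses toward zero for large |z|. Since large weights produce large |z|, backpropagated updates through saturated units become vanishingly small. A short sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # maximal (0.25) at z = 0, ~0 for large |z|

# Large weights push |z| far from 0, where the gradient is nearly zero,
# so the neuron is "saturated" and the network barely learns.
grad_at_0 = sigmoid_grad(0.0)
grad_at_10 = sigmoid_grad(10.0)
```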
11 votes, 2 answers
Why is the learning rate causing my neural network's weights to skyrocket?
I am using TensorFlow to write simple neural networks for a bit of research, and I have had many problems with 'nan' weights while training. I tried many different solutions, like changing the optimizer, changing the loss, the data size, etc., but with…
abeoliver (113)
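A toy illustration of the mechanism usually behind this (not the asker's actual network): on even a one-parameter quadratic loss, gradient descent diverges once the learning rate crosses a threshold, because each step multiplies the parameter by a factor of magnitude greater than 1 — weights "skyrocket" and eventually overflow to nan.

```python
import numpy as np

def descend(lr, steps=20, w0=1.0):
    """Gradient descent on f(w) = w**2; gradient is 2w."""
    w = w0
    for _ in range(steps):
        # Each step maps w to (1 - 2*lr) * w: stable only if |1 - 2*lr| < 1.
        w -= lr * 2 * w
    return w

w_stable = descend(lr=0.1)    # factor 0.8 per step: shrinks to ~0
w_diverged = descend(lr=1.5)  # factor -2 per step: blows up
```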
11 votes, 1 answer
Why can't TensorFlow fit a simple linear model if I minimize mean absolute error instead of mean squared error?
In Introduction I have just changed
loss = tf.reduce_mean(tf.square(y - y_data))
to
loss = tf.reduce_mean(tf.abs(y - y_data))
and the model is unable to learn; the loss just grows over time. Why?
Brans Ds (849)
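A one-dimensional sketch of the usual explanation (toy numbers, not the question's TensorFlow model): the gradient of |e| is sign(e), whose magnitude never shrinks as the error shrinks, so a step size tuned for the squared loss keeps overshooting the optimum and oscillates instead of converging, while the squared-error gradient 2e decays smoothly to zero.

```python
import numpy as np

def fit(loss_grad, lr=0.5, steps=50, w0=5.3):
    """Gradient descent on a 1-D error toward the optimum at w = 0."""
    w = w0
    for _ in range(steps):
        w -= lr * loss_grad(w)
    return w

w_l2 = fit(lambda e: 2 * e)   # squared error: gradient shrinks with the error
w_l1 = fit(np.sign)           # absolute error: constant-magnitude gradient
```

With the same fixed learning rate, the squared loss lands on the optimum while the absolute loss bounces back and forth around it; the standard fixes are a smaller (or decaying) learning rate or a smoothed loss such as Huber.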