Most Popular
1500 questions
11 votes, 2 answers
Why are large weights prohibited in neural networks?
Why do weights with large values cause neural networks to overfit, and why do we consequently use approaches like regularization to penalize large weights?
Green Falcon (14,058)
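A minimal sketch of the idea behind this question (variable names and the penalty strength are illustrative, not taken from any posted answer): an L2 penalty adds a term to the loss whose gradient grows linearly with each weight, so large weights are pushed toward zero harder than small ones.

```python
import numpy as np

def l2_regularized_loss(w, data_loss, lam=0.1):
    """Data loss plus an L2 penalty that grows quadratically with weight size."""
    return data_loss + lam * np.sum(w ** 2)

# The gradient of the penalty term is 2*lam*w: large weights receive a
# proportionally larger pull toward zero ("weight decay").
w_small = np.array([0.1, -0.2])
w_large = np.array([10.0, -20.0])

penalty_grad_small = 2 * 0.1 * w_small
penalty_grad_large = 2 * 0.1 * w_large
```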
11 votes, 3 answers
Field-Aware Factorization Machines
Can anyone explain how field-aware factorization machines (FFM) compare to standard factorization machines (FM)?
Standard: http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
"Field…
B_Miner (702)
11 votes, 3 answers
Is Dynamic Time Warping outdated?
At http://www.speech.zone/exercises/dtw-in-python/ it says:
"Although it's not really used anymore, Dynamic Time Warping (DTW) is a nice introduction to the key concept of Dynamic Programming."
I am using DTW for signal processing and am a little…
Make42 (752)
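For reference, the dynamic-programming recurrence that the linked exercise introduces can be sketched in a few lines (this is a generic textbook DTW, not code from the question or its answers):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Best of match, insertion, deletion -- the DP recurrence.
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    return cost[n, m]
```

Because the warping path can stretch time, a sequence compared against a time-stretched copy of itself has distance zero.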
11 votes, 2 answers
How to perform Logistic Regression with a large number of features?
I have a dataset with 330 samples and 27 features per sample, with a binary class problem for Logistic Regression.
According to the "rule of ten" I need at least 10 events for each feature to be included. However, I have an imbalanced dataset,…
LucasRamos (111)
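A common remedy when samples are scarce relative to features is penalized logistic regression. A minimal numpy sketch of gradient descent with an L2 (ridge) penalty, using the question's dimensions but otherwise invented toy data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_logreg_step(X, y, w, lr=0.1, lam=1.0):
    """One gradient-descent step on the L2-penalized logistic loss."""
    p = sigmoid(X @ w)                       # predicted probabilities
    grad = X.T @ (p - y) / len(y) + lam * w  # data gradient + ridge penalty
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(330, 27))   # mimic the question: 330 samples, 27 features
y = (X[:, 0] > 0).astype(float)  # toy labels driven by feature 0 only
w = np.zeros(27)
for _ in range(200):
    w = l2_logreg_step(X, y, w)
```

The penalty keeps the 27 coefficients small and finite even though the sample is modest, and the informative feature still dominates the fit.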
11 votes, 6 answers
Is Excel sufficient for data science?
I'm in the process of preparing to teach an introductory course on data science using the R programming language. My audience is undergraduate students majoring in business subjects. A typical business undergrad does not have any computer…
I Like to Code (267)
11 votes, 1 answer
GPU-Accelerated Data Processing for R on Windows
I'm currently taking a paper on Big Data which has us utilising R heavily for data analysis. I happen to have a GTX 1070 in my PC for gaming reasons. Thus, I thought it would be really cool if I could use that to speed up some of the processing for…
Jesse Maher (113)
11 votes, 1 answer
Lazy vs Eager Learning
I wish to better understand the difference between lazy and eager learning. I am having difficulty conceptualising what the "abstraction" refers to between the two.
According to the textbook I am reading, "The distinction between eager…
TheGoat (271)
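The distinction can be made concrete with a toy contrast (both classes here are illustrative): an eager learner builds an abstraction of the data at training time (here, a fitted line) and can then discard the data, while a lazy learner such as 1-nearest-neighbour stores the raw data and defers all work to query time.

```python
import numpy as np

class EagerLinear:
    """Eager: fits a model (the 'abstraction') up front, keeps no raw data."""
    def fit(self, x, y):
        self.slope, self.intercept = np.polyfit(x, y, deg=1)
    def predict(self, q):
        return self.slope * q + self.intercept

class LazyOneNN:
    """Lazy: stores training data verbatim; generalization happens per query."""
    def fit(self, x, y):
        self.x, self.y = np.asarray(x), np.asarray(y)
    def predict(self, q):
        return self.y[np.argmin(np.abs(self.x - q))]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                      # exactly linear: y = 2x + 1

eager = EagerLinear(); eager.fit(x, y)
lazy = LazyOneNN();    lazy.fit(x, y)
```

At a query between training points, the eager model interpolates with its abstraction, while the lazy model simply echoes the nearest stored label.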
11 votes, 2 answers
R, keras: How to get the output of a hidden layer?
I am using the Keras package in R to build a neural network. How can I extract the output of a hidden layer? I found an example in Python, but I have no idea how to do that in R.
user7117436 (298)
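Whatever the framework, the underlying idea is the same: run the forward pass and read off the intermediate activations rather than only the final output. A framework-agnostic numpy sketch (the two-layer network and all names here are purely illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2, return_hidden=False):
    """Two-layer network; optionally also return the hidden-layer output."""
    hidden = relu(x @ W1 + b1)   # this is the hidden layer's output
    out = hidden @ W2 + b2
    return (out, hidden) if return_hidden else out

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

out, hidden = forward(x, W1, b1, W2, b2, return_hidden=True)
```

In Keras the common Python approach is analogous: build a second model whose output is the hidden layer's output tensor and call it on the same input.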
11 votes, 3 answers
ReLU vs sigmoid in MNIST example
PLEASE NOTE: I am not trying to improve on the following example. I know you can get over 99% accuracy. The whole code is in the question. When I tried this simple code I got around 95% accuracy; if I simply change the activation function from…
user (1,993)
11 votes, 4 answers
Why not train the final model on the entire data after doing hyper-parameter tuning on the validation data and model selection on the test data?
By entire data I mean train + test + validation.
Once I have fixed my hyperparameters using the validation data and chosen the model using the test data, won't it be better to have a model trained on the entire data so that the parameters are better…
Apoorva Abhishekh (195)
11 votes, 4 answers
Will cross-validation performance be an accurate indication of the true performance on an independent data set?
I feel that this question is related to the theory behind cross-validation. I present my empirical findings here and have written a question related to the theory of cross-validation there.
I have two models, M1 and M2. I use the same data set to train…
KevinKim (635)
11 votes, 3 answers
Recurrent (CNN) model on EEG data
I'm wondering how to interpret a recurrent architecture in an EEG context. Specifically, I'm thinking of this as a recurrent CNN (as opposed to architectures like LSTM), but maybe it applies to other types of recurrent networks as well.
When I read…
Simon (1,071)
11 votes, 1 answer
How does sigmoid saturate with large weights?
In the cs231n course, it is mentioned that:
"If the initial weights are too large then most neurons would become saturated and the network will barely learn."
How do the neurons get saturated? Large weights may lead to a z (the output of the weighted sum) which…
MysticForce (213)
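The saturation effect is easy to see numerically: the sigmoid's gradient s(z)(1 - s(z)) peaks at 0.25 when z = 0 and collapses toward zero for large |z|. Since large weights produce large |z|, backpropagated updates through saturated units become vanishingly small. A short sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # maximal (0.25) at z = 0, ~0 for large |z|

# Large weights push |z| far from 0, where the gradient is nearly zero,
# so the neuron is "saturated" and the network barely learns.
grad_at_0 = sigmoid_grad(0.0)
grad_at_10 = sigmoid_grad(10.0)
```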
11 votes, 2 answers
Why is the learning rate causing my neural network's weights to skyrocket?
I am using TensorFlow to write simple neural networks for a bit of research, and I have had many problems with 'nan' weights while training. I tried many different solutions, like changing the optimizer, changing the loss, the data size, etc., but with…
abeoliver (113)
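A toy illustration of the mechanism usually behind this (not the asker's actual network): on even a one-parameter quadratic loss, gradient descent diverges once the learning rate crosses a threshold, because each step multiplies the parameter by a factor of magnitude greater than 1 — weights "skyrocket" and eventually overflow to nan.

```python
import numpy as np

def descend(lr, steps=20, w0=1.0):
    """Gradient descent on f(w) = w**2; gradient is 2w."""
    w = w0
    for _ in range(steps):
        # Each step maps w to (1 - 2*lr) * w: stable only if |1 - 2*lr| < 1.
        w -= lr * 2 * w
    return w

w_stable = descend(lr=0.1)    # factor 0.8 per step: shrinks to ~0
w_diverged = descend(lr=1.5)  # factor -2 per step: blows up
```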
11 votes, 1 answer
Why can't TensorFlow fit a simple linear model if I minimize mean absolute error instead of mean squared error?
In Introduction I have just changed
loss = tf.reduce_mean(tf.square(y - y_data))
to
loss = tf.reduce_mean(tf.abs(y - y_data))
and the model is unable to learn; the loss just grows over time. Why?
Brans Ds (849)
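A one-dimensional sketch of the usual explanation (toy numbers, not the question's TensorFlow model): the gradient of |e| is sign(e), whose magnitude never shrinks as the error shrinks, so a step size tuned for the squared loss keeps overshooting the optimum and oscillates instead of converging, while the squared-error gradient 2e decays smoothly to zero.

```python
import numpy as np

def fit(loss_grad, lr=0.5, steps=50, w0=5.3):
    """Gradient descent on a 1-D error toward the optimum at w = 0."""
    w = w0
    for _ in range(steps):
        w -= lr * loss_grad(w)
    return w

w_l2 = fit(lambda e: 2 * e)   # squared error: gradient shrinks with the error
w_l1 = fit(np.sign)           # absolute error: constant-magnitude gradient
```

With the same fixed learning rate, the squared loss lands on the optimum while the absolute loss bounces back and forth around it; the standard fixes are a smaller (or decaying) learning rate or a smoothed loss such as Huber.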