Highest Voted Questions - Data Science Stack Exchange

15 votes, 4 answers

How to calculate the output shape of conv2d_transpose?

Currently I am coding a GAN to generate MNIST digits, but the generator doesn't work. First I choose z with shape 100 per batch and put it through a layer to get the shape (7, 7, 256). Then a conv2d_transpose layer should bring it to 28, 28, 1. (which is basically…

asked Jan 09 '18 at 19:14 by snowparrot

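For the conv2d_transpose question above, here is a minimal sketch of the per-axis output-length rule. It assumes no `output_padding` and no dilation, and follows the rule Keras applies for `"same"` and `"valid"` padding in that case as we understand it. The kernel size of 5 is a hypothetical choice. Apply the function separately to height and width.

```python
def transposed_conv_output_length(input_length: int, kernel_size: int,
                                  stride: int, padding: str) -> int:
    """Output length of a transposed convolution along one spatial axis.

    Assumes no output_padding and no dilation.
    """
    if padding == "same":
        # 'same': every input position expands to exactly `stride` outputs.
        return input_length * stride
    if padding == "valid":
        # 'valid': the kernel can extend past the last stride step.
        return input_length * stride + max(kernel_size - stride, 0)
    raise ValueError(f"unsupported padding: {padding!r}")


# 7x7 -> 28x28 with 'same' padding: two stride-2 layers (7 -> 14 -> 28).
h = 7
for _ in range(2):
    h = transposed_conv_output_length(h, kernel_size=5, stride=2, padding="same")
print(h)  # 28
```

Under these assumptions, a single stride-4 layer with `"same"` padding would also map 7 to 28. With `"valid"` padding the kernel size enters the result. For example, 7 becomes 17 with kernel 5 and stride 2.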

15 votes, 1 answer

PyTorch vs. TensorFlow Eager

Google recently included its Eager mode, an imperative API for accessing TensorFlow's computation capabilities, in TensorFlow's nightly builds. How does TensorFlow Eager compare to PyTorch? Some aspects that could affect the comparison are: Advantages…

asked Nov 07 '17 at 17:12 by noe


15 votes, 2 answers

Why do we need to handle data imbalance?

I would like to know why we need to deal with data imbalance. I know how to deal with it and the different methods for solving the issue, such as upsampling, downsampling, or SMOTE. For example, if I have a rare disease at 1 percent (1 out of 100), and…

asked Nov 06 '17 at 06:15 by sara

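The imbalance question above uses a 1-in-100 disease example. A small worked example shows one concrete reason imbalance matters: the accuracy paradox. The data and the "always healthy" predictor below are hypothetical.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of actual positives that the model found.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos


# Hypothetical data: 1 diseased patient (label 1) out of 100.
y_true = [1] + [0] * 99
# A "classifier" that ignores its input and always predicts "healthy".
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

A model can reach 99% accuracy while detecting none of the cases. That is why imbalanced problems usually call for resampling, class weighting, or metrics that focus on the rare class.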

15 votes, 1 answer

Why should I also normalize the output data?

I'm new to data science and neural networks in general. Looking around, many people say it is better to normalize the data before doing anything with the NN. I understand how normalizing the input data can be useful. However, I really don't see how…

asked Oct 31 '17 at 09:39 by Euler_Salter

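For the output-normalization question above, one practical point is this: if the targets are scaled for training, the network's predictions must be mapped back with the inverse transform, using the min/max fitted on the training targets. For example, a sigmoid output unit can only produce values in (0, 1), so targets outside that range have to be rescaled. A minimal min-max sketch follows; the target values are hypothetical.

```python
def fit_min_max(values):
    # Learn the scaling parameters from the training targets only.
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant target")
    return lo, hi


def scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    # Inverse transform: map model outputs back to the original units.
    return [s * (hi - lo) + lo for s in scaled]


y_train = [10.0, 20.0, 30.0]          # hypothetical regression targets
lo, hi = fit_min_max(y_train)
y_scaled = scale(y_train, lo, hi)     # what the network is trained on
print(y_scaled)                       # [0.0, 0.5, 1.0]
print(unscale([0.25], lo, hi))        # a prediction mapped back: [15.0]
```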

15 votes, 3 answers

Modelling Unevenly Spaced Time Series

I have a continuous variable, sampled over a period of a year at irregular intervals. Some days have more than one observation per hour, while other periods have nothing for days. This makes it particularly difficult to detect patterns in the time…

asked Nov 03 '14 at 16:51 by doublebyte

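One simple, common way to handle the unevenly spaced series described above is to regularize it. Average the observations into fixed-width bins and leave empty bins as missing rather than silently filling them. This is only one option: it discards timing within each bin, and some models work directly on irregular timestamps. The sketch below assumes hourly bins and hypothetical samples.

```python
from datetime import datetime, timedelta


def bin_irregular_series(samples, start, width, n_bins):
    """Average irregular (timestamp, value) samples into fixed-width bins.

    Bins with no observations are returned as None so gaps stay visible.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in samples:
        idx = (t - start) // width  # timedelta // timedelta -> int (floor)
        if 0 <= idx < n_bins:
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


start = datetime(2014, 1, 1)
samples = [
    (datetime(2014, 1, 1, 0, 10), 1.0),
    (datetime(2014, 1, 1, 0, 50), 3.0),
    (datetime(2014, 1, 1, 2, 30), 5.0),
]
print(bin_irregular_series(samples, start, timedelta(hours=1), 4))
# [2.0, None, 5.0, None]
```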

15 votes, 3 answers

Improve Pandas dataframe filtering speed

I have a dataset with 19 columns and about 250k rows. I have worked with bigger datasets, but this time, Pandas decided to play with my nerves. I tried to split the original dataset into 3 sub-dataframes based on some simple rules. However, it takes…

asked Sep 24 '17 at 10:50 by Tasos

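The question above does not show the actual splitting rules. A frequent cause of slow filtering in pandas is evaluating rules row by row with `iterrows()` or `apply()`. A minimal sketch of the vectorized alternative follows: build each boolean mask once over whole columns, then index with it. The column name `score` and the thresholds are hypothetical.

```python
import pandas as pd


def split_by_rules(df: pd.DataFrame):
    # Each mask is computed over the whole column at once (vectorized),
    # instead of evaluating the rules separately for every row.
    low = df["score"] < 0.3
    high = df["score"] > 0.7
    middle = ~(low | high)
    return df[low], df[middle], df[high]


# Hypothetical frame standing in for the 250k-row dataset.
df = pd.DataFrame({"score": [i / 10 for i in range(10)]})
low_df, mid_df, high_df = split_by_rules(df)
print(len(low_df), len(mid_df), len(high_df))  # 3 5 2
```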

15 votes, 1 answer

Online random forests by adding more single decision trees

A Random Forest (RF) is an ensemble of Decision Trees (DTs). With bagging, each DT is trained on a different subset of the data. Is there, then, any way to implement an online random forest by adding more decision trees trained on new data? For…

asked Oct 20 '14 at 08:48 by tashuhka

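The idea in the question above, growing a bagged ensemble by fitting extra trees on each new batch, can be sketched as a toy. This is not a full random forest. It uses depth-1 trees (stumps) and no per-split feature subsampling, and the class and helper names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a depth-1 decision tree (one feature, one threshold) by brute force."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lbl for row, lbl in zip(X, y) if row[f] <= thr]
            right = [lbl for row, lbl in zip(X, y) if row[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(lbl != l_lab for lbl in left)
                      + sum(lbl != r_lab for lbl in right))
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    _, f, thr, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= thr else r_lab


class GrowingForest:
    """Toy bagged ensemble that grows by adding trees fitted on new batches."""

    def __init__(self, seed=0):
        self.trees = []
        self.rng = random.Random(seed)

    def add_batch(self, X, y, n_trees=10):
        n = len(X)
        for _ in range(n_trees):
            # Bootstrap sample (bagging) drawn from the new batch only;
            # previously fitted trees are kept unchanged.
            idx = [self.rng.randrange(n) for _ in range(n)]
            self.trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(self, row):
        # Majority vote over all trees, old and new.
        return Counter(tree(row) for tree in self.trees).most_common(1)[0][0]


# Hypothetical stream: one feature, label is 1 when x >= 5.
forest = GrowingForest(seed=42)
forest.add_batch([[x] for x in range(10)], [int(x >= 5) for x in range(10)], n_trees=15)
# New data arrives later: add trees instead of retraining everything.
forest.add_batch([[x + 0.5] for x in range(10)], [int(x + 0.5 >= 5) for x in range(10)], n_trees=15)
print(len(forest.trees))  # 30
```

Old trees never see new data, so this only suits data whose distribution is fairly stable. scikit-learn's `RandomForestClassifier(warm_start=True)` supports a similar pattern: raise `n_estimators` and call `fit` again, and only the additional trees are fitted, on the data passed to that call.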

15 votes, 2 answers

Why should the initialization of weights and bias be chosen around 0?

I read this: To train our neural network, we will initialize each parameter $W^{(l)}_{ij}$ and each $b^{(l)}_i$ to a small random value near zero (say according to a $\text{Normal}(0, \epsilon^2)$ distribution for some small $\epsilon$, say 0.01) from Stanford…

asked Aug 09 '17 at 07:30 by cinqS

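The initialization rule quoted above can be sketched directly. The sketch also shows one standard reason for "near zero": pre-activations stay in the steep region of a sigmoid, where its gradient is largest, rather than in the saturated tails. Randomness, as opposed to all-zero values, breaks the symmetry between hidden units. The layer sizes are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, eps=0.01, seed=0):
    """Draw every weight W[i][j] and bias b[i] from Normal(0, eps**2)."""
    rng = random.Random(seed)
    # random.gauss takes the standard deviation, so sigma = eps.
    W = [[rng.gauss(0.0, eps) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.gauss(0.0, eps) for _ in range(n_out)]
    return W, b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


W, b = init_layer(n_in=784, n_out=25)  # hypothetical layer sizes
print(len(W), len(W[0]), len(b))       # 25 784 25
# Near zero the sigmoid is steep; far from zero it saturates.
print(sigmoid_grad(0.0))               # 0.25
print(sigmoid_grad(10.0) < 1e-4)       # True
```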

15 votes, 1 answer

Make Keras run on a multi-machine, multi-core CPU system

I'm working on a Seq2Seq model using LSTM from Keras (with the Theano backend), and I would like to parallelize the processes, because even a few MBs of data need several hours of training. It is clear that GPUs are far better at parallelization…

asked Aug 08 '17 at 09:02 by chmodsss


15 votes, 4 answers

Can we generate a huge dataset with Generative Adversarial Networks?

I'm dealing with a problem where I couldn't find enough data (images) to feed into my deep neural network for training. I was very inspired by the paper "Generative Adversarial Text to Image Synthesis" published by Scott Reed et al. on Generative…

asked Apr 04 '17 at 11:26 by Alwyn Mathew


15 votes, 1 answer

What is a 1D Convolutional Layer in Deep Learning?

I have a good general understanding of the role and mechanism of convolutional layers in Deep Learning for image processing in the case of 2D or 3D implementations: they "simply" try to capture 2D patterns in images (in 3 channels in the case of 3D). But…

asked Feb 28 '17 at 08:12 by Hendrik

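To make the 1D case in the question above concrete, here is a minimal single-channel sketch. A 1D convolutional layer slides a small kernel along one axis (for example, time steps in a sequence) and produces one response per position. As in most deep-learning frameworks, this is strictly a cross-correlation: the kernel is not flipped. Padding, stride, multiple channels, and bias are left out.

```python
def conv1d_valid(signal, kernel):
    """Single-channel 1-D 'valid' convolution (cross-correlation, no flip)."""
    k = len(kernel)
    return [
        # Dot product of the kernel with the window starting at position i.
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


print(conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
print(conv1d_valid([0, 0, 1, 1, 0], [1, 1]))      # [0, 1, 2, 1]
```

The same kernel weights are reused at every position, so a learned kernel acts as a detector for one local pattern wherever it occurs in the sequence.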

15 votes, 4 answers

How to specify important attributes?

Assume a set of loosely structured data (e.g. Web tables/Linked Open Data) composed of many data sources. The data follow no common schema, and each source can use synonymous attributes to describe the values (e.g. "nationality" vs…

asked May 19 '14 at 15:55 by vefthym


15 votes, 1 answer

How many LSTM cells should I use?

Are there any rules of thumb (or actual rules) for the minimum, maximum and "reasonable" number of LSTM cells I should use? Specifically, I am referring to BasicLSTMCell from TensorFlow and its num_units property. Please assume that I have a…

Tags: rnn

asked Jan 16 '17 at 17:53 by user27994
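The sketch below does not settle what value to pick for the num_units question above. It only shows how `num_units` drives model size, which is useful when weighing it against the amount of training data. It assumes a standard LSTM layer without peepholes or projection. Such a layer has four blocks (input, forget and output gates, plus the cell candidate), each with a weight matrix over the concatenated input and previous hidden state and a bias vector.

```python
def lstm_param_count(num_units: int, input_dim: int) -> int:
    """Trainable parameters of one standard LSTM layer (no peepholes/projection).

    4 blocks x (weights over [input, previous hidden state] + bias),
    each block producing num_units outputs.
    """
    return 4 * (num_units * (input_dim + num_units) + num_units)


print(lstm_param_count(num_units=128, input_dim=32))  # 82432
```

The count grows roughly quadratically in `num_units`, so doubling it roughly quadruples the parameters once `num_units` dominates `input_dim`.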

15 votes, 1 answer

Is stratified sampling necessary (random forest, Python)?

I use Python to run a random forest model on my imbalanced dataset (the target variable is a binary class). When splitting the dataset into training and test sets, I struggled with whether to use stratified sampling (like the code shown) or not. So far, I…

asked Jan 12 '17 at 00:58 by LUSAQX

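The stratified sampling asked about above can be sketched in plain Python. Each class is shuffled and split separately, so the test set keeps the class ratio of the full dataset. On an imbalanced target, a plain random split can leave the test set with noticeably more or fewer positives. The labels below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced binary target: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
print(sum(labels[i] for i in test_idx), len(test_idx))  # 3 30
```

In scikit-learn the same effect comes from `train_test_split(X, y, test_size=0.3, stratify=y)`.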

15 votes, 2 answers

In XGBoost, would we evaluate results with a Precision-Recall curve vs. ROC?

I am using XGBoost for payment fraud detection. The objective is binary classification, and the data is very unbalanced. One out of every 3-4k transactions is fraud. I would expect the best way to evaluate the results is a Precision-Recall (PR)…

Tags: xgboost

asked Jan 10 '17 at 16:29 by davidjhp

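For the fraud-detection question above, a worked example shows why PR curves are often preferred at this level of imbalance. The false positive rate divides by the huge number of legitimate transactions, so a model can look strong on ROC axes while most of its alerts are false. The counts and the classifier's rates below are hypothetical, chosen to match "one out of every 3-4k" roughly.

```python
def rates(tp, fp, fn, tn):
    return {
        "recall_tpr": tp / (tp + fn),   # y-axis of both ROC and PR curves
        "fpr": fp / (fp + tn),          # x-axis of the ROC curve
        "precision": tp / (tp + fp),    # y-axis of the PR curve
    }


# Hypothetical: 1,000,000 transactions, about 1 fraud per 3,500.
n_fraud, n_legit = 286, 999_714
tpr, fpr = 0.90, 0.01            # looks strong on ROC axes
tp = round(n_fraud * tpr)        # 257 frauds caught
fn = n_fraud - tp
fp = round(n_legit * fpr)        # 9,997 legitimate transactions flagged
tn = n_legit - fp
r = rates(tp, fp, fn, tn)
print(round(r["fpr"], 4), round(r["precision"], 3))  # 0.01 0.025
```

At a 1% false positive rate, only about 2.5% of flagged transactions are actually fraud. A PR curve makes this visible, while an ROC curve mostly hides it. XGBoost's `eval_metric` accepts `"aucpr"` as well as `"auc"` for tracking either during training.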
