Highest Voted 'deep-learning' Questions - Statistical Analysis Stack Exchange

42

votes

2 answers

Pooling vs. stride for downsampling

Pooling and stride can both be used to downsample an image. Let's say we have a 4x4 image, like the one below, and a 2x2 filter. How do we decide whether to use 2x2 pooling or a stride of 2?

deep-learning

asked Jan 16 '19 at 07:53

JungIn Choi

541
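To make the two options in this question concrete, here is a minimal sketch in plain Python. The 4x4 image values and the averaging kernel are hypothetical (the image from the original question is not reproduced in this listing), and in a real network the kernel weights would be learned rather than fixed.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2: keep the largest value in each block."""
    n = len(image)
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, n, 2)]
        for r in range(0, n, 2)
    ]


def strided_conv_2x2(image, kernel):
    """2x2 convolution (cross-correlation) with stride 2 and no padding."""
    n = len(image)
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(2) for j in range(2))
         for c in range(0, n, 2)]
        for r in range(0, n, 2)
    ]


# Hypothetical 4x4 input, chosen only for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# Hypothetical kernel: a fixed averaging filter standing in for learned weights.
kernel = [[0.25, 0.25], [0.25, 0.25]]

pooled = max_pool_2x2(image)               # [[6, 8], [14, 16]]
strided = strided_conv_2x2(image, kernel)  # [[3.5, 5.5], [11.5, 13.5]]
```

Both calls reduce the 4x4 input to 2x2. The difference is that pooling applies a fixed rule with no parameters, while the strided filter applies weights that training can adjust.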

13

votes

1 answer

Gradient clipping when training deep neural networks

When would one want to perform gradient clipping when training an RNN or CNN? I'm especially interested in the latter. What would be a good starting value for clipping? (It can of course be tuned.)

deep-learning

asked Jul 09 '15 at 16:54

pir

5,056
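For readers unfamiliar with the operation this question refers to, the following is a minimal sketch of clipping by global norm, one common form of gradient clipping. The function name `clip_by_global_norm` and the threshold of 1.0 are assumptions made for illustration, not a recommended starting value.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        # Already within the threshold: leave the gradients unchanged.
        return list(grads), total_norm
    scale = max_norm / total_norm
    # Rescaling keeps the direction of the gradient and only shrinks its length.
    return [g * scale for g in grads], total_norm


# Hypothetical gradient vector with L2 norm 5.0.
grads = [3.0, 4.0]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
```

After the call, `norm_before` is 5.0 and `norm_after` is approximately 1.0, while the ratio between the two components is preserved.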

12

votes

2 answers

Predicting CPU and GPU memory requirements of DNN training

Say I have some deep learning model architecture, as well as a chosen mini-batch size. How do I derive from these the expected memory requirements for training that model? As an example, consider a (non-recurrent) model with input of dimension 1000,…

deep-learning

asked Jul 17 '16 at 12:00

Whaa

131
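A rough way to approach the arithmetic in this question is to count stored floats. The sketch below assumes a fully connected network trained with plain SGD in float32; the input dimension of 1000 comes from the question, while the hidden size 512, output size 10, and batch size 32 are hypothetical. It ignores optimizer state, activation gradients, and framework overhead, so it should be read as a lower bound rather than a prediction.

```python
BYTES_PER_FLOAT32 = 4


def dense_training_memory_bytes(layer_sizes, batch_size):
    """Estimate training memory for a fully connected net (lower bound).

    Counts parameters, one gradient per parameter, and one stored
    activation per unit per sample (input layer included).
    """
    # Weights plus biases for each pair of consecutive layers.
    params = sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    # Activations kept for the backward pass, for the whole mini-batch.
    activations = batch_size * sum(layer_sizes)
    total_floats = 2 * params + activations
    return params, activations, total_floats * BYTES_PER_FLOAT32


# 1000 is the input dimension from the question; 512 and 10 are hypothetical.
layer_sizes = [1000, 512, 10]
params, activations, total_bytes = dense_training_memory_bytes(layer_sizes, batch_size=32)
# params = 517642, activations = 48704, total_bytes = 4335952 (about 4.3 MB)
```

In this hypothetical configuration the parameters dominate; with convolutional layers or larger batches the activation term typically grows much faster.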

9

votes

2 answers

Too large batch size

I am experimenting with the CIFAR-10 dataset. With my model, I found that the larger the batch size, the better the model learns the dataset. From what I see on the internet, the typical size is 32 to 128, while my optimal size is 512-1024. Is that OK? Or are…

deep-learning

asked Apr 30 '17 at 23:48

Konstantin Solomatov

1,635

8

votes

1 answer

Is building deep learning architectures a trial and error scheme?

I have been reading many deep learning papers, each of which follows a different architecture. I cannot see the logical or intuitive sense behind each layer in each architecture. I get the sense that many of those architectures are…

deep-learning

asked Sep 24 '17 at 15:42

hbak

455

5

votes

1 answer

What does "Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power" mean?

I was reading the paper Rethinking the Inception Architecture for Computer Vision, and at the very beginning I came across the following passage: Spatial aggregation can be done over lower dimensional embeddings without much or any loss in…

deep-learning

asked Dec 16 '17 at 17:43

Hossein

2,385

3

votes

1 answer

Feed Forward Layers - FC -> ReLU -> FC: what is the idea of using them?

I saw in some papers (like “Attention is all you need”) a block called “Feed Forward Layer” or “Feed Forward Network”. This is a simple block that contains FC -> ReLU -> FC, and the main idea is the FC sizes, so for example: FC1 size is…

deep-learning

asked Sep 12 '18 at 07:08

albert1905

33

3

votes

1 answer

Is the Kronecker (Dirac) delta function a valid kernel?

I came across a paper which states that a Kronecker (Dirac) delta function is a valid kernel by defining the kernel as below: $k(x,z)=\boldsymbol{v}_x^T \cdot \boldsymbol{v}_z = \displaystyle\sum_{i=1}^{m} \boldsymbol{v}_x(i) \cdot…

deep-learning

asked Sep 05 '18 at 08:17

Jasper

31

3

votes

2 answers

Paired inputs in deep neural network

Are there ways to handle paired inputs in a deep neural network (DNN)? First, I will describe my problem, and then I will describe an equivalent problem in the image. We have many protein sequences (each distinct but of equal length), and we don't…

deep-learning

asked Feb 16 '18 at 17:55

avi

409

3

votes

1 answer

What does this passage from the YouTube-8M paper mean?

In Google's paper YouTube-8M: A Large-Scale Video Classification Benchmark in section 4.1.2 Deep Bag of Frame (DBoF) Pooling second paragraph, the first sentence says: The obtained sparse codes are fed into a pooling layer that aggregates the…

deep-learning

asked Mar 29 '17 at 10:41

YellowPillow

1,251

2

votes

2 answers

Why can't we use backpropagation in "hard attention" when we can use it with the ReLU function and max-pooling?

ReLU, the argmax function (in hard attention), and max-pooling are all non-differentiable functions, but we use backpropagation with ReLU and max-pooling without any problems. What makes "hard attention" different from them?

deep-learning

asked Jan 10 '19 at 12:46

floyd

1,372

2

votes

0 answers

Practical guide to handling a changing dataset for deep learning?

I'm very confused about how to practically handle data sets for deep learning. If I want to use DL for some task, I (usually) don't have all possible variations to train the network perfectly. Thus, given some task, someone usually starts by searching…

deep-learning

asked Sep 01 '18 at 12:57

John Doe

71

2

votes

1 answer

Do I need to remove duplicate images when building a CNN classification model?

I am building a CNN classification model. However, my data contain some duplicate images. I am wondering whether it is acceptable to remove the duplicates. If so, what technique can I use to detect and remove them?

deep-learning

asked Jun 12 '17 at 07:02

kha

255
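One simple way to detect the kind of duplicates this question asks about is to hash the raw bytes of each file. The sketch below is an illustration under stated assumptions: the function name `find_exact_duplicates`, the file names, and the byte strings are hypothetical stand-ins for real image files, and the approach only catches byte-identical copies. Resized or re-encoded copies would need a different technique, such as a perceptual hash.

```python
import hashlib


def find_exact_duplicates(images):
    """Find byte-identical duplicates.

    images: dict mapping a file name to that file's raw bytes.
    Returns a list of (duplicate_name, first_seen_name) pairs.
    """
    seen = {}  # digest -> name of the first file with that content
    duplicates = []
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates


# Hypothetical in-memory stand-ins for image files read from disk.
images = {
    "cat_1.png": b"\x89PNG-cat",
    "dog_1.png": b"\x89PNG-dog",
    "cat_copy.png": b"\x89PNG-cat",
}
duplicates = find_exact_duplicates(images)  # [("cat_copy.png", "cat_1.png")]
```

In practice the bytes would come from reading each file in binary mode, and the names in the first position of each pair are the candidates for removal.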

2

votes

1 answer

Deep Learning Book - deriving sigmoid units for Bernoulli output

In the paragraph before equation 6.20, the book says: "...If we begin with the assumption that the unnormalized log probabilities are linear in $y$ and $z$, we can exponentiate to obtain the unnormalized probabilities..." With this assumption, we…

deep-learning

asked Feb 08 '17 at 05:04

foobar

723

2

votes

1 answer

Is there any programmer-oriented site for deep learning?

I have browsed and searched several deep learning sites, but their content always contains a lot of mathematical notation and theory. That is normal, I know, but I can't easily turn it into working knowledge when I want to implement some "Hello, world" examples. Is…

deep-learning

asked Oct 01 '16 at 03:55

naive231

101
