Questions tagged [deep-learning]

An area of machine learning concerned with learning hierarchical representations of the data, mainly done with deep neural networks.

1868 questions
42
votes
2 answers

Pooling vs. stride for downsampling

Pooling and stride can both be used to downsample an image. Let's say we have a 4x4 image, like the one below, and a 2x2 filter. How do we decide whether to use 2x2 pooling or a stride of 2?
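To make the two options concrete, here is a minimal NumPy sketch that downsamples a 4x4 image both ways. The averaging kernel weights are a hypothetical stand-in for learned weights, and the helper names are illustrative only; this is not taken from any answer to the question.

```python
import numpy as np

def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling (window 2, stride 2); no learned weights."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def conv2d_stride(image, kernel, stride):
    """Valid cross-correlation of a 2-D image with a 2-D kernel at the given stride."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.full((2, 2), 0.25)  # assumption: averaging weights standing in for learned ones

pooled = max_pool_2x2(image)                       # fixed operation
strided = conv2d_stride(image, kernel, stride=2)   # weights would be trained
print(pooled)
print(strided)
```

Both outputs are 2x2. The difference the sketch highlights is that pooling applies a fixed function to each window, while the strided filter's weights are parameters that training can adjust.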
13
votes
1 answer

Gradient clipping when training deep neural networks

When would one want to perform gradient clipping when training an RNN or CNN? I'm especially interested in the latter. What would be a good starting value for clipping? (It can of course be tuned.)
pir
  • 5,056
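For readers unfamiliar with the operation being asked about, the following is a minimal sketch of clipping by global norm, written in plain NumPy. The threshold of 1.0 is an arbitrary illustrative value, not a recommended starting point, and `clip_by_global_norm` is a hypothetical helper name.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients jointly so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale is 1.0 when the norm is already within the threshold.
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# Hypothetical gradients for two parameter tensors.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)  # 1.0 is illustrative only
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm, clipped_norm)
```

Because every tensor is multiplied by the same factor, the direction of the overall update is preserved and only its length changes.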
12
votes
2 answers

Predicting CPU and GPU memory requirements of DNN training

Say I have some deep learning model architecture, as well as a chosen mini-batch size. How do I derive from these the expected memory requirements for training that model? As an example, consider a (non-recurrent) model with input of dimension 1000,…
Whaa
  • 131
9
votes
2 answers

Too large batch size

I am experimenting with the CIFAR-10 dataset. With my model I found that the larger the batch size, the better the model can learn the dataset. From what I see on the internet the typical size is 32 to 128, and my optimal size is 512-1024. Is it ok? Or are…
8
votes
1 answer

Is building deep learning architectures a trial and error scheme?

I have been reading many deep learning papers, each of which follows a different architecture. I cannot see the logical or intuitive sense behind each layer in each architecture. I got a sense that many of those architectures are…
hbak
  • 455
5
votes
1 answer

What does "Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power" mean?

I was reading the paper Rethinking the Inception Architecture for Computer Vision, and at the very beginning I came across the following passage: Spatial aggregation can be done over lower dimensional embeddings without much or any loss in…
Hossein
  • 2,385
3
votes
1 answer

Feed Forward Layers - FC -> ReLU -> FC: what is the idea of using them?

I saw in some papers (like “Attention is all you need”) a block called “Feed Forward Layer” or “Feed Forward Network”. This is a simple block that contains FC -> ReLU -> FC, and the main idea is the FC sizes, so for example: FC1 size is…
albert1905
  • 33
  • 4
3
votes
1 answer

Is the Kronecker (Dirac) delta function a valid kernel?

I came across a paper which states that the Kronecker (Dirac) delta function is a valid kernel, defining the kernel as follows: $k(x,z)=\boldsymbol{v}_x^T \cdot \boldsymbol{v}_z = \displaystyle\sum_{i=1}^{m} \boldsymbol{v}_x(i) \cdot…
Jasper
  • 31
  • 2
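A small numerical check can illustrate the inner-product form in the excerpt for the discrete (Kronecker) case. The sketch below assumes $\boldsymbol{v}_x$ is a one-hot indicator of $x$, which the truncated excerpt does not confirm, and uses made-up inputs. It compares the Gram matrix of the delta function with the matrix of inner products and inspects its eigenvalues.

```python
import numpy as np

def delta_kernel(x, z):
    """Kronecker delta on discrete inputs: 1 if the inputs are equal, else 0."""
    return 1.0 if x == z else 0.0

points = ["a", "b", "a", "c"]  # hypothetical discrete inputs
vocab = sorted(set(points))

# Assumption: v_x is the one-hot indicator vector of x over the vocabulary.
V = np.array([[1.0 if p == s else 0.0 for s in vocab] for p in points])

gram_delta = np.array([[delta_kernel(x, z) for z in points] for x in points])
gram_inner = V @ V.T  # inner products of the one-hot vectors

eigenvalues = np.linalg.eigvalsh(gram_delta)
print(np.array_equal(gram_delta, gram_inner))
print(eigenvalues)
```

Under that one-hot assumption the two Gram matrices coincide, and a matrix of the form $VV^T$ is positive semidefinite, which is the property a valid kernel requires. The sketch says nothing about the continuous Dirac case.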
3
votes
2 answers

Paired inputs in deep neural network

Are there ways to handle paired inputs in a deep neural network (DNN)? First, I will describe my problem, and then I will describe an equivalent problem with images. We have many protein sequences (each distinct but equal length), and we don't…
avi
  • 409
3
votes
1 answer

What does this passage from the YouTube-8M paper mean?

In Google's paper YouTube-8M: A Large-Scale Video Classification Benchmark, the first sentence of the second paragraph of section 4.1.2, Deep Bag of Frame (DBoF) Pooling, says: The obtained sparse codes are fed into a pooling layer that aggregates the…
YellowPillow
  • 1,251
2
votes
2 answers

Why can't we use backpropagation in "hard attention" when we can use it with the ReLU function and max-pooling?

ReLU, the argmax function (in hard attention), and max-pooling are all non-differentiable functions, but we use backpropagation with ReLU and max-pooling without any problems. What makes "hard attention" different from them?
floyd
  • 1,372
  • 2
  • 17
  • 26
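One way to see the contrast the question raises is to compare numerical gradients. The sketch below is an illustration under simplifying assumptions, not an answer from the thread: hard attention is reduced to picking `values[argmax(scores)]`, and the scores and values are made up. It estimates the gradient with respect to the scores by central finite differences.

```python
import numpy as np

def finite_diff(f, x, eps=1e-4):
    """Central finite-difference estimate of the gradient of scalar f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

scores = np.array([0.2, 1.5, -0.7])    # hypothetical pre-activation scores
values = np.array([10.0, 20.0, 30.0])  # hypothetical items to attend over

def relu_sum(s):
    return np.maximum(s, 0.0).sum()

def max_pool(s):
    return s.max()

def hard_pick(s):
    # Simplified hard attention: select the value at the highest-scoring index.
    return values[np.argmax(s)]

g_relu = finite_diff(relu_sum, scores)
g_max = finite_diff(max_pool, scores)
g_hard = finite_diff(hard_pick, scores)
print(g_relu, g_max, g_hard)
```

ReLU and max are continuous and piecewise linear in the scores, so they pass a nonzero gradient almost everywhere. The index selection is piecewise constant in the scores, so its gradient with respect to them is zero wherever it is defined, which leaves the scoring function with no learning signal.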
2
votes
0 answers

Practical guide to handle a changing dataset for deep learning?

I'm very confused about how to practically handle data sets for deep learning. If I want to use DL for some task I (usually) don't have all possible variations to train the network perfectly. Thus, given some task, someone usually starts by searching…
2
votes
1 answer

Do I need to remove duplicated images when building a CNN classification model?

I am building a CNN classification model. However, my data contain some duplicated images. I am wondering whether it is acceptable to remove the duplicates. If yes, what technique can I use to detect and remove them?
kha
  • 255
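As one possible approach to the detection part, the sketch below hashes raw pixel content and keeps the first occurrence of each digest. It only catches byte-identical images (not resized or re-encoded copies), assumes the images are already loaded as NumPy arrays, and uses hypothetical helper names.

```python
import hashlib
import numpy as np

def image_digest(pixels):
    """SHA-256 digest of an image's shape, dtype, and raw pixel bytes."""
    arr = np.ascontiguousarray(pixels)
    header = f"{arr.shape}|{arr.dtype}".encode()
    return hashlib.sha256(header + arr.tobytes()).hexdigest()

def drop_exact_duplicates(images):
    """Return indices of images to keep, retaining the first copy of each."""
    seen = set()
    kept = []
    for idx, img in enumerate(images):
        digest = image_digest(img)
        if digest not in seen:
            seen.add(digest)
            kept.append(idx)
    return kept

# Hypothetical tiny dataset: the third image is an exact copy of the first.
a = np.zeros((8, 8, 3), dtype=np.uint8)
b = np.full((8, 8, 3), 255, dtype=np.uint8)
images = [a, b, a.copy()]
kept = drop_exact_duplicates(images)
print(kept)
```

Near-duplicates would need a different technique, such as a perceptual hash or a feature-distance threshold, which this sketch does not attempt.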
2
votes
1 answer

Deep Learning Book - deriving sigmoid units for Bernoulli output

In the paragraph before equation 6.20, the book says: "...If we begin with the assumption that the unnormalized log probabilities are linear in $y$ and $z$, we can exponentiate to obtain the unnormalized probabilities..." With this assumption, we…
foobar
  • 723
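For reference, the chain of steps that the quoted assumption leads to can be written out as below. This follows the standard derivation with $y \in \{0, 1\}$ and $\sigma$ the logistic sigmoid; the notation $\tilde{P}$ for the unnormalized probability is used here as an assumption about the book's symbols.

$$
\begin{aligned}
\log \tilde{P}(y) &= yz, \\
\tilde{P}(y) &= \exp(yz), \\
P(y) &= \frac{\exp(yz)}{\sum_{y'=0}^{1} \exp(y'z)} = \frac{\exp(yz)}{1 + \exp(z)}, \\
P(y) &= \sigma\big((2y - 1)z\big).
\end{aligned}
$$

As a check, $y = 1$ gives $\exp(z) / (1 + \exp(z)) = \sigma(z)$ and $y = 0$ gives $1 / (1 + \exp(z)) = \sigma(-z)$, both matching the last line.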
2
votes
1 answer

Is there any programmer-oriented site for deep learning?

I have browsed and searched several deep learning sites, but their content always contains a lot of mathematical notation and theory. That is normal, I know, but I can't easily turn it into working knowledge when I want to implement some "Hello, world" examples. Is…
naive231
  • 101
  • 2