An area of machine learning concerned with learning hierarchical representations of the data, mainly done with deep neural networks.
Questions tagged [deep-learning]
1868 questions
42
votes
2 answers
Pooling vs. stride for downsampling
Both pooling and strided convolution can be used to downsample an image.
Let's say we have a 4x4 image and a 2x2 filter.
How do we then decide whether to use 2x2 pooling or a stride of 2?
JungIn Choi
- 541
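To make the pooling-versus-stride question above concrete, here is a minimal sketch in plain Python. The 4x4 pixel values and the averaging kernel are assumptions chosen for illustration (the question's original image is not reproduced here), and the helper names are hypothetical. Both operations turn the 4x4 input into a 2x2 output; the difference is that max pooling has no parameters, while the strided filter's weights would normally be learned.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling (window 2, step 2); no learned weights."""
    rows, cols = len(image), len(image[0])
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def conv2x2_stride2(image, kernel):
    """2x2 filter (cross-correlation) moved with stride 2 over the image."""
    rows, cols = len(image), len(image[0])
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(2) for j in range(2))
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4x4 image; the values are made up for the example.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# Hypothetical fixed kernel; in a real network these weights are trained.
avg_kernel = [[0.25, 0.25], [0.25, 0.25]]

pooled = max_pool_2x2(image)                  # [[6, 8], [14, 16]]
strided = conv2x2_stride2(image, avg_kernel)  # [[3.5, 5.5], [11.5, 13.5]]
print(pooled)
print(strided)
```

With this particular kernel the strided filter behaves like average pooling, which shows that a stride-2 filter can express a pooling-like operation, though it does not settle which choice is better for a given model.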
13
votes
1 answer
Gradient clipping when training deep neural networks
When would one want to perform gradient clipping when training an RNN or CNN? I'm especially interested in the latter. What would be a good starting value for clipping? (It can of course be tuned.)
pir
- 5,056
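For readers unfamiliar with the technique the question above asks about, the following is a minimal sketch of one common variant, clipping by global norm, written in plain Python. The function name, the toy gradient values, and the threshold `max_norm=1.0` are assumptions for illustration only; the threshold is not a recommended starting value.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm.

    grads is a list of gradient vectors (one list of floats per parameter).
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        # Already within the threshold: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Toy gradients with joint norm sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Rescaling every gradient by the same factor keeps the update direction and only shortens the step, which is why this variant is often preferred over clipping each component separately.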
12
votes
2 answers
Predicting CPU and GPU memory requirements of DNN training
Say I have some deep learning model architecture, as well as a chosen mini-batch size. How do I derive from these the expected memory requirements for training that model?
As an example, consider a (non-recurrent) model with input of dimension 1000,…
Whaa
- 131
9
votes
2 answers
Too large batch size
I am experimenting with the CIFAR-10 dataset. With my model I found that the larger the batch size, the better the model learns the dataset. From what I see on the internet, the typical size is 32 to 128, and my optimal size is 512-1024. Is that OK? Or are…
Konstantin Solomatov
- 1,635
8
votes
1 answer
Is building deep learning architectures a trial and error scheme?
I have been reading many deep learning papers, each of which follows a different architecture. I cannot see the logical or intuitive sense behind each layer in each architecture. I get the sense that many of those architectures are…
hbak
- 455
5
votes
1 answer
What does "Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power" mean?
I was reading the paper Rethinking the Inception Architecture for Computer Vision, and at the very beginning I came across the following passage:
Spatial aggregation can be done over lower dimensional embeddings without much or any loss in…
Hossein
- 2,385
3
votes
1 answer
Feed-forward layers (FC -> ReLU -> FC): what is the idea behind using them?
I saw in some papers (like “Attention is all you need”) a block called “Feed Forward Layer” or “Feed Forward Network”.
This is a simple block that contains FC -> ReLU -> FC, and the main idea is the FC sizes, so for example: FC1 size is…
albert1905
- 33
- 4
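The FC -> ReLU -> FC block described in the question above can be sketched in a few lines of plain Python. The layer sizes used here (4 -> 8 -> 4), the random initialisation, and all function names are assumptions for illustration; the sizes mentioned in the question are truncated and are not reproduced.

```python
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row, y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def feed_forward(x, w1, b1, w2, b2):
    """FC -> ReLU -> FC applied to a single input vector."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


def init_weights(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_hidden = 4, 8  # hypothetical sizes: expand to 8, project back to 4
w1, b1 = init_weights(d_hidden, d_in, rng), [0.0] * d_hidden
w2, b2 = init_weights(d_in, d_hidden, rng), [0.0] * d_in

out = feed_forward([1.0, -2.0, 0.5, 3.0], w1, b1, w2, b2)
print(len(out))  # same dimensionality as the input
```

The sketch expands the vector to a wider hidden size and projects it back, so the block can be dropped in anywhere the input dimensionality must be preserved.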
3
votes
1 answer
Is the Kronecker (Dirac) delta function a valid kernel?
I came across a paper and it states that a Kronecker (Dirac) delta function is a valid kernel by defining the kernel as below:
$k(x,z)=\boldsymbol{v}_x^T \cdot \boldsymbol{v}_z = \displaystyle\sum_{i=1}^{m} \boldsymbol{v}_x(i) \cdot…
Jasper
- 31
- 2
3
votes
2 answers
Paired inputs in deep neural network
Are there ways to handle paired inputs in the deep neural network (DNN)?
First, I will describe my problem, and then I will describe an equivalent problem in the image.
We have many protein sequences (each distinct but equal length), and we don't…
avi
- 409
3
votes
1 answer
What does this passage from the YouTube-8M paper mean?
In Google's paper YouTube-8M: A Large-Scale Video Classification Benchmark, in section 4.1.2 Deep Bag of Frame (DBoF) Pooling, the first sentence of the second paragraph says:
The obtained sparse codes are fed into a pooling layer that aggregates the…
YellowPillow
- 1,251
2
votes
2 answers
Why can't we use backpropagation with "hard attention" when we can use it with the ReLU function and max-pooling?
ReLU, the argmax function (in hard attention), and max-pooling are all non-differentiable functions, but we use backpropagation with ReLU and max-pooling without any problems. What makes "hard attention" different from them?
floyd
- 1,372
- 2
- 17
- 26
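One way to look at the question above is to compare the gradients numerically. The sketch below is only an illustration under stated assumptions: the scores are made up, the helper `numeric_grad` is hypothetical, and it uses central finite differences rather than any framework's autograd. ReLU and max are non-differentiable only at isolated points and have a usable gradient elsewhere, whereas the argmax index is piecewise constant, so its gradient with respect to the scores is zero wherever it is defined.

```python
def numeric_grad(f, xs, i, eps=1e-6):
    """Central finite-difference estimate of d f(xs) / d xs[i]."""
    up = list(xs)
    up[i] += eps
    down = list(xs)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


def relu(x):
    return max(0.0, x)


# Hypothetical scores; index 1 holds the largest value.
scores = [0.2, 1.5, -0.7]

# max (as in max-pooling): the gradient flows to the winning element only.
max_grad = [numeric_grad(lambda v: max(v), scores, i) for i in range(3)]

# argmax (as in a hard selection): the output index does not move under
# small perturbations, so no gradient signal reaches the scores.
argmax_grad = [numeric_grad(lambda v: float(v.index(max(v))), scores, i)
               for i in range(3)]

# ReLU: gradient 0 for negative inputs, about 1 for positive inputs.
relu_grad = [numeric_grad(lambda v: relu(v[0]), [x], 0) for x in (-1.0, 2.0)]

print(max_grad)     # approximately [0, 1, 0]
print(argmax_grad)  # [0.0, 0.0, 0.0]
print(relu_grad)    # approximately [0, 1]
```

This does not cover how hard attention is trained in practice; it only shows why plain backpropagation through the selection step gives the scores nothing to learn from.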
2
votes
0 answers
Practical guide to handle a changing dataset for deep learning?
I'm very confused about how to handle datasets for deep learning in practice. If I want to use DL for some task, I (usually) don't have all possible variations to train the network perfectly. Thus, given some task, someone usually starts by searching…
John Doe
- 71
2
votes
1 answer
Do I need to remove duplicate images when building a CNN classification model?
I am building a CNN classification model. However, my data contains some duplicate images. I am wondering whether it is acceptable to remove the duplicates. If yes, what technique can I use to detect and remove them?
kha
- 255
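As a starting point for the detection part of the question above, here is a minimal sketch that finds byte-identical duplicates by hashing image contents with the standard library. The function name and the in-memory dictionary of file names to bytes are assumptions for illustration; in practice the bytes would be read from files. It does not catch near-duplicates (resized, re-encoded, or cropped copies), which need a different technique.

```python
import hashlib


def find_exact_duplicates(images):
    """Split images into first occurrences and byte-identical repeats.

    images maps a name to the raw file bytes. Returns (kept, dropped),
    two lists of names in the order they were encountered.
    """
    seen = {}            # digest -> name of the first image with that content
    kept, dropped = [], []
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            dropped.append(name)
        else:
            seen[digest] = name
            kept.append(name)
    return kept, dropped


# Hypothetical data: c.png has exactly the same bytes as a.png.
images = {
    "a.png": b"\x00\x01",
    "b.png": b"\x00\x02",
    "c.png": b"\x00\x01",
}
kept, dropped = find_exact_duplicates(images)
print(kept)     # ['a.png', 'b.png']
print(dropped)  # ['c.png']
```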
2
votes
1 answer
Deep Learning Book - deriving sigmoid units for Bernoulli output
In the paragraph before equation 6.20, the book says:
"...If we begin with the assumption that the unnormalized log probabilities are linear in $y$ and $z$, we can exponentiate to obtain the unnormalized probabilities..."
With this assumption, we…
foobar
- 723
2
votes
1 answer
Is there any programmer-oriented site for deep learning?
I have browsed and searched several deep learning sites, but their content always contains a lot of mathematical notation and theory. That is normal, I know, but I can't easily turn it into practical knowledge when I want to implement some "Hello, world" examples.
Is…
naive231
- 101
- 2