
I have read that a CNN has a neuron per pixel, but I have also read that this is not true, so what is the actual answer? What I know is that a CNN tries to adjust a weight matrix, which is also the kernel matrix (I might be wrong about this, so don't judge me). If that is so, how can we have a neuron per pixel? If we have a neuron per pixel, shouldn't the weight matrix have the same dimensions as the pixels?

Can anybody explain the inner workings of a CNN, with the dimensions and shapes, in terms of tensors?

Dave
  • 62,186

2 Answers

1

A CNN (strictly, a convolutional layer in a neural network) often has a neuron for each pixel. However, it doesn't have an independently estimated set of weights on each neuron: the weights are constrained to be the same across all the neurons in a layer, and lots of them are constrained to be zero.

That is, the output for a pixel $j$ is still $\sigma(b_j+\sum_i w_{ij} x_i)$, where $x_i$ is the input for pixel $i$, $w_{ij}$ is the weight connecting input pixel $i$ to output pixel $j$, and $b_j$ is the bias, but $w_{ij}$ is defined in terms of the relative positions of $i$ and $j$. If pixels $i$ and $j$ are close, $w_{ij}$ gets estimated; if they are not close, $w_{ij}$ is just set to zero. 'Close' in this context might mean 'adjacent', or it might mean in the same small patch; the 'AlexNet' CNN that made CNNs famous used $11\times 11$ patches.
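Here is a minimal numerical sketch of that locality constraint, assuming a 1D "image" of 8 pixels, a one-pixel neighbourhood as the meaning of 'close', and a sigmoid for $\sigma$ (all illustrative choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)            # a 1D "image": one value per pixel
    n = x.size

    # One row of weights per output neuron j; w[j, i] connects input pixel i
    # to output neuron j. Locality: w[j, i] is fixed at zero unless pixel i
    # is within one pixel of j.
    w = rng.normal(size=(n, n))
    for j in range(n):
        for i in range(n):
            if abs(i - j) > 1:
                w[j, i] = 0.0

    b = np.zeros(n)                   # one bias per neuron
    out = sigmoid(b + w @ x)          # out[j] = sigma(b_j + sum_i w_ji x_i)
    print(out.shape)                  # (8,) -- still one neuron per pixel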

On top of this, the weights $w_{ij}$ that do get estimated, the ones for 'close' pairs of pixels, are constrained to be the same for every $i$. That is, $w_{ii}$ will be the same for all $i$, $w_{i,\text{the pixel just left of $i$}}$ will be the same for all $i$, and $w_{i,\text{the pixel two left and one up from $i$}}$ will be the same for all $i$. This constraint is what's usually written in terms of a convolutional filter, but you can think of it as just a constraint on estimating the parameters.
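Continuing the same toy 1D setup (again an illustration, with a 3-element kernel as an assumption), the sharing constraint means every nonzero $w_{ij}$ is drawn from one shared kernel, and the constrained layer is exactly a convolution:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)
    k = rng.normal(size=3)            # shared kernel: offsets -1, 0, +1
    n = x.size

    # Build the full weight matrix, but force every nonzero entry to come
    # from the shared kernel: w[j, i] depends only on the offset i - j.
    w = np.zeros((n, n))
    for j in range(n):
        for i in range(n):
            if abs(i - j) <= 1:
                w[j, i] = k[i - j + 1]

    # The constrained matrix product is just a convolution with k
    # (np.convolve flips its kernel, hence k[::-1]).
    assert np.allclose(w @ x, np.convolve(x, k[::-1], mode="same"))
    print(len(np.unique(w[w != 0])))  # 3 distinct weights in the whole layer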

As a result, while you have a neuron per pixel, you only have a handful of weights for the whole layer.

And finally, you don't always have a neuron per pixel; sometimes you have one for every few pixels in a spaced-out grid.
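In the same toy 1D setting, a stride simply keeps every few output positions, so the number of neurons drops without changing the shared kernel (a sketch, assuming stride 2):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)
    k = rng.normal(size=3)

    dense = np.convolve(x, k[::-1], mode="same")  # one output neuron per pixel
    strided = dense[::2]                          # one neuron per 2 pixels
    print(dense.shape, strided.shape)             # (8,) (4,)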

Thomas Lumley
  • 38,062
  • What does the output of pixels $i$ and $j$ separately mean? Aren't they used in combination, e.g. horizontal distance $i$ and vertical distance $j$ together giving one pixel position? – Nervous Hero Mar 11 '22 at 15:34
  • No, $i$ and $j$ indicate individual pixel positions. In 2d, $i$ would actually be $(i_x, i_y)$ coordinates and $j$ would be $(j_x, j_y)$ coordinates, and weights with the same value of $(i_x-j_x, i_y-j_y)$ would be constrained to be the same. – Thomas Lumley Mar 12 '22 at 02:06
  • That means $i$ and $j$ in your example are pixels from two different features, am I correct? – Nervous Hero Mar 12 '22 at 11:08
  • Yes. There's a neuron for each pixel (or not), but each neuron takes inputs from more than one pixel. – Thomas Lumley Mar 13 '22 at 18:43
1

Not sure where you have read that, but a CNN will not have a neuron per pixel. That is the very advantage of a CNN compared to a fully connected neural network: because of this, the number of weights is reduced, which in turn reduces the number of computations compared to using a fully connected neural network to solve the same problem.

Example:

  • Input image to CNN - 224x224x3
  • Weights - 5x5x3x4 (4 kernels of size 5x5x3)
  • Output size - 224x224x4 (with padding and stride 1, no pooling)
  • Output size - 112x112x4 (with padding and stride 1 and pooling of 2x2)
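A quick way to sanity-check these shapes is a small PyTorch sketch (an assumption of mine, not part of the original answer), using 'same' padding of 2 for the 5x5 kernel and a channels-first layout:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 224, 224)                  # one 224x224x3 image, channels first

    conv = nn.Conv2d(in_channels=3, out_channels=4,  # 4 kernels of size 5x5x3
                     kernel_size=5, stride=1, padding=2)
    pool = nn.MaxPool2d(kernel_size=2)

    y = conv(x)
    print(y.shape)            # torch.Size([1, 4, 224, 224])
    print(pool(y).shape)      # torch.Size([1, 4, 112, 112])

    # 5*5*3*4 = 300 weights (+ 4 biases), versus a fully connected layer
    # mapping 224*224*3 inputs to 224*224*4 outputs, which would need
    # 224*224*3 * 224*224*4, roughly 3e10, weights.
    print(conv.weight.shape)  # torch.Size([4, 3, 5, 5])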

[Image: 2D convolution of a single channel with a single slice of the weight kernel]

The image shows how 2D convolution works for just one channel and one slice of the weight kernel; the same thing happens for every channel. This article has more intuition: Convolution Intuition

As you can see, the same few weights are reused across all the pixels.
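A minimal sketch of that sliding-window operation for a single channel (assuming a 6x6 image, a 3x3 kernel, and zero padding; strictly this is cross-correlation, which is what most CNN libraries compute):

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.normal(size=(6, 6))          # one channel
    kernel = rng.normal(size=(3, 3))         # one slice of the weight kernel

    padded = np.pad(image, 1)                # zero padding keeps the output 6x6
    out = np.zeros_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + 3, c:c + 3]
            out[r, c] = np.sum(patch * kernel)   # the same weights at every position

    print(out.shape)                         # (6, 6)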