I am trying to understand how an nn.Conv1d layer processes its input, using a specific example from audio processing with a WaveNet model.
My input data has shape (1,1,8820). It passes through an input layer with weights of shape (1,16,1), which produces an output of shape (1,16,8820).
That part I understand, because it is just a multiplication of the two matrices. The next layer is a Conv1d with kernel size=3, input channels=16, and output channels=16, so the state dict shows a weight matrix of shape (16,16,3). When the (1,16,8820) input goes through that layer, the result is another (1,16,8820) tensor.
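To make the input-layer step concrete, here is a minimal NumPy sketch of a kernel-size-1 layer written as a channel-wise matrix multiplication. It assumes the weights are laid out as (out_channels, in_channels, kernel), i.e. (16,1,1), which is how PyTorch stores Conv1d weights; if the array is stored as (1,16,1), it would need to be transposed first. The random data and the zero bias are placeholders, not values from the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for the real audio and weights (assumed values).
x = rng.standard_normal((1, 1, 8820))   # (batch, in_channels, time)
w = rng.standard_normal((16, 1, 1))     # assumed layout: (out_channels, in_channels, kernel)
b = np.zeros(16)                        # assumed bias, one value per output channel

# With kernel size 1 there is no mixing across time: every time step is
# multiplied by the same (16, 1) matrix, so the kernel axis can be dropped.
y = np.einsum("oi,bit->bot", w[:, :, 0], x) + b[None, :, None]

print(y.shape)  # (1, 16, 8820)
```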
What multiplication steps occur inside the layer to apply the weights to the audio data? In other words, if I wanted to apply the layer (forward calculation only) using only the input matrix, the state_dict weight matrix, and NumPy, how would I do that?
This example uses the nn.Conv1d layer from PyTorch. Also, if the same layer had dilation=2, how would that change the operations?
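For reference, here is a rough NumPy sketch of the kind of forward pass being asked about. It is not taken from the model in question. It assumes stride 1, weights in the PyTorch layout (out_channels, in_channels, kernel), and symmetric zero padding chosen so the output keeps 8820 time steps; WaveNet implementations often pad only on the left (causal padding) instead, which would change the np.pad call. The function name conv1d_forward and the random data are hypothetical.

```python
import numpy as np

def conv1d_forward(x, weight, bias=None, dilation=1, padding=0):
    """Stride-1 cross-correlation over time, as a sketch of nn.Conv1d's forward pass.

    x:      (batch, in_channels, time)
    weight: (out_channels, in_channels, kernel)
    bias:   (out_channels,) or None
    """
    batch, in_ch, t_in = x.shape
    out_ch, w_in_ch, k = weight.shape
    assert in_ch == w_in_ch, "channel mismatch between input and weight"

    # Zero-pad both ends of the time axis (assumed symmetric padding).
    x_pad = np.pad(x, ((0, 0), (0, 0), (padding, padding)))
    t_out = t_in + 2 * padding - dilation * (k - 1)

    y = np.zeros((batch, out_ch, t_out), dtype=np.result_type(x, weight))
    for j in range(k):
        # Kernel tap j reads the input shifted by j * dilation samples.
        start = j * dilation
        x_tap = x_pad[:, :, start:start + t_out]            # (batch, in_ch, t_out)
        # Each tap is a (out_ch, in_ch) matrix applied at every time step.
        y += np.einsum("oi,bit->bot", weight[:, :, j], x_tap)

    if bias is not None:
        y += bias[None, :, None]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16, 8820))   # placeholder input
w = rng.standard_normal((16, 16, 3))     # placeholder for the state_dict weights

y1 = conv1d_forward(x, w, dilation=1, padding=1)  # taps at t-1, t, t+1
y2 = conv1d_forward(x, w, dilation=2, padding=2)  # taps at t-2, t, t+2
print(y1.shape, y2.shape)  # (1, 16, 8820) (1, 16, 8820)
```

In this sketch, dilation only changes which input samples each kernel tap reads: the three (16,16) matrices stay the same, but they are applied to samples spaced 2 apart rather than adjacent ones, and the padding has to grow to 2 to keep the output length unchanged.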