So, to my understanding, Dense is pretty much Keras's way of saying "matrix multiplication".
SUMMARY:
Whenever we say Dense(512, activation='relu', input_shape=(32, 32, 3)), what we are really saying is: perform a matrix multiplication so that the output has 512 as its last dimension.
What gets lost in translation is that the 512 is just ONE part of the desired output shape, not the whole picture. Keras sees the input shape and the 512 and automagically figures out the other matrix it needs to perform the matrix multiplication.
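If you want to watch Keras do that automagic itself, here is a minimal sketch (assuming TensorFlow's bundled tf.keras; the model is just the two layers discussed below) that prints the shapes and Param # values worked out in the examples:

    from tensorflow import keras
    from tensorflow.keras.layers import Dense

    # Stack the two Dense layers from the examples into a tiny model.
    model = keras.Sequential([
        Dense(512, activation='relu', input_shape=(32, 32, 3)),
        Dense(10, activation='softmax'),
    ])

    # The summary should report output shapes (None, 32, 32, 512) and
    # (None, 32, 32, 10) with Param # 2048 and 5130 respectively.
    model.summary()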
EXAMPLE 1:
Let's look at Dense(512, activation='relu', input_shape=(32, 32, 3)).
Matrix multiplication:
(None, 32, 32, 3) * (3, 512)
EXPLANATION:
None is the batch size (the number of pictures), which is only determined at training time, so it doesn't matter right now.
(..., 32, 32, 3) is the input_shape specified in the Dense(...)
(3, 512) comes from Keras seeing that the last dimension of your input_shape is 3, as in (..., ..., ..., 3). So Keras takes that last 3 and combines it with the 512 to build a weight matrix of shape (3, 512). Taa-daa, automagic explained.
Results in:
(None, 32, 32, 512)
This is because the two 3s (the inner dimensions) cancel each other out in the matrix multiplication.
The Param # comes from (3 * 512) + 512 = 2048, as pointed out by grovina's answer. It follows from this equation:
input * weights + bias
input would be the 3 (the number of inputs, so 3 weights per neuron)
weights would be the 512 (the number of neurons)
bias would be the 512 (one bias per neuron)
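If you'd rather check the shape arithmetic and the 2048 by hand, here's a small NumPy sketch (the random kernel and bias are just stand-ins for whatever Keras actually initializes):

    import numpy as np

    x = np.random.rand(1, 32, 32, 3)   # pretend None is a batch of 1 picture
    kernel = np.random.rand(3, 512)    # the (3, 512) matrix Keras builds for you
    bias = np.zeros(512)               # one bias per neuron

    out = x @ kernel + bias            # matmul over the last axis, broadcast over the rest
    print(out.shape)                   # (1, 32, 32, 512)
    print(kernel.size + bias.size)     # 1536 + 512 = 2048, the Param #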
EXAMPLE 2:
Let's do the same with Dense(10, activation='softmax').
Matrix multiplication:
(None, 32, 32, 512) * (512, 10)
EXPLANATION:
None doesn't matter right now.
(..., 32, 32, 512) is the input here, because it's the output shape of the first Dense(...)
(512, 10) comes from the last dimension of the input_shape and the 10 specified in the second Dense(...)
Results in:
(None, 32, 32, 10)
The two 512s cancel out.
Param # is (512 * 10) + 10 = 5130
input would be the 512 (the number of inputs, so 512 weights per neuron)
weights would be the 10 (the number of neurons)
bias would be the 10 (aka one bias per neuron)
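And the same NumPy sketch for this second layer confirms the 5130:

    import numpy as np

    h = np.random.rand(1, 32, 32, 512)  # stand-in for the first Dense's output
    kernel = np.random.rand(512, 10)    # the (512, 10) matrix Keras builds
    bias = np.zeros(10)                 # one bias per neuron

    out = h @ kernel + bias
    print(out.shape)                    # (1, 32, 32, 10)
    print(kernel.size + bias.size)      # 5120 + 10 = 5130, the Param #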