It's quite easy to see and doesn't need sophisticated math. In maximum likelihood estimation (MLE) you maximize
$$
\underset{\theta}{\operatorname{arg\,max}} \; p(X | \theta)
$$
while with maximum a posteriori (MAP) estimation you also take the prior for $\theta$ into account:
$$
\underset{\theta}{\operatorname{arg\,max}} \; p(X | \theta) \, p(\theta)
$$
Now, the statement about a uniform prior is not exactly true. It holds if you use a flat, improper prior (one that doesn't integrate to 1), $p(\theta) \propto 1$, i.e. constant for every value of $\theta$; then maximizing the posterior yields
$$
\underset{\theta}{\operatorname{arg\,max}} \; p(X | \theta) \times 1
$$
which is the same as maximum likelihood. But now imagine a prior that is uniform over a bounded interval, say
$$
p(\theta) = \begin{cases}
1 & \theta \in (100, 101), \\[6pt]
0 & \text {otherwise}.
\end{cases}
$$
Then MLE and MAP agree only if the MLE falls between 100 and 101: for any $\theta$ outside that interval the prior, and hence the posterior, is zero, so MAP is forced to return a value inside it no matter what the likelihood says.
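A small numerical sketch of this effect, under assumed made-up settings (Gaussian data with known unit variance and true mean 5, grid search over $\theta$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: Gaussian with unknown mean theta (known sigma = 1),
# true mean 5 -- far below the prior's support (100, 101).
data = rng.normal(loc=5.0, scale=1.0, size=50)

# Log-likelihood of each candidate theta on a grid (step 0.01).
grid = np.linspace(0, 110, 11001)
loglik = -0.5 * ((data[None, :] - grid[:, None]) ** 2).sum(axis=1)

theta_mle = grid[np.argmax(loglik)]  # close to the sample mean, ~5

# MAP with the uniform prior on (100, 101): log-posterior is
# log-likelihood + log-prior, and the log-prior is -inf off the support.
logprior = np.where((grid > 100) & (grid < 101), 0.0, -np.inf)
theta_map = grid[np.argmax(loglik + logprior)]  # forced into (100, 101)
```

Here `theta_map` lands just inside the interval, at its edge closest to the data, while `theta_mle` sits near 5, so the two estimates disagree badly.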
As for a Gaussian prior, or any other non-flat prior: if you multiply the likelihood by anything other than a constant, you are maximizing a different function, so MLE and MAP will in general differ. If you have a lot of data, the prior becomes less relevant because the likelihood dominates, so it can happen that MAP and MLE lead to similar results, but this does not need to happen.
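You can see the prior getting washed out in the conjugate Gaussian case, where both estimators have closed forms: the MLE is the sample mean, and the MAP is a precision-weighted average of the prior mean and the sample mean. A minimal sketch with assumed values (prior $N(0, 1)$, known $\sigma = 1$, true mean 3):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: Gaussian data with known sigma, Gaussian prior
# N(mu0, tau^2) on the unknown mean.
mu0, tau, sigma = 0.0, 1.0, 1.0
true_mean = 3.0

gaps = []
for n in (5, 50, 5000):
    x = rng.normal(true_mean, sigma, size=n)
    mle = x.mean()
    # Closed-form MAP for the conjugate Gaussian model:
    # precision-weighted average of prior mean and sample mean.
    map_ = (mu0 / tau**2 + n * x.mean() / sigma**2) / (1 / tau**2 + n / sigma**2)
    gaps.append(abs(mle - map_))
    print(f"n={n:5d}  MLE={mle:.4f}  MAP={map_:.4f}  gap={gaps[-1]:.4f}")
# The gap |MLE - MAP| shrinks roughly like 1/n as the data accumulates.
```

With `n = 5` the MAP is pulled noticeably toward the prior mean 0; by `n = 5000` the two estimates are nearly indistinguishable.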