
When we use deep neural networks (DNNs) to solve a one-dimensional regression problem, we can approximate the data distribution with the output of a DNN, as in the picture below.
My question is that a DNN does not itself assume a Gaussian (or any other) distribution. It just knows what value to output when it sees an input. So how do you obtain a probability distribution from a DNN? For example, if someone asks what the probability is of a point appearing at $(5, 0)$, can a DNN answer this kind of question?

(image: DNN regression fit to scattered data, from https://medium.com/@sunnerli/dnn-regression-in-tensorflow-16cc22cdd577)

Lion Lai

1 Answer


For many regression algorithms, not only neural networks, the model assumes the data are distributed as $y \sim \mathcal{N}(f(x;\theta), \sigma^2)$, where $\theta$ are the model parameters and $\sigma^2$ is the variance of the distribution (often a hyperparameter).

Maximizing the log-likelihood of the data with respect to $\theta$ is equivalent to minimizing the mean squared error loss between the $y_i$ and $f(x_i;\theta)$.
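
To spell out the equivalence (a standard derivation, assuming $n$ i.i.d. data points $(x_i, y_i)$): the negative log-likelihood under this model is

$$-\log p(\mathbf{y} \mid \mathbf{x}; \theta) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2 + \frac{n}{2}\log(2\pi\sigma^2),$$

and the second term does not depend on $\theta$, so for fixed $\sigma^2$ minimizing the negative log-likelihood over $\theta$ is exactly minimizing the (scaled) sum of squared errors.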

Therefore, to compute the probability density at $(5,0)$, you would evaluate a Gaussian density with mean $f(5; \theta)$ and variance $\sigma^2$ at $y = 0$, where $f$ is your neural network.
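
As a minimal sketch of that computation (the fitted function `f` and the noise variance `sigma2` below are stand-ins; in practice `f` would be your trained network's forward pass and `sigma2` your chosen or estimated variance):

```python
import numpy as np

def gaussian_density(y, mean, sigma2):
    """Density of N(mean, sigma2) evaluated at y."""
    return np.exp(-(y - mean) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def f(x):
    # Hypothetical trained regression model; replace with your network's output.
    return np.sin(x)

sigma2 = 0.25  # assumed noise variance (a hyperparameter)

# Density of observing y = 0 at input x = 5, i.e. the point (5, 0):
p = gaussian_density(0.0, f(5.0), sigma2)
```

Note this is a density, not a probability: the probability of hitting any exact point is zero, so to get a probability you would integrate the density over an interval around $y = 0$.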

shimao
  • Thanks for your answer. But I still have two questions. 1: Does applying a DNN to a regression problem also require the assumption that the data are Gaussian distributed? As far as I know, we only care about the mean squared error (MSE) between the output value and the ground-truth value; no Gaussian distribution is involved. 2. How do I find out a DNN's multiple means and variances from its weights and biases only? Is this even possible? – Lion Lai Jan 30 '18 at 04:09
  • Using MSE (L2) loss corresponds to the data being distributed normally. Using L1 loss corresponds to the data being distributed according to the Laplace distribution. In general, there is a mapping between loss functions and probability distributions. 2. Not sure what you mean by a DNN's multiple means and variances. – shimao Jan 30 '18 at 04:11
  • Are you referring to regularization? Can you add references for them? 2. After training a DNN model, all we have are the network's weights and biases. How can I calculate the density function from these numbers? Thank you. – Lion Lai Jan 30 '18 at 04:18
  • No, I am referring to the prediction loss. L1 and L2 are just mathematical functions which can be applied either to the difference between prediction and ground truth, $y-f(x)$, or to the model parameters $\theta$. In this question only the former is relevant. 2. Given input $x$, you feed it through the network to produce $f(x)$. The density function is a Gaussian distribution centered on $f(x)$ with variance $\sigma^2$. – shimao Jan 30 '18 at 04:24