
I am reading a slide which says GLM are neural networks. https://owars.info/mario/2020_Wuthrich.pdf

The authors showed that if you have an exponential function as the activation function of a very simple 1-layer neural network, then it corresponds to Poisson regression, which is a type of GLM.

My question is: for the general GLM, what should be the corresponding activation function if the GLM were represented as a neural network?

Fraïssé
    the key is to have only one layer; different activations would correspond to different "link functions" in GLM parlance; there is no single activation function which makes a neural net into a GLM. – John Madden Oct 04 '23 at 19:55
    It is worth noting that a $\log$ link is not synonymous with Poisson regression. A $\log$ link could be used for other likelihoods, such as gamma or negative binomial (even Gaussian is theoretically possible). Thus, the claim that an exponential activation function (inverse of a logarithmic link function) corresponds to Poisson regression is not correct. // I once posted a possible related question on AI.SE. – Dave Oct 05 '23 at 13:13
    It is the combination of corresponding loss, inverse link activation and 0 hidden layer architecture that mimics a GLM via gradient descent. Exactly how it is written in Mario's notes, btw. – Michael M Oct 05 '23 at 14:28

2 Answers


My question is: for the general GLM, what should be the corresponding activation function if the GLM were represented as a neural network?

If you have a model for the expectation of the form:

$$\mu = g^{-1}(X\beta)$$

in GLM jargon, $X\beta=\eta$ is the linear predictor and $g$ is the link function,

then it can be stated equivalently as a neural network with a single layer (no hidden layers) and $g^{-1}$ as the activation function.

Note, however, as @Sycorax and @Dave mentioned, a linear predictor and a link function do not a GLM make. A GLM also has another component: an assumption about the conditional distribution of the response variable. In GLMs, we assume the response is distributed, conditional on the predictors, according to a particular distribution from the exponential family. Different choices entail different likelihoods.
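As a minimal sketch of this equivalence, here is a Poisson GLM (log link, so $g^{-1} = \exp$) written as a zero-hidden-layer "network" trained by plain gradient descent on the Poisson negative log-likelihood. The data and hyperparameters are invented for illustration; with the canonical link, the gradient works out to $-X^\top(y - \mu)/n$, so at convergence the GLM score equations $X^\top(y - \mu) = 0$ should hold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: intercept column plus two predictors.
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

# A "neural network" with no hidden layer and exp activation:
# mu = exp(X beta).  Training minimizes the Poisson negative
# log-likelihood; for the canonical log link its gradient is
# -X^T (y - mu) / n.
beta = np.zeros(3)
for _ in range(5000):
    mu = np.exp(X @ beta)            # activation g^{-1} = exp
    beta -= 0.1 * (-X.T @ (y - mu) / len(y))

# At the optimum the GLM score equations X^T (y - mu) = 0 hold,
# so this recovers the same fit as maximum-likelihood Poisson
# regression on these data.
score = X.T @ (y - np.exp(X @ beta))
```

Swapping in a different inverse link and the matching likelihood gives the corresponding GLM; only the activation and the loss change, not the architecture.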

Firebug

An activation function alone does not make a neural network into a GLM. There are three defining features of a GLM:

  1. The loss function. A GLM maximizes the likelihood of a probability model, so to make a NN into a GLM, we need to optimize the same function.
  2. No hidden layers. A GLM is a linear model, so our NN must also be a linear model.
  3. The link function. GLMs use a link function to relate the linear predictor to the likelihood, and we need the NN to match that.
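Putting the three ingredients together for the Bernoulli case gives logistic regression: cross-entropy loss (the Bernoulli negative log-likelihood), no hidden layers, and a sigmoid activation (the inverse logit link). A small illustrative sketch, with invented data and hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.4, 1.0, 0.6])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

# Ingredient 1: cross-entropy loss = Bernoulli NLL.
# Ingredient 2: no hidden layer, just X @ beta.
# Ingredient 3: sigmoid activation = inverse logit link.
# For the canonical link the NLL gradient is -X^T (y - p) / n.
beta = np.zeros(3)
for _ in range(10000):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # sigmoid activation
    beta -= 0.5 * (-X.T @ (y - p) / n)

# As in any GLM with canonical link, the fitted model satisfies
# the score equations X^T (y - p) = 0.
score = X.T @ (y - 1.0 / (1.0 + np.exp(-(X @ beta))))
```

Change any one ingredient (say, add a hidden layer, or use squared error instead of cross-entropy) and the model is no longer logistic regression.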

Here's an example of how to put these facts together to make a logistic regression NN: Is logistic regression a specific case of a neural network?

And here's a general recipe for constructing losses that correspond to exponential-family likelihoods, such as those used in GLMs: How to construct a cross-entropy loss for general regression targets?

But no particular regression model is a "general GLM," so there is no neural network that corresponds to a "general GLM."

Sycorax