
I am reading a slide which says GLM are neural networks. https://owars.info/mario/2020_Wuthrich.pdf

The authors showed that if you have an exponential function as the activation function of a very simple 1-layer neural network, then it corresponds to Poisson regression, which is a type of GLM.

My question is: for the general GLM, what should be the corresponding activation function if the GLM were represented as a neural network?

Fraïssé
    the key is to have only one layer; different activations would correspond to different "link functions" in GLM parlance; there is no single activation function which makes a neural net into a GLM. – John Madden Oct 04 '23 at 19:55
    It is worth noting that a $\log$ link is not synonymous with Poisson regression. A $\log$ link could be used for other likelihoods, such as gamma or negative binomial (even Gaussian is theoretically possible). Thus, the claim that an exponential activation function (inverse of a logarithmic link function) corresponds to Poisson regression is not correct. // I once posted a possible related question on AI.SE. – Dave Oct 05 '23 at 13:13
    It is the combination of corresponding loss, inverse link activation and 0 hidden layer architecture that mimics a GLM via gradient descent. Exactly how it is written in Mario's notes, btw. – Michael M Oct 05 '23 at 14:28

2 Answers


My question is: for the general GLM, what should be the corresponding activation function if the GLM were represented as a neural network?

If you have a model for the expectation of the form:

$$\mu = g^{-1}(X\beta)$$

in GLM jargon, $X\beta=\eta$ is the linear predictor and $g$ is the link function,

then it can be stated equivalently as a neural network with a single layer (no hidden layers) and $g^{-1}$ as the activation function.

Note, however, as @Sycorax and @Dave mentioned, a linear predictor and a link function do not a GLM make. A GLM also has another component: an assumption about the conditional distribution of the response variable. In GLMs, we assume the response is distributed, conditional on the predictors, according to a particular distribution from the exponential family. Different choices entail different likelihoods.
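As a minimal sketch of this equivalence, here is a Poisson GLM (log link, so $g^{-1} = \exp$) written as a zero-hidden-layer "network" trained by plain gradient descent on the Poisson negative log-likelihood. The data and hyperparameters are invented for illustration; with the canonical link, the gradient works out to $-X^\top(y - \mu)/n$, so at convergence the GLM score equations $X^\top(y - \mu) = 0$ should hold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: intercept column plus two predictors.
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

# A "neural network" with no hidden layer and exp activation:
# mu = exp(X beta).  Training minimizes the Poisson negative
# log-likelihood; for the canonical log link its gradient is
# -X^T (y - mu) / n.
beta = np.zeros(3)
for _ in range(5000):
    mu = np.exp(X @ beta)            # activation g^{-1} = exp
    beta -= 0.1 * (-X.T @ (y - mu) / len(y))

# At the optimum the GLM score equations X^T (y - mu) = 0 hold,
# so this recovers the same fit as maximum-likelihood Poisson
# regression on these data.
score = X.T @ (y - np.exp(X @ beta))
```

Swapping in a different inverse link and the matching likelihood gives the corresponding GLM; only the activation and the loss change, not the architecture.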

Firebug

An activation function alone does not make a neural network into a GLM. There are three defining features of a GLM:

  1. The loss function. A GLM maximizes the likelihood of a probability model, so to make a NN into a GLM, we need to optimize the same function.
  2. No hidden layers. A GLM is a linear model, so our NN must also be a linear model.
  3. The link function. GLMs use a link function to relate the linear predictor to the likelihood, and we need the NN to match that.
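Putting the three ingredients together for the Bernoulli case gives logistic regression: cross-entropy loss (the Bernoulli negative log-likelihood), no hidden layers, and a sigmoid activation (the inverse logit link). A small illustrative sketch, with invented data and hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.4, 1.0, 0.6])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

# Ingredient 1: cross-entropy loss = Bernoulli NLL.
# Ingredient 2: no hidden layer, just X @ beta.
# Ingredient 3: sigmoid activation = inverse logit link.
# For the canonical link the NLL gradient is -X^T (y - p) / n.
beta = np.zeros(3)
for _ in range(10000):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # sigmoid activation
    beta -= 0.5 * (-X.T @ (y - p) / n)

# As in any GLM with canonical link, the fitted model satisfies
# the score equations X^T (y - p) = 0.
score = X.T @ (y - 1.0 / (1.0 + np.exp(-(X @ beta))))
```

Change any one ingredient (say, add a hidden layer, or use squared error instead of cross-entropy) and the model is no longer logistic regression.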

Here's an example of how to put these facts together to make a logistic regression NN: Is logistic regression a specific case of a neural network?

And here's a general recipe for constructing losses that correspond to exponential-family likelihoods, such as those used in GLMs: How to construct a cross-entropy loss for general regression targets?

But no particular regression model is a "general GLM," so there is no neural network that corresponds to a "general GLM."

Sycorax