
I am reading the book:

Bishop, Pattern Recognition and Machine Learning (2006)

which defines the exponential family as distributions of the form (Eq. 2.194): $$ p(\mathbf x|\boldsymbol \eta) = h(\mathbf x) g(\boldsymbol \eta) \exp \{\boldsymbol \eta^\mathrm T \mathbf u(\mathbf x)\} $$ But I see no restrictions placed on $h(\mathbf x)$ or $\mathbf u(\mathbf x)$. Doesn't this mean that any distribution can be put in this form, by appropriate choice of $h(\mathbf x)$ and $\mathbf u(\mathbf x)$ (in fact only one of them has to be chosen properly!)? So how come the exponential family does not include all probability distributions? What am I missing?

Finally, a more particular question that I am interested in is this: Is the Bernoulli distribution in the exponential family? Wikipedia claims it is, but since I am obviously confused about something here, I would like to see why.

a06e
  • For the proof that the Bernoulli distribution is in the exponential family, try using the fact that $f(x; \mu) = \exp (\log( f(x; \mu)))$ and see where that gets you – jld Jul 31 '17 at 10:45
  • Just to clarify, are you asking whether any distribution can be written in this form, or whether any family of distributions can be written in this form? You seem to have gotten answers to the latter question. – Owen Jul 31 '17 at 12:41
  • @Owen Yes, I see now that this is the crucial point. Although any distribution can be written in this form (by setting $h(\mathbf x)$ appropriately, and $g=1,\mathbf u= 0$), that does not imply that any family can be written in this form. – a06e Jul 31 '17 at 12:44
  • @becko That's exactly right. The phrasing in the text, "the exponential family", is somewhat misleading, because there's not just one exponential family; rather, each choice of $(h, g, \mathbf u)$ gives rise to a family. Many authors instead say "an exponential family", making this more clear; e.g., see the Wikipedia page: https://en.wikipedia.org/wiki/Exponential_family – Brent Kerby Jul 31 '17 at 14:22
  • "An exponential family" is so much clearer. When I first learned GLMs, the "the exponential family" phrasing kept me from a good understanding for at least a year, and contributed to my thinking statistics was cryptic and un-understandable. – Matthew Drury Aug 01 '17 at 04:57
  • @becko I think your argument shows that any given distribution can be one member of an exponential family, but not that any family of distributions can be an exponential family. – Matthew Drury Aug 01 '17 at 04:58
  • @MatthewDrury Good, that's a nice way of summarising it. – a06e Aug 01 '17 at 07:52
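Following the hint in jld's comment, taking logs of the Bernoulli pmf $p(x\mid\mu) = \mu^x(1-\mu)^{1-x}$, $x \in \{0, 1\}$, gives $\exp\{x \ln\frac{\mu}{1-\mu} + \ln(1-\mu)\}$. That is Eq. 2.194 with $h(x) = 1$, $u(x) = x$, $\eta = \ln\frac{\mu}{1-\mu}$ and $g(\eta) = 1 - \mu = 1/(1 + e^{\eta})$. A minimal Python check of that identity (the function names are only illustrative):

```python
import math

def bernoulli_pmf(x, mu):
    """Bernoulli pmf written directly in terms of mu, for x in {0, 1}."""
    return mu**x * (1 - mu) ** (1 - x)

def natural_parameter(mu):
    """eta = ln(mu / (1 - mu)), the log-odds."""
    return math.log(mu / (1 - mu))

def bernoulli_expfam(x, eta):
    """Same pmf in the form h(x) g(eta) exp(eta * u(x)),
    with h(x) = 1, u(x) = x and g(eta) = 1 / (1 + exp(eta))."""
    h = 1.0
    g = 1.0 / (1.0 + math.exp(eta))
    return h * g * math.exp(eta * x)

# One fixed choice of (h, u, g) reproduces the pmf for every mu in (0, 1).
for mu in (0.1, 0.5, 0.9):
    eta = natural_parameter(mu)
    for x in (0, 1):
        assert math.isclose(bernoulli_pmf(x, mu), bernoulli_expfam(x, eta))
```

Because the same $h$, $u$ and $g$ work for every $\mu \in (0, 1)$, the whole Bernoulli family (not just one member) is an exponential family.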

3 Answers


Consider the non-central Laplace distribution $$ f(x; \mu, \sigma) \propto \exp \left(-| x - \mu | / \sigma \right). $$

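If $\mu$ is a free parameter, you won't be able to write $|x - \mu|/\sigma$ as an inner product between some function of $(\mu, \sigma)$ and some fixed function of $x$. Only when $\mu$ is known (e.g. $\mu = 0$) is this an exponential family, with $\eta = -1/\sigma$ and $u(x) = |x - \mu|$.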
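One way to see this numerically: for an exponential family with $k$-dimensional $\boldsymbol\eta$, every difference $\ln p(x\mid\boldsymbol\eta_i) - \ln p(x\mid\boldsymbol\eta_0) = \ln g(\boldsymbol\eta_i) - \ln g(\boldsymbol\eta_0) + (\boldsymbol\eta_i - \boldsymbol\eta_0)^\mathrm T \mathbf u(x)$, so these functions of $x$ span at most $k + 1$ dimensions no matter how many parameter values you try. The sketch below (NumPy; the grid and the 20 parameter pairs are arbitrary illustrative choices) stacks these differences into a matrix and compares its numerical rank for the Normal and the Laplace families. A numerical rank is evidence, not a proof.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2 * sigma**2)

def laplace_logpdf(x, mu, sigma):
    """Log-density of the Laplace distribution with location mu and scale sigma."""
    return -np.log(2 * sigma) - np.abs(x - mu) / sigma

def difference_rank(logpdf, params, x):
    """Numerical rank of the matrix with rows log p(x | theta_i) - log p(x | theta_0)."""
    base = logpdf(x, *params[0])
    rows = [logpdf(x, *theta) - base for theta in params[1:]]
    return int(np.linalg.matrix_rank(np.array(rows)))

x = np.linspace(-10.0, 10.0, 2001)               # evaluation grid (arbitrary)
params = list(zip(np.linspace(-3.0, 3.0, 20),    # 20 (mu, sigma) pairs (arbitrary)
                  np.linspace(0.5, 2.0, 20)))

normal_rank = difference_rank(normal_logpdf, params, x)    # rows lie in span{1, x, x^2}
laplace_rank = difference_rank(laplace_logpdf, params, x)  # each new mu adds a new kink
print(normal_rank, laplace_rank)
```

The Normal rank stays at 3 however many $(\mu, \sigma)$ pairs are used, while the Laplace rank keeps growing with the number of distinct $\mu$ values, which is what you would expect if no finite-dimensional $\mathbf u(x)$ can absorb $|x - \mu|$.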

Exponential families do include the vast majority of the nice named distributions that we commonly encounter, so at first it may seem that they contain everything of interest, but they are by no means exhaustive.

jld

First, note there is a terminology problem in your title: "the exponential family" suggests there is only one exponential family. You should say "an exponential family"; there are many exponential families.

Well, one consequence of your definition, $$p(\mathbf x|\boldsymbol \eta) = h(\mathbf x) g(\boldsymbol \eta) \exp \{\boldsymbol \eta^\mathrm T \mathbf u(\mathbf x)\},$$ is that the support of the distribution family indexed by the parameter $\eta$ does not depend on $\eta$: since $g(\boldsymbol\eta)$ and the exponential are strictly positive, $p(\mathbf x|\boldsymbol\eta) > 0$ exactly where $h(\mathbf x) > 0$. (The support of a probability distribution is the smallest closed set with probability one, or in other words, where the distribution lives.) So it is enough to give a counterexample: a family of distributions whose support depends on the parameter. The easiest example is the family of uniform distributions $\text{U}(0, \eta), \quad \eta > 0$. An interesting related example is discussed in the question "Exponential family definition appears vacuous".
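A small sketch of the support argument: for a family written as $h(x) g(\eta) \exp\{\eta u(x)\}$ the set where the density is positive is fixed by $h$ alone, while for $\text{U}(0, \eta)$ it moves with $\eta$. For contrast it uses the exponential distribution in exponential-family form ($h(x) = \mathbf 1[x \ge 0]$, $u(x) = x$, $\eta = -\lambda < 0$, $g(\eta) = -\eta$); the grid and the parameter values are arbitrary.

```python
import math

def uniform_pdf(x, eta):
    """Density of U(0, eta): 1/eta on [0, eta], zero elsewhere."""
    return 1.0 / eta if 0.0 <= x <= eta else 0.0

def exponential_pdf(x, eta):
    """Exponential distribution as h(x) g(eta) exp(eta * u(x)), with
    h(x) = 1[x >= 0], u(x) = x, natural parameter eta = -rate < 0, g(eta) = -eta."""
    h = 1.0 if x >= 0.0 else 0.0
    return h * (-eta) * math.exp(eta * x)

def positive_set(pdf, eta, grid):
    """Grid points where the density is strictly positive."""
    return {x for x in grid if pdf(x, eta) > 0.0}

grid = [i / 10 for i in range(-20, 51)]   # -2.0, -1.9, ..., 5.0

# Uniform family: the positive set changes with eta.
print(positive_set(uniform_pdf, 1.0, grid) == positive_set(uniform_pdf, 3.0, grid))  # False

# Exponential family: the positive set is {x >= 0} for every eta < 0.
print(positive_set(exponential_pdf, -0.5, grid) == positive_set(exponential_pdf, -4.0, grid))  # True
```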

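Another, unrelated, property of exponential families concerns moment generating functions: the sufficient statistic $\mathbf u(X)$ always has an MGF when $\boldsymbol\eta$ lies in the interior of the natural parameter space, namely $M(\mathbf t) = g(\boldsymbol\eta)/g(\boldsymbol\eta + \mathbf t)$. Note that this is a statement about $\mathbf u(X)$, not necessarily about $X$ itself: the log-normal family is an exponential family, yet the log-normal distribution has no MGF.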
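The MGF of the sufficient statistic comes straight from the normaliser: $E[e^{\mathbf t^\mathrm T \mathbf u(X)}] = g(\boldsymbol\eta) \int h(\mathbf x) \exp\{(\boldsymbol\eta + \mathbf t)^\mathrm T \mathbf u(\mathbf x)\}\, d\mathbf x = g(\boldsymbol\eta)/g(\boldsymbol\eta + \mathbf t)$, finite whenever $\boldsymbol\eta + \mathbf t$ is still a valid natural parameter. A small Python check for the Poisson family ($h(x) = 1/x!$, $u(x) = x$, $\eta = \ln\lambda$, $g(\eta) = \exp(-e^{\eta})$), comparing a truncated series for $E[e^{tX}]$ with that formula; the truncation at 200 terms is an arbitrary choice.

```python
import math

def g(eta):
    """Poisson normaliser in the h(x) g(eta) exp(eta * u(x)) form: g(eta) = exp(-exp(eta))."""
    return math.exp(-math.exp(eta))

def poisson_pmf(x, eta):
    """h(x) g(eta) exp(eta * u(x)) with h(x) = 1/x! and u(x) = x."""
    return g(eta) * math.exp(eta * x - math.lgamma(x + 1))

def mgf_by_summation(t, eta, n_terms=200):
    """E[exp(t X)] from a truncated sum over x = 0, ..., n_terms - 1."""
    return sum(math.exp(t * x) * poisson_pmf(x, eta) for x in range(n_terms))

def mgf_from_normaliser(t, eta):
    """E[exp(t u(X))] = g(eta) / g(eta + t)."""
    return g(eta) / g(eta + t)

eta = math.log(3.0)   # rate lambda = 3
for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, mgf_by_summation(t, eta), mgf_from_normaliser(t, eta))
```

For the Poisson, $g(\eta)/g(\eta + t) = \exp\{\lambda(e^t - 1)\}$, the familiar Poisson MGF.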

  • Where can I find a formal proof that "an exponential family distribution always have an existing moment generating function."? – mhdadk Sep 15 '21 at 11:05
  • I will add to, and make more precise, that comment – kjetil b halvorsen Sep 15 '21 at 11:08
  • Thanks a lot. If you could also comment on whether having an MGF is a sufficient condition for being an exponential family distribution, that would be great. I think it is only a necessary condition, but not sure. – mhdadk Sep 15 '21 at 11:24
  • @mhdadk The $U(0,\eta)$ counter-example would work there too as it has an MGF, namely $\frac{e^{\eta t}-1}{\eta t}$ when $t \neq 0$ and $1$ when $t=0$ – Henry May 11 '23 at 15:33

Both the existing answers are good, but I'll just try to add a bit of intuition about what is going on here.

The equation you have written defines how to make an exponential family of distributions. Fixing $h$, $g$ and $\mathbf u$ gives you a set of distributions indexed by the parameter $\boldsymbol\eta$. The right choice of $h$, $g$ and $\mathbf u$ gives you the Normal family, with $\boldsymbol\eta = (\mu/\sigma^2, -1/(2\sigma^2))$, a reparametrisation of $(\mu, \sigma^2)$. There are thus infinitely many exponential families, only a finite number of which have names (Normal, Dirichlet, Poisson, ...).
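Concretely, one standard choice for the Normal family is $h(x) = (2\pi)^{-1/2}$, $\mathbf u(x) = (x, x^2)^\mathrm T$, $\boldsymbol\eta = (\mu/\sigma^2, -1/(2\sigma^2))^\mathrm T$ and $g(\boldsymbol\eta) = (-2\eta_2)^{1/2} \exp(\eta_1^2/(4\eta_2))$. A quick Python check against the usual density, at a few arbitrary parameter values and points:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Usual N(mu, sigma2) density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def natural_parameters(mu, sigma2):
    """Map (mu, sigma^2) to eta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return (mu / sigma2, -1.0 / (2 * sigma2))

def normal_expfam(x, eta):
    """h(x) g(eta) exp(eta^T u(x)) with h = (2 pi)^(-1/2) and u(x) = (x, x^2)."""
    eta1, eta2 = eta
    h = 1.0 / math.sqrt(2 * math.pi)
    g = math.sqrt(-2 * eta2) * math.exp(eta1**2 / (4 * eta2))
    return h * g * math.exp(eta1 * x + eta2 * x**2)

for mu, sigma2 in [(0.0, 1.0), (1.5, 0.25), (-2.0, 4.0)]:
    eta = natural_parameters(mu, sigma2)
    for x in (-3.0, 0.0, 0.7, 2.5):
        assert math.isclose(normal_pdf(x, mu, sigma2), normal_expfam(x, eta), rel_tol=1e-9)
```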

You are sort of correct in that any specific distribution will be in some exponential family. The issue is finding $h$, $g$ and $\mathbf u$ such that you completely cover another "traditional" family. For example, the t-distribution family is not an exponential family, but any specific t-distribution is in an exponential family. A t on 5 degrees of freedom centred on zero with scale 1 can be put into exponential-family form in an infinite number of ways. However, you will not get all the other t-distributions into the exponential family you have made. It's a bit like a stopped watch being right twice a day.

The bit that typically goes wrong algebraically, if you try to write these distributions as an exponential family, is that to be useful you need to be able to scale and shift $x$ by your parameters. $h$ is no use because it doesn't contain the parameter, and $g$ is useless because it just scales the whole pdf up and down; it's only the normaliser. That just leaves the inner product in the exponential, and you aren't allowed to apply any other function to it. In my t(5) example the pdf satisfies

$$f(x) \propto \left( 1 + \frac{x^2}{5}\right)^{-3} = \exp\left(-3 \ln\left(1+\frac{x^2}{5}\right)\right)$$

You can't get "inside" that $\ln$, so the only thing you can really do is make a family where the 3 changes. But because the 5 inside the $\ln$ stays fixed, changing the 3 changes the d.o.f. and the scale together: exponent $-a$ (with $a > 1/2$) gives a t on $2a - 1$ d.o.f. with scale $\sqrt{5/(2a-1)}$. So I've made a new (pretty silly) exponential family that contains exactly one standard t-distribution, and I can't ever get all of them in the same family; every other member is a t whose scale is locked to its d.o.f.
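The hand-built family can be written explicitly as $p(x\mid\eta) = g(\eta) \exp\{\eta\, u(x)\}$ with $h(x) = 1$, $u(x) = \ln(1 + x^2/5)$ and $\eta < -1/2$ (needed for the density to be integrable). The Python sketch below normalises it numerically with the trapezoid rule and checks that the member with $\eta = -3$ coincides with the standard $t_5$ density; the integration range and step are arbitrary choices, and only the $\eta = -3$ member is checked.

```python
import math

def u(x):
    """Sufficient statistic of the hand-built family: u(x) = ln(1 + x^2 / 5)."""
    return math.log1p(x * x / 5.0)

def g(eta, half_width=200.0, step=0.01):
    """1 / integral of exp(eta * u(x)) dx, via the trapezoid rule on [-half_width, half_width].
    Only sensible for eta < -1/2, where the integral converges."""
    n = int(round(2 * half_width / step))
    ys = [math.exp(eta * u(-half_width + i * step)) for i in range(n + 1)]
    integral = step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return 1.0 / integral

def family_pdf(x, eta, g_eta):
    """Member of the family, given a precomputed normaliser g_eta = g(eta)."""
    return g_eta * math.exp(eta * u(x))

def student_t_pdf(x, nu):
    """Standard Student t density (location 0, scale 1) with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

g_minus3 = g(-3.0)
for x in (0.0, 1.0, 2.5):
    print(x, family_pdf(x, -3.0, g_minus3), student_t_pdf(x, 5))
```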

Corvus