Most authors define the canonical form of the exponential family as

$$ p(\mathbf{x} | \boldsymbol{\theta})=h(\mathbf{x}) \exp (\boldsymbol{\eta}(\boldsymbol{\theta}) \cdot \mathbf{T}(\mathbf{x})-A(\boldsymbol{\theta})) $$

with the restriction that $h(\mathbf{x})$ must be non-negative. Why does one not simply define it instead as

$$ p(\mathbf{x} | \boldsymbol{\theta})= \exp (g(\mathbf{x}) + \boldsymbol{\eta}(\boldsymbol{\theta}) \cdot \mathbf{T}(\mathbf{x})-A(\boldsymbol{\theta})) $$

with $g(\mathbf{x}) = \log h(\mathbf{x})$? This would allow us to drop the non-negativity assumption, since it is then satisfied automatically. It also lets us interpret $g(\mathbf{x})$ as a bias term added to the inner product $\langle\boldsymbol{\eta}(\boldsymbol{\theta}), \mathbf{T}(\mathbf{x})\rangle$.
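As a quick sanity check that the two forms agree, here is a minimal sketch using the Poisson distribution as an example. The choice of distribution and the function names `poisson_h_form` and `poisson_g_form` are illustrative assumptions, not part of any library. For the Poisson family with rate $\lambda$ one can take $h(x) = 1/x!$, $\eta = \log \lambda$, $T(x) = x$ and $A = \lambda$.

```python
import math

def poisson_h_form(x, lam):
    """First form: h(x) * exp(eta * T(x) - A), with h(x) = 1/x!."""
    h = 1.0 / math.factorial(x)      # base factor, non-negative by construction
    eta = math.log(lam)              # natural parameter
    return h * math.exp(eta * x - lam)

def poisson_g_form(x, lam):
    """Second form: exp(g(x) + eta * T(x) - A), with g(x) = log h(x) = -log(x!)."""
    g = -math.lgamma(x + 1)          # log(1/x!) computed without forming x!
    eta = math.log(lam)
    return math.exp(g + eta * x - lam)

lam = 3.5  # hypothetical rate, chosen only for illustration
for x in range(6):
    print(x, poisson_h_form(x, lam), poisson_g_form(x, lam))
```

Both columns should match up to floating-point rounding, which is all the substitution $g = \log h$ claims.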

1 Answer

The first form makes more sense if you think about there being both continuous and discrete exponential families. Then $h$ defines a known base measure that is absolutely continuous with respect to either Lebesgue or counting measure, and the $$ \exp(\boldsymbol{\eta}(\boldsymbol{\theta}) \cdot \mathbf{T}(\mathbf{x}) - A(\boldsymbol{\theta})) $$ part is the density (or Radon–Nikodym derivative) with respect to this base measure.

In this interpretation, the first representation is intuitive, since we usually write probability measures that are specified by densities as base measure times density.
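To spell this out, here is a sketch in notation that is my own assumption rather than anything fixed above: let $\mu$ be Lebesgue or counting measure and let $\nu$ be the measure with $d\nu = h \, d\mu$. Then for any measurable set $B$,

$$ P_{\boldsymbol{\theta}}(B) = \int_B \exp(\boldsymbol{\eta}(\boldsymbol{\theta}) \cdot \mathbf{T}(\mathbf{x}) - A(\boldsymbol{\theta})) \, d\nu(\mathbf{x}) = \int_B h(\mathbf{x}) \exp(\boldsymbol{\eta}(\boldsymbol{\theta}) \cdot \mathbf{T}(\mathbf{x}) - A(\boldsymbol{\theta})) \, d\mu(\mathbf{x}) $$

so $h$ belongs to the measure being integrated against, and only the exponential factor depends on $\boldsymbol{\theta}$. For the Poisson family, for example, $\mu$ is counting measure on $\{0, 1, 2, \dots\}$, $h(x) = 1/x!$, $\eta = \log \lambda$, $T(x) = x$ and $A = \lambda$. Note also that the two forms in the question coincide only where $h(\mathbf{x}) > 0$; at points with $h(\mathbf{x}) = 0$ one would need $g(\mathbf{x}) = -\infty$.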