
The usual proof that the expected score in maximum likelihood is zero goes something like this:

$f(z;\theta)$ is the density function for data $z$ and parameter vector $\theta$, so $\int f(z;\theta)\,dz=1$ for any $\theta$. This implies that, under the usual regularity conditions that allow us to interchange the differentiation and integration operators, we have:

$$\int \frac{\partial f(z;\theta)}{\partial \theta}\,dz=0 \;\Leftrightarrow\; \int \frac{f(z;\theta)}{f(z;\theta)}\, \frac{\partial f(z;\theta)}{\partial \theta}\,dz=0 \;\Leftrightarrow\; \int f(z;\theta)\, \frac{\partial \log f(z;\theta)}{\partial \theta}\,dz=0$$

From what I could gather, this is the usual proof. However, at no step do we assume that $\theta$ is the true value. By my reasoning, as long as the parameter vector lies in the domain for which $f$ is a density, the expected score would always equal zero. Or am I missing something?

    You might find it illuminating to see what your analysis says when $f(z;\theta)$ describes the family of Normal distributions of unit variance and unknown mean $\theta$. – whuber Dec 10 '14 at 19:09
  • @whuber, if the thetas are all the same, even if not the true one, the equations still seem to hold... – An old man in the sea. Dec 10 '14 at 19:26

1 Answer


Verbal formulation of the result: "the expected value of the derivative of the log-likelihood of the sample, with the derivative evaluated at the true parameter value, equals zero."

Innocent question, crucial remark: what is the "expected value" of a function? Informally, it is an integral in which we weigh the function, using as weight the true density of the random variables that appear in it.
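In symbols, writing the true joint density of the sample as $f(\mathbf z;\theta_0)$, the expected value of any function $h$ of the sample is

$$E\big[h(\mathbf z)\big] = \int h(\mathbf z)\, f(\mathbf z;\theta_0)\,\text {d}\mathbf z$$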

Assume we have a sample of size $n$ and that we have specified the joint density $g(\mathbf z;\xi)$, which may or may not be the true joint density $f(\mathbf z;\theta_0)$.

We then examine the (multiple) integral

$$E\left(\frac {\partial \log g(\mathbf z;\xi)}{\partial \xi}\bigg |_{\xi=\xi_0}\right) = \int f(\mathbf z;\theta_0) \cdot \left(\frac {\partial g(\mathbf z;\xi)/\partial \xi}{g(\mathbf z;\xi)}\bigg |_{\xi=\xi_0}\right)\text {d}\mathbf z$$

Note carefully which components of the integrand are evaluated at $\xi_0$ and which are not: only the derivative of the specified log-likelihood is. The true density is not "evaluated"; by construction, it contains the true parameter $\theta_0$, irrespective of what we have specified for our sample.

Now, if $g(\cdot ;\xi) = f(\cdot ;\theta)$ and we evaluate at the true value, $\xi_0 = \theta_0$, the above becomes

$$= \int f(\mathbf z;\theta_0)\, \frac {\partial f(\mathbf z;\theta)/\partial\theta\big|_{\theta=\theta_0}}{f(\mathbf z;\theta_0)}\,\text {d}\mathbf z = \int \frac{\partial f(\mathbf z;\theta)}{\partial \theta}\bigg|_{\theta=\theta_0}\text {d}\mathbf z = \frac {\partial}{\partial \theta} \int f(\mathbf z;\theta)\, \text {d}\mathbf z\,\bigg|_{\theta=\theta_0} = \frac {\partial}{\partial \theta} \big(1\big) = 0$$

The integral equals unity for every $\theta$ (i.e. it is constant), so its derivative w.r.t. $\theta$ equals zero.

So, first conclusion: misspecification with respect to the distribution family destroys the result (*). But assume that you have specified the distribution family correctly, and let's evaluate the expression at some point $\theta_1$ that may differ from $\theta_0$:

$$E\left(\frac {\partial \log f(\mathbf z;\theta)}{\partial \theta}\bigg |_{\theta=\theta_1}\right) = \int f(\mathbf z;\theta_0) \cdot \frac {\partial f(\mathbf z;\theta)/\partial\theta\big|_{\theta=\theta_1}}{f(\mathbf z;\theta_1)}\,\text {d}\mathbf z$$

Here, the ratio $f(\mathbf z;\theta_0)/f(\mathbf z;\theta_1)$ no longer cancels, so we cannot interchange the order of integration and differentiation to obtain the result as before (doing so would also subject the non-cancelling ratio to differentiation). Therefore, even though $f(\mathbf z;\theta_1)$ is itself a density that integrates to unity, we cannot "leave it alone" inside the integral and exploit that property.
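whuber's comment under the question makes this concrete. Take $f(z;\theta)$ to be the Normal density with unit variance and unknown mean $\theta$, so that for a single observation $\partial \log f(z;\theta)/\partial \theta = z-\theta$. Then

$$E\left(\frac {\partial \log f(z;\theta)}{\partial \theta}\bigg |_{\theta=\theta_1}\right) = E_{\theta_0}\big[z-\theta_1\big] = \theta_0-\theta_1,$$

which is zero if and only if $\theta_1 = \theta_0$.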

So the result holds only at the true value, because the expected value is taken with respect to the true, and unknown, density, irrespective of our sample specifications.
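As a sanity check, here is a minimal simulation sketch (assuming NumPy, and reusing the unit-variance Normal example above, whose score for one observation is $z-\theta$): the sample average of the score is close to zero only when evaluated at the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0                                   # true mean of the N(theta0, 1) density
z = rng.normal(loc=theta0, scale=1.0, size=1_000_000)

def mean_score(theta1):
    """Monte Carlo estimate of E[ d log f(z; theta)/d theta at theta = theta1 ],
    the expectation taken under the true density N(theta0, 1).
    For the unit-variance Normal family the score of one observation is z - theta."""
    return np.mean(z - theta1)

print(mean_score(theta0))  # ~ 0: the score evaluated at the true value
print(mean_score(1.0))     # ~ theta0 - theta1 = 1: nonzero away from the true value
```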

--
(*) It also destroys the "Information Matrix Equality"; see this short exposition of mine: https://alecospapadopoulos.wordpress.com/2014/05/13/information-matrix-equality/