
Suppose we have a model with many parameters, which we partition into two subvectors $\theta$ and $\lambda$. Here $\lambda$ collects the parameters we are really interested in, and $\theta$ the nuisance parameters. In particular, we would like to look at quantities like $P(x|\lambda)$, $P(x)$, and $P(\lambda|x)$, meaning we marginalize over $\theta$. However, performing the integral over all $\theta$ may be hard.

However, we may not even care about every possible value of $\theta$; for each $\lambda$, we may only care about the best possible value of $\theta$ for that $\lambda$. This suggests looking at something like the following "partial maximum likelihood":

$$ Q(x|\lambda) = \max_\theta P(x|\theta,\lambda) P(\theta|\lambda) = \max_\theta P(x, \theta|\lambda) $$

This is an interesting function in its own right. If we treat it as a likelihood function, we can easily derive the related quantities $Q(x) = \sum_\lambda Q(x|\lambda) P(\lambda)$ and $Q(\lambda|x) = Q(x|\lambda) P(\lambda)/Q(x)$. It isn't a true likelihood, since the sum of $Q(x|\lambda)$ over all possible $x$ need not be $1$, though it may be close to $1$. We could try to normalize it or simply not care: either way, $Q(\lambda|x)$ is a true probability distribution, since the normalizing terms cancel.
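To make the construction concrete, here is a small numerical sketch under an assumed toy model (everything about the model is made up for illustration): $x \mid \theta, \lambda \sim \mathrm{Normal}(\lambda, \theta^2)$ with an Exponential(1) prior on $\theta$. It computes $Q(x|\lambda)$ by maximizing the joint over a grid of $\theta$ values, then forms $Q(\lambda|x)$, which sums to $1$ because the normalizing terms cancel:

```python
import numpy as np

# Hypothetical toy model (for illustration only):
#   x | theta, lam ~ Normal(lam, theta^2)
#   theta | lam    ~ Exponential(1)   (independent of lam)

def log_joint(x, theta, lam):
    # log P(x | theta, lam) + log P(theta | lam)
    log_lik = -0.5 * np.log(2 * np.pi * theta**2) - (x - lam)**2 / (2 * theta**2)
    log_prior = -theta  # Exponential(1) log-density on theta > 0
    return log_lik + log_prior

def Q(x, lam, thetas=np.linspace(0.01, 10, 2000)):
    # "partial maximum likelihood": maximize the joint over theta on a grid
    return np.exp(np.max(log_joint(x, thetas, lam)))

def Q_posterior(x, lams, prior):
    # Q(lam | x) = Q(x | lam) P(lam) / sum_lam' Q(x | lam') P(lam')
    w = np.array([Q(x, lam) for lam in lams]) * prior
    return w / w.sum()

lams = np.array([-1.0, 0.0, 1.0])
prior = np.ones(3) / 3
post = Q_posterior(x=0.2, lams=lams, prior=prior)
# post sums to 1 exactly, even though Q(x|lam) itself is not normalized over x
```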

Question: what is this called? It doesn't seem to be quite the partial likelihood. I guess one could call it a quasi-likelihood? "Partial maximum likelihood" would also make sense, except I am not sure it is the same as that established term. Or can we perhaps treat the posterior as a true posterior on some other related quantity?

1 Answer


This is called the profile likelihood function. Taking $\hat{\theta} \equiv \hat{\theta}(x,\lambda) \equiv \text{arg max}_\theta P(x,\theta|\lambda)$ gives $Q(x|\lambda) = P(x,\hat{\theta}|\lambda)$, yielding $Q$ as a profile likelihood.
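The two views of profiling mentioned here coincide: maximizing over $\theta$ on a grid gives the same value as first expressing the nuisance parameter as a function $\hat{\theta}(x,\lambda)$ and substituting it back. A minimal sketch, under an assumed toy model with a flat prior on $\theta$ (so $P(x,\theta|\lambda) \propto P(x|\theta,\lambda)$): for $x \sim \mathrm{Normal}(\lambda, \theta^2)$ with a single observation, the maximizing scale has the closed form $\hat{\theta}(x,\lambda) = |x - \lambda|$.

```python
import numpy as np

def log_lik(x, theta, lam):
    # log P(x | theta, lam) for x ~ Normal(lam, theta^2)
    return -0.5 * np.log(2 * np.pi * theta**2) - (x - lam)**2 / (2 * theta**2)

def theta_hat(x, lam):
    # nuisance parameter written as a function of the data and the
    # parameter of interest (closed-form arg max for this toy model)
    return abs(x - lam)

def profile_log_lik(x, lam):
    # "concentrated" likelihood: substitute theta_hat back in
    return log_lik(x, theta_hat(x, lam), lam)

# brute-force maximization over theta agrees with the substitution view
x, lam = 0.7, 0.0
thetas = np.linspace(1e-3, 10, 100_000)
grid_max = np.max(log_lik(x, thetas, lam))
# grid_max ≈ profile_log_lik(x, lam)
```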

Ben
  • Thanks @Ben, looked into it and it seems correct. The only thing is that Wikipedia page says "It is possible to reduce the dimensions by concentrating the likelihood function for a subset of parameters by expressing the nuisance parameters as functions of the parameters of interest and replacing them in the likelihood function" - I don't quite see how the nuisance parameters in my example are a function of the interesting parameter. (1/2) – Mike Battaglia Jul 31 '22 at 22:54
  • But I have seen other expositions of the profile likelihood which instead have it in terms closer to what I wrote; e.g. partitioning the parameters into interesting and nuisance parameters and maximizing on only the nuisance ones, so I guess it is correct. I am still curious about this other view of expressing nuisance parameters as a function of the interesting ones though. (2/2) – Mike Battaglia Jul 31 '22 at 22:55
  • Edited to specify. – Ben Jul 31 '22 at 22:56
  • Ok. So we could also replace $\theta$ with any other suitable function of $\lambda$ and $x$, rather than just the max, and that would also be a different profile likelihood function. Is that correct? – Mike Battaglia Jul 31 '22 at 23:23
  • You just have to be careful that your reader is following. The term "profile likelihood" is usually used in the narrow sense (using the arg max function), so if you use it in the wider sense you could still call it a profile likelihood, but you should probably be explicit to your reader that you are using the term in the wider sense. – Ben Aug 01 '22 at 00:21