
Takeuchi's information criterion (TIC) is said to be the generalization of AIC to misspecified models. That publication presents degrees of freedom for nonlinear least squares estimation. From that source: [image of the degrees-of-freedom expression omitted]

This appears similar to a Tikhonov regularization. Would someone please explain in simple language what Takeuchi's information criterion is? It would appear that calculating degrees of freedom is not simple. Beyond df calculations, are there other uses for TIC? If, as George Box says, "essentially, all models are wrong," which application of AIC is not to a misspecified model? Should we all be using TIC instead of AIC?

Carl
    Note that when the model's not mis-specified (following Hansen & Sokel's notation), $\mathbb{J}=\mathbb{G}$ is the Fisher information matrix & $\operatorname{tr}\left(\mathbb{J}^{-1}\mathbb{G}\right) = p$. So your last question boils down to whether it's better either to approximate $\operatorname{tr}\left(\mathbb{J}^{-1}\mathbb{G}\right)$ under the assumption that the model isn't badly mis-specified (AIC) or to estimate it from the data (TIC). (Which will depend on what you're using the information criteria for, & on the amount of data you have.) – Scortchi - Reinstate Monica Feb 24 '21 at 12:33
  • @Scortchi-ReinstateMonica Thanks for the information. My limited understanding is that TIC appears to be regularization, and that may be useful when the misspecification is in part from inappropriately using AIC for an ill-posed problem, for example for censored data. – Carl Feb 24 '21 at 12:57
  • Yes: you can think of model selection with either AIC or TIC as maximizing a penalized likelihood over a set of models. Is there some particular class of models for censored data you've got in mind? - the bog-standard maximum-likelihood approach for non-informative censoring doesn't preclude use of AIC. – Scortchi - Reinstate Monica Feb 24 '21 at 15:11
  • @Scortchi-ReinstateMonica Indeed there is. For drug concentrations in blood as a function of time it is usually not possible to collect data to extinction for several reasons. Consequently, ML regression would not be appropriate, not because concentration curves are not random variables, and of course they are not, but because the data does not span the concentration space (i.e., space=time in that context). It is however amenable to Tikhonov regularization. – Carl Feb 25 '21 at 01:32
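Scortchi's point above, that $\operatorname{tr}\left(\mathbb{J}^{-1}\mathbb{G}\right)$ equals the parameter count $p$ when the model is correctly specified but deviates from it otherwise, can be checked numerically. Here is a minimal Python sketch (my own illustration, not from the discussion; the function name is made up) that estimates the TIC penalty for a normal model with parameters $(\mu, \sigma^2)$ fit by maximum likelihood:

```python
import numpy as np

def tic_penalty(x):
    """Estimate tr(J^{-1} G) for a normal model fit by maximum likelihood.

    J = -E[Hessian of the per-observation log-likelihood]
    G = E[outer product of per-observation scores]
    Both are evaluated at the MLE (mu_hat, s2_hat).
    """
    mu = x.mean()
    s2 = ((x - mu) ** 2).mean()              # ML (biased) variance estimate
    r = x - mu

    # Per-observation scores with respect to (mu, sigma^2)
    u1 = r / s2
    u2 = -0.5 / s2 + r ** 2 / (2 * s2 ** 2)
    U = np.stack([u1, u2], axis=1)
    G = U.T @ U / len(x)                     # outer-product-of-scores matrix

    # Averaged per-observation Hessian, negated
    h12 = (-r / s2 ** 2).mean()              # exactly 0, since r sums to 0
    h22 = (0.5 / s2 ** 2 - r ** 2 / s2 ** 3).mean()
    J = -np.array([[-1.0 / s2, h12],
                   [h12, h22]])

    return np.trace(np.linalg.solve(J, G))

rng = np.random.default_rng(0)
print(tic_penalty(rng.normal(size=100_000)))       # close to p = 2
print(tic_penalty(rng.exponential(size=100_000)))  # well above 2: misspecified
```

For this model the penalty works out to $1 + (m_4 - 1)/2$, where $m_4$ is the standardized fourth moment of the data, so it recovers $p = 2$ for normal data ($m_4 = 3$) and inflates under misspecification (exponential data has $m_4 = 9$, giving a penalty near 5).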

1 Answer


I think this answer comes a bit late and is not complete, but my understanding of the subject is that an information criterion, in its general form, is a bias-corrected log-likelihood. That is, in many modeling contexts the notion of maximum likelihood comes into play: we prefer to select models that achieve a high (log-)likelihood value on the observed data.

However, because we use the same observed data both to fit the model and to evaluate its performance via the likelihood, that likelihood value carries an (optimistic) bias. In a machine learning context, this bias is why, for example, we do not use the same data to train a model and to test its (out-of-sample) performance, and instead split our data into training and test sets.
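This optimism bias is easy to see in a small Monte Carlo sketch (my own illustration, with made-up names): fit a normal model by maximum likelihood on one sample, then compare its log-likelihood on that same sample against its log-likelihood on a fresh sample from the same source. On average, the in-sample value is higher.

```python
import numpy as np

def gaussian_loglik(x, mu, s2):
    """Gaussian log-likelihood of sample x under parameters (mu, s2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * s2) - (x - mu) ** 2 / (2 * s2))

rng = np.random.default_rng(0)
n, reps = 20, 2000
optimism = []
for _ in range(reps):
    train = rng.normal(size=n)
    test = rng.normal(size=n)              # fresh data from the same source
    mu, s2 = train.mean(), train.var()     # ML fit on the training sample
    optimism.append(gaussian_loglik(train, mu, s2)
                    - gaussian_loglik(test, mu, s2))

print(np.mean(optimism))  # positive: in-sample likelihood is biased upward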

So the (log-)likelihood value requires a bias correction so that we do not get optimistically biased likelihoods. In general form, an information criterion is:

$$\text{information criterion} = \text{log-likelihood under the approximation model} - \text{bias correction}$$

Takeuchi's information criterion (TIC) is, to my understanding, a general asymptotic form of the information criterion that makes no assumptions about the approximation model (for example, that the true model is contained in the selected hypothesis space); AIC, for example, is a special case of TIC.
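As a rough illustration of the general form above (a sketch of my own, with made-up function names, not any standard library API), here are AIC and TIC side by side for a one-parameter exponential model, where the trace penalty $\operatorname{tr}\left(\mathbb{J}^{-1}\mathbb{G}\right)$ reduces to a scalar ratio:

```python
import numpy as np

def aic_tic(x):
    """AIC and TIC for an exponential model with a single rate parameter lam.

    Per-observation log-likelihood: log(lam) - lam * x
    Score: 1/lam - x;  Hessian: -1/lam**2
    """
    lam = 1.0 / x.mean()                    # MLE of the rate
    loglik = np.sum(np.log(lam) - lam * x)
    J = 1.0 / lam ** 2                      # -E[Hessian]
    G = np.mean((1.0 / lam - x) ** 2)       # E[score^2]
    penalty = G / J                         # tr(J^{-1} G), a scalar here
    aic = -2 * loglik + 2 * 1               # p = 1 parameter
    tic = -2 * loglik + 2 * penalty
    return aic, tic

rng = np.random.default_rng(0)
# correctly specified: penalty is close to p = 1, so AIC and TIC nearly agree
aic, tic = aic_tic(rng.exponential(scale=2.0, size=50_000))
```

When the exponential model is correct, the data-driven penalty converges to the parameter count and TIC reproduces AIC; feeding the same function misspecified data (say, lognormal) inflates the penalty and TIC exceeds AIC, which is exactly the distinction Scortchi describes in the comments.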

I previously wrote an example about TIC (and AIC) in this post, if it's any help to you:

Where can I find examples of Takeuchi Information Criterion (TIC) at work?

jjepsuomi