I have been looking for examples of the TIC and couldn't find any. In particular, I would like to know how exactly the penalty term in TIC is estimated. From what I found somewhere, that term involves the score function and the Fisher information. Are there any online resources where I can find this?
-
You found something online but you don't share it with us... – Nick Cox May 27 '14 at 10:11
-
Sorry for that, Nick. As I am not even sure whether it is true, I didn't want to confuse anyone with it. It was only a sort of guideline towards what I am looking for. – Cubestone May 27 '14 at 10:43
-
I don't know anything about TIC; I'm just trying to prod you into asking a better question. But people who know something can't even tell you how reliable your source is if you won't reveal it. – Nick Cox May 27 '14 at 14:00
-
I got it. link, that's the source. – Cubestone May 28 '14 at 10:05
1 Answer
Here is an example from the book Information Criteria and Statistical Modeling (pages 61-64), with my own minor changes and corrections to the presentation:
$\text{TIC}$ for normal model
Let data samples $x_1, x_2, ..., x_n$ be generated from a true distribution $g(x)$. We aim to estimate $g(x)$ by using a normal model:
$$f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).\hspace{3cm}(1)$$
We estimate the parameters $\mu, \sigma^2$ by maximum likelihood (which, to my understanding, $\text{TIC}$ requires), so the fitted model is:
$$f(x|\hat{\mu},\hat{\sigma}^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}}\exp\left(-\frac{(x-\hat{\mu})^2}{2\hat{\sigma}^2}\right),$$
where $\hat{\mu}=n^{-1}\sum_{i=1}^n x_i$ and $\hat{\sigma}^2=n^{-1}\sum_{i=1}^n (x_i-\hat{\mu})^2$.
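As a side note, here is a minimal numpy sketch of these two estimators (the sample is assumed to be a 1-D array, and the function name is my own choice, not from the book):

```python
import numpy as np

def normal_mle(x):
    """Maximum likelihood estimates of mu and sigma^2 for the normal model."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()                       # sample mean
    sigma2_hat = np.mean((x - mu_hat)**2)   # note the 1/n factor, not 1/(n-1)
    return mu_hat, sigma2_hat
```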
The $\text{TIC}$ for the given model $f(x|\boldsymbol\theta)$ (where $\boldsymbol\theta =(\mu, \sigma^2)$ now) is defined as:
$$\text{TIC}=-2\sum_{i=1}^n \log f(x_i|\boldsymbol\theta) + 2\text{tr}(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}),$$
where $b=\text{tr}(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1})$ is the bias term, and the $d\times d$ matrices $I(\boldsymbol\theta)$ and $J(\boldsymbol\theta)$ (with $d$ the dimensionality of the parameter vector; here $d=2$) are defined as:
$$I(\boldsymbol\theta)=E_g \left[\frac{\partial\log f(X|\boldsymbol\theta)}{\partial \boldsymbol\theta}\frac{\partial\log f(X|\boldsymbol\theta)}{\partial \boldsymbol\theta^T}\right],\hspace{2cm}(2)$$
$$J(\boldsymbol\theta)=-E_g \left[\frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\boldsymbol\theta \partial\boldsymbol\theta^T}\right].\hspace{4cm}(3)$$
In the above, $\text{tr}(\cdot)$ stands for the matrix trace and $E_g$ is the expectation with respect to the distribution $g(x)$. Let's now proceed to calculate the matrices $I(\boldsymbol\theta)$ and $J(\boldsymbol\theta)$ given our model $f(x|\mu,\sigma^2)$ in $(1)$. First, we need the log-likelihood function:
$$\log f(x|\boldsymbol\theta)=-\frac{1}{2}\log\left(2\pi\sigma^2\right)-\frac{(x-\mu)^2}{2\sigma^2}.$$
The partial derivatives of the log-likelihood function are:
$$\frac{\partial \log f(x|\boldsymbol\theta)}{\partial\mu}=\frac{x-\mu}{\sigma^2},\hspace{1cm}\frac{\partial \log f(x|\boldsymbol\theta)}{\partial\sigma^2}=-\frac{1}{2\sigma^2}+\frac{(x-\mu)^2}{2\sigma^4}$$
$$\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\mu^2}=-\frac{1}{\sigma^2},\hspace{1cm}\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial(\sigma^2)^2}=\frac{1}{2\sigma^4}-\frac{(x-\mu)^2}{\sigma^6}$$
$$\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\mu\partial \sigma^2}=\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\sigma^2\partial\mu}=-\frac{x-\mu}{\sigma^4}.$$
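These derivatives are easy to double-check symbolically; here is a small sympy sketch of that check (the symbol `sigma2` plays the role of $\sigma^2$):

```python
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
s2 = sp.symbols('sigma2', positive=True)          # sigma2 stands for sigma^2
logf = -sp.log(2*sp.pi*s2)/2 - (x - mu)**2/(2*s2)

print(sp.simplify(sp.diff(logf, mu)))             # (x - mu)/sigma2
print(sp.simplify(sp.diff(logf, s2)))             # -1/(2*sigma2) + (x - mu)**2/(2*sigma2**2)
print(sp.simplify(sp.hessian(logf, (mu, s2))))    # 2x2 matrix of second derivatives
```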
We therefore have the corresponding $2\times 2$ matrices:
$$\begin{aligned}I(\boldsymbol\theta)&=E_g\left[\begin{pmatrix} \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\mu} \\ \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\sigma^2} \end{pmatrix}\begin{pmatrix} \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\mu} & \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\sigma^2} \end{pmatrix}\right]\\ &=E_g\left[\begin{pmatrix} \frac{X-\mu}{\sigma^2} \\ -\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2\sigma^4} \end{pmatrix}\begin{pmatrix} \frac{X-\mu}{\sigma^2} & -\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2\sigma^4} \end{pmatrix}\right] \\ &=E_g\left[\begin{matrix} \frac{(X-\mu)^2}{\sigma^4} & -\frac{X-\mu}{2\sigma^4}+\frac{(X-\mu)^3}{2\sigma^6} \\ -\frac{X-\mu}{2\sigma^4}+\frac{(X-\mu)^3}{2\sigma^6} & \frac{1}{4\sigma^4}-\frac{(X-\mu)^2}{2\sigma^6}+\frac{(X-\mu)^4}{4\sigma^8} \end{matrix}\right] \\ &= \begin{bmatrix} \frac{1}{\sigma^2} & \frac{\mu_3}{2\sigma^6} \\ \frac{\mu_3}{2\sigma^6} & \frac{\mu_4}{4\sigma^8}-\frac{1}{4\sigma^4} \end{bmatrix}, \end{aligned}$$
$$\begin{aligned}J(\boldsymbol\theta) &= -E_g\begin{bmatrix}\frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\mu^2} & \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\mu\partial\sigma^2} \\ \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\sigma^2\partial\mu} & \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial(\sigma^2)^2}\end{bmatrix} \\ &= E_g\begin{bmatrix}\frac{1}{\sigma^2} & \frac{X-\mu}{\sigma^4} \\ \frac{X-\mu}{\sigma^4} & \frac{(X-\mu)^2}{\sigma^6}-\frac{1}{2\sigma^4}\end{bmatrix}\\ &= \begin{bmatrix}\frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4}\end{bmatrix},\end{aligned}$$
where $\mu_j=E_g\left[(X-\mu)^j\right]$ is the $j$th-order centralized moment. We then have:
$$I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}=\begin{bmatrix} \frac{1}{\sigma^2} & \frac{\mu_3}{2\sigma^6} \\ \frac{\mu_3}{2\sigma^6} & \frac{\mu_4}{4\sigma^8}-\frac{1}{4\sigma^4} \end{bmatrix}\begin{bmatrix}\sigma^2 & 0 \\ 0 & 2\sigma^4\end{bmatrix}=\begin{bmatrix}1 & \frac{\mu_3}{\sigma^2}\\ \frac{\mu_3}{2\sigma^4} & \frac{\mu_4}{2\sigma^4}-\frac{1}{2}\end{bmatrix},$$
and therefore the bias term is:
$$b=\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right) = 1 + \frac{\mu_4}{2\sigma^4}-\frac{1}{2}=\frac{1}{2}\left(1+\frac{\mu_4}{\sigma^4}\right).\hspace{2cm}(4)$$
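As an aside, when a closed form like $(4)$ is not available, the expectations in $(2)$ and $(3)$ are in practice replaced by sample averages evaluated at the MLE; that is the usual way the penalty term is estimated. A minimal numpy sketch of that recipe for this model (the function names are my own, not from the book):

```python
import numpy as np

def score(x, mu, s2):
    """Per-observation score vector (d log f / d mu, d log f / d sigma^2)."""
    return np.array([(x - mu)/s2,
                     -1.0/(2*s2) + (x - mu)**2/(2*s2**2)])

def neg_hessian(x, mu, s2):
    """Per-observation negative Hessian of log f with respect to (mu, sigma^2)."""
    return -np.array([[-1.0/s2,          -(x - mu)/s2**2],
                      [-(x - mu)/s2**2,  1.0/(2*s2**2) - (x - mu)**2/s2**3]])

def empirical_I_J(x):
    """Estimate I and J by replacing E_g with sample averages at the MLE."""
    x = np.asarray(x, dtype=float)
    mu, s2 = x.mean(), np.mean((x - x.mean())**2)
    scores = np.array([score(xi, mu, s2) for xi in x])        # shape (n, 2)
    I_hat = scores.T @ scores / len(x)                        # average outer product
    J_hat = np.mean([neg_hessian(xi, mu, s2) for xi in x], axis=0)
    return I_hat, J_hat
```

The estimated penalty is then `np.trace(I_hat @ np.linalg.inv(J_hat))`; for this normal model it coincides exactly with the closed-form estimator $\hat{b}$ derived next, so the sketch is mainly useful as a template for models without such a closed form.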
Thus, plugging in the maximum likelihood estimator $\hat{\boldsymbol\theta}=(\hat{\mu}, \hat{\sigma}^2)$, we get the estimator of the bias term:
$$\hat{b}=\text{tr}\left(I(\hat{\boldsymbol\theta})J(\hat{\boldsymbol\theta})^{-1}\right)=\frac{1}{2}\left(1+\frac{\hat{\mu}_4}{\hat{\sigma}^4}\right),$$
where $\hat{\sigma}^4 = (\hat{\sigma}^2)^2$ and $\hat{\mu}_4=n^{-1}\sum_{i=1}^n (x_i-\hat{\mu})^4$. It then follows that the $\text{TIC}$ in this example is:
$$\begin{aligned}\text{TIC} &= -2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) + 2\left(\frac12+\frac{\hat{\mu}_4}{2\hat{\sigma}^4}\right) \\ &= n\log(2\pi\hat{\sigma}^2)+n+1+\frac{\hat{\mu}_4}{\hat{\sigma}^4},\end{aligned}$$
where we used $-2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2)=n\log(2\pi\hat{\sigma}^2)+\hat{\sigma}^{-2}\sum_{i=1}^n(x_i-\hat{\mu})^2=n\log(2\pi\hat{\sigma}^2)+n$.
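Putting the pieces together, here is a sketch of the whole computation for this normal model (again, the naming is mine):

```python
import numpy as np

def tic_normal(x):
    """TIC for a univariate normal model fitted by maximum likelihood."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = x.mean()
    s2 = np.mean((x - mu)**2)            # \hat{sigma}^2
    m4 = np.mean((x - mu)**4)            # \hat{mu}_4, fourth central moment
    neg2loglik = n*np.log(2*np.pi*s2) + n
    b_hat = 0.5*(1 + m4/s2**2)           # estimated bias term
    return neg2loglik + 2*b_hat
```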
Note also that if there exists $\boldsymbol\theta_0$ such that $f(x|\boldsymbol\theta_0)=g(x)$, then $g(x)$ is a normal distribution and we have $\mu_3 = 0$, $\mu_4=3\sigma^4$. We then have in $(4)$ that:
$$b=\frac{1}{2}\left(1+\frac{\mu_4}{\sigma^4}\right)=\frac{1}{2}+\frac{3\sigma^4}{2\sigma^4}=2,$$
in which case $\text{TIC}$ reduces to $\text{AIC}$ (the Akaike information criterion), since the bias term equals the number of parameters $d=2$:
$$\begin{aligned}\text{AIC} &= -2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) + 2\times 2 \\ &= n\log(2\pi\hat{\sigma}^2)+n + 4.\end{aligned}$$
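For comparison, here is a short usage sketch (reusing `tic_normal` from above; the seed, sample size, and degrees of freedom are arbitrary choices) that contrasts the two criteria on normal versus heavy-tailed data:

```python
import numpy as np

def aic_normal(x):
    """AIC for the same normal model: the penalty is fixed at d = 2 parameters."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s2 = np.mean((x - x.mean())**2)
    return n*np.log(2*np.pi*s2) + n + 2*2

rng = np.random.default_rng(0)
x_normal = rng.normal(size=5000)
x_heavy = rng.standard_t(df=6, size=5000)            # kurtosis 6, so mu_4/sigma^4 > 3

print(tic_normal(x_normal) - aic_normal(x_normal))   # close to 0: estimated bias ~ 2
print(tic_normal(x_heavy) - aic_normal(x_heavy))     # clearly positive: TIC penalizes more
```

For the $t_6$ data the difference is roughly $2\hat{b}-4\approx 3$, reflecting the excess kurtosis relative to the normal distribution.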
-
For the rest of us who aren't inclined to spend 107,09 € to understand your response, could you elaborate more on the contents of those pages? Maybe provide a brief synopsis of the model described and how TIC fits in? – David Marx Feb 20 '18 at 08:05
-
@DavidMarx sure, I can provide the example here, it will take me a little bit of time. – jjepsuomi Feb 20 '18 at 08:08