I have $n$ observations for $m$ variables that are sorted by time, e.g. observation $1$ is the oldest, whereas observation $n$ is the newest. I represent this data as an $n\times m$-dimensional matrix $D$. I would like to compute the mean and the covariances between these variables, but with the freedom to give different importance to observations made at different times. The two extreme cases are:
1. I consider the lower sub-matrix $D_t$ of $D$, which is $t\times m$, where $t<n$ is the number of latest observations I would like to take into account when computing the mean and covariances.
2. I use the whole matrix $D$ for the computation, taking into account all the data I have and giving it the same importance regardless of observation time.
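For concreteness, here is a minimal numpy sketch of these two extremes (the data values are made up for illustration):

```python
import numpy as np

# Made-up data: n = 5 observations of m = 2 variables,
# row 0 oldest, row 4 newest.
D = np.array([[ 1.0, 0.5],
              [-1.0, 2.5],
              [ 2.0, 7.0],
              [ 0.5, 3.0],
              [ 1.5, 4.0]])
t = 3  # number of latest observations for case 1

# Case 1: only the t latest observations.
D_t = D[-t:]
mean_1, cov_1 = D_t.mean(axis=0), np.cov(D_t, rowvar=False)

# Case 2: all observations, equal importance.
mean_2, cov_2 = D.mean(axis=0), np.cov(D, rowvar=False)
```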
Other cases lie in between, e.g. I can multiply each column of $D$ element-wise by some weight vector $w$ (i.e., form $\operatorname{diag}(w)\,D$), say to keep the later observations at full weight and shrink the older ones. As an example, $$ D = \begin{pmatrix} 1 & 0.5 \\ -1 & 2.5 \\ 2 & 7 \end{pmatrix}, \quad w = \begin{pmatrix} 0.1 \\ 0.5 \\ 1 \end{pmatrix}, \quad \operatorname{diag}(w)\,D = \begin{pmatrix} 0.1 & 0.05 \\ -0.5 & 1.25 \\ 2 & 7 \end{pmatrix} $$
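Concretely, the scaling and the estimates I then compute look like this (again a numpy sketch):

```python
import numpy as np

D = np.array([[ 1.0, 0.5],
              [-1.0, 2.5],
              [ 2.0, 7.0]])
w = np.array([0.1, 0.5, 1.0])

# Multiply each column of D element-wise by w, i.e. form diag(w) D.
wD = w[:, None] * D           # equivalently: np.diag(w) @ D

# Naive estimates computed on the scaled data.
mean_w = wD.mean(axis=0)
cov_w = np.cov(wD, rowvar=False)
```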
At the same time, I am not sure whether this is the best way to assign different importance to different observation times. For example, if all the outputs were binary (true/false), then scaling would not even seem to make sense, although it looks more natural when the outputs are numerical.
More importantly, I would be happy to incorporate case 1. as a special case. Unfortunately, when I use a weighting vector $w$ that has $n-t$ zeros followed by $t$ ones (which looks like a natural choice to me), the result is much different from case 1 with the matrix $D_t$. Of course, the means are the same up to a scaling factor of $t/n$, but the entries of the covariance matrix do not seem to differ just by a scaling factor. Any hints on how to approach this problem are appreciated.
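To make the discrepancy concrete, here is a small numpy sketch; it also contrasts scaling the data with passing the weights to the estimator itself (numpy's `np.cov` accepts observation weights via its `aweights` argument), which does seem to recover case 1., so perhaps the weights belong in the estimator rather than in the data:

```python
import numpy as np

D = np.array([[ 1.0, 0.5],
              [-1.0, 2.5],
              [ 2.0, 7.0]])
t = 2
w01 = np.array([0.0, 1.0, 1.0])   # n - t zeros followed by t ones
D_t = D[-t:]

# Case 1 reference: covariance of the t latest observations only.
cov_case1 = np.cov(D_t, rowvar=False)

# Scaling the data and then applying the unweighted estimator: the
# zeroed-out rows still enter the mean and the n - 1 denominator,
# so this does NOT match case 1.
cov_scaled = np.cov(w01[:, None] * D, rowvar=False)

# Putting the weights into the estimator itself (weighted mean and
# weighted centering) does match case 1 exactly for 0/1 weights.
cov_weighted = np.cov(D, rowvar=False, aweights=w01)

print(np.allclose(cov_weighted, cov_case1))  # True
print(np.allclose(cov_scaled, cov_case1))    # False
```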