I have $n$ observations for $m$ variables that are sorted by time, e.g. observation $1$ is the oldest, whereas observation $n$ is the newest. I represent this data as an $n\times m$-dimensional matrix $D$. I would like to compute the mean and the covariances between these variables, but with the freedom to give different importance to observations made at different times. The two extreme cases are:
1. I consider the lower sub-matrix $D_t$ of $D$, which is $t\times m$, where $t<n$ is the number of latest observations I would like to take into account when computing the mean and covariances.
2. I use the whole matrix $D$ for the computation, taking into account all the data I have and giving it the same importance regardless of observation time.
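For concreteness, here is a minimal numpy sketch of these two extremes (the data values are made up for illustration):

```python
import numpy as np

# Made-up data: n = 5 observations of m = 2 variables,
# row 0 oldest, row 4 newest.
D = np.array([[ 1.0, 0.5],
              [-1.0, 2.5],
              [ 2.0, 7.0],
              [ 0.5, 3.0],
              [ 1.5, 4.0]])
t = 3  # number of latest observations for case 1

# Case 1: only the t latest observations.
D_t = D[-t:]
mean_1, cov_1 = D_t.mean(axis=0), np.cov(D_t, rowvar=False)

# Case 2: all observations, equal importance.
mean_2, cov_2 = D.mean(axis=0), np.cov(D, rowvar=False)
```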
Other cases lie in between, e.g. I can multiply each column of $D$ element-wise by some weight vector $w$ (i.e., form $\operatorname{diag}(w)\,D$), say to keep the later observations at full weight and shrink the older ones. As an example, $$ D = \begin{pmatrix} 1 & 0.5 \\ -1 & 2.5 \\ 2 & 7 \end{pmatrix}, \quad w = \begin{pmatrix} 0.1 \\ 0.5 \\ 1 \end{pmatrix}, \quad \operatorname{diag}(w)\,D = \begin{pmatrix} 0.1 & 0.05 \\ -0.5 & 1.25 \\ 2 & 7 \end{pmatrix} $$
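Concretely, the scaling and the estimates I then compute look like this (again a numpy sketch):

```python
import numpy as np

D = np.array([[ 1.0, 0.5],
              [-1.0, 2.5],
              [ 2.0, 7.0]])
w = np.array([0.1, 0.5, 1.0])

# Multiply each column of D element-wise by w, i.e. form diag(w) D.
wD = w[:, None] * D           # equivalently: np.diag(w) @ D

# Naive estimates computed on the scaled data.
mean_w = wD.mean(axis=0)
cov_w = np.cov(wD, rowvar=False)
```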
At the same time, I am not sure whether this is the best way to assign different importance to different observation times. For example, if all the outputs were binary (true/false), then scaling would not even seem to make sense, although it looks more natural when the outputs are numerical.
More importantly, I would be happy to incorporate case 1. as a special case. Unfortunately, when I use a weighting vector $w$ that has $n-t$ zeros followed by $t$ ones (which looks like a natural choice to me), the result is much different from case 1 with the matrix $D_t$. Of course, the means are the same up to a scaling factor of $t/n$, but the entries of the covariance matrix do not seem to differ just by a scaling factor. Any hints on how to approach this problem are appreciated.
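To make the discrepancy concrete, here is a small numpy sketch; it also contrasts scaling the data with passing the weights to the estimator itself (numpy's `np.cov` accepts observation weights via its `aweights` argument), which does seem to recover case 1., so perhaps the weights belong in the estimator rather than in the data:

```python
import numpy as np

D = np.array([[ 1.0, 0.5],
              [-1.0, 2.5],
              [ 2.0, 7.0]])
t = 2
w01 = np.array([0.0, 1.0, 1.0])   # n - t zeros followed by t ones
D_t = D[-t:]

# Case 1 reference: covariance of the t latest observations only.
cov_case1 = np.cov(D_t, rowvar=False)

# Scaling the data and then applying the unweighted estimator: the
# zeroed-out rows still enter the mean and the n - 1 denominator,
# so this does NOT match case 1.
cov_scaled = np.cov(w01[:, None] * D, rowvar=False)

# Putting the weights into the estimator itself (weighted mean and
# weighted centering) does match case 1 exactly for 0/1 weights.
cov_weighted = np.cov(D, rowvar=False, aweights=w01)

print(np.allclose(cov_weighted, cov_case1))  # True
print(np.allclose(cov_scaled, cov_case1))    # False
```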