I am wondering whether PCA and tICA strictly require: 1) 2) random data as input, i.e. the values sampled for each feature must have "no memory" of the other values of that feature.
Here is what prompted the question. Suppose I have a non-stationary process: say, a price that progressively decreases before reaching a plateau, influenced by many underlying causes, which I'll call "features". I want to analyze only the initial time span in which it evolves towards equilibrium, say the first 100 days out of the 1000 days of the whole observation. Because that span is so short, I am forced to sample the data at closely spaced times in order to get a dataset of reasonable size: for example once a day, or even every 12 hours. At that sampling rate, the decisions of people that influence the price are certainly highly "internally correlated" (i.e., for a given feature, each value is influenced by the previous ones).
Does either of the two, PCA or tICA, make sense in this case? Just to start the discussion: in the case of PCA, I would expect to still find out which combinations of features have the maximum variance within that time span, and I would not care that some features have memory of their previous values. I mean:
- Yes, the process is non-stationary;
- Some features may have a longer lifetime, so the influence of previous states gives them (for instance) a lower variation, and they are therefore not random...

...but the first PCA components would still be those that, given all these influences, have the highest variance. So I would still get the information I'm interested in. Am I wrong? And what about tICA? Thanks in advance for your time.
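To make the question concrete, here is a minimal sketch of what I have in mind, assuming synthetic data rather than real prices. Everything in it is hypothetical: the function names, the two features (one slowly relaxing towards a plateau, one memoryless and noisy), the relaxation time of 30 steps and the lag of 1 step. PCA is computed as the eigendecomposition of the covariance matrix, and tICA as the eigendecomposition of the symmetrized time-lagged covariance after whitening, which is one common formulation and not necessarily what a given library does internally.

```python
import numpy as np


def simulate_features(n_steps=100, seed=0):
    """Hypothetical toy data: column 0 relaxes slowly (has memory),
    column 1 is memoryless noise with a larger amplitude."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    slow = np.exp(-t / 30.0) + 0.02 * rng.standard_normal(n_steps)
    fast = 0.5 * rng.standard_normal(n_steps)
    return np.column_stack([slow, fast])


def pca(X):
    """Eigenvalues and eigenvectors of the covariance, largest variance first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


def tica(X, lag=1):
    """Directions of maximal autocorrelation at the given lag.

    Solves C(lag) v = lambda C(0) v by whitening with C(0) and
    diagonalizing the symmetrized lagged covariance."""
    Xc = X - X.mean(axis=0)
    x0, xt = Xc[:-lag], Xc[lag:]
    c0 = Xc.T @ Xc / (len(Xc) - 1)
    ct = x0.T @ xt / (len(x0) - 1)
    ct = 0.5 * (ct + ct.T)              # enforce symmetry
    s, u = np.linalg.eigh(c0)
    w = u / np.sqrt(s)                  # whitening matrix: w.T @ c0 @ w = I
    evals, v = np.linalg.eigh(w.T @ ct @ w)
    order = np.argsort(evals)[::-1]
    return evals[order], (w @ v)[:, order]


X = simulate_features()
pca_vals, pca_vecs = pca(X)
tica_vals, tica_vecs = tica(X, lag=1)

print("PCA variances:        ", pca_vals)
print("first PCA direction:  ", pca_vecs[:, 0])
print("tICA autocorrelations:", tica_vals)
print("first tICA direction: ", tica_vecs[:, 0])
```

In this toy setup the two methods are built to answer different questions: PCA ranks directions by variance and ignores the time ordering of the samples, while tICA ranks them by how strongly they are correlated with themselves one lag later. Whether either result is meaningful for a short, non-stationary window is exactly what I am asking.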
Best Regards, Jacopo