1

Let us assume I have a time series made of the following observations:

ts = c(163,18,53,189, 243, 101, 150, 39, 60,96,36,76,71,67,56,3,72,96,15,19)

How can I determine if it respects the Markov chain properties and how can I get the markovian order? Is there a source where I can find a guide?

Any good python or R solutions? The data I am dealing with is made of about 500K observations.

Thanks

  • 1
    Not sure what the formal way is to do this, but you could try to predict $X_t$ given (1) $X_{t-1}$ and (2) the entire vector. If the quality of your predictor for (1) is the same as (2) then you have reason to believe your time series respects the Markov property. – philbo_baggins Oct 10 '20 at 02:47
  • That's indeed a nice idea, but doing so for each of the 500k observations might be computationally challenging. Valerio, are your observations continuous or discrete ? Is your vector ts an example sampled from your 500k observations, or is this the set of possible values that your 500k observations can take ? – Camille Gontier Oct 10 '20 at 14:37
  • @philbo_baggins do you mean by using the empirical transition matrix observed from the studied time series? Do you suggest to simulate the series by using a Montecarlo Approach where the simulated are generated by the empirical transition matrix? – Valerio Ficcadenti Oct 13 '20 at 16:36
  • @CamilleGontier the observations are discrete, they come from a set of status made of natural numbers in [1-247]. I have changed the sample posted because it was not accurate. It could be a subset but I have invented it. Anyway, out of 500K observations, it is plausible to have that sampled vector. :( – Valerio Ficcadenti Oct 13 '20 at 16:38

2 Answers2

1

I think this paper should answer your question : https://www.econstor.eu/handle/10419/2673

The starting point of the procedure is indeed to estimate the transition matrix from your observations, for instance by using a Maximum Likelihood approach. Then, you have to check for these two markovian properties:

  1. You have to verify that your process is stationary. This can be done by splitting your observations into sub-chunks, estimating the transition matrices for each of these subsets, and testing whether the estimates significantly differ.
  2. You have to verify that your process is memory-less, i.e. that your estimates of the transition matrix do not vary if the values of previous states are taken into account.

These statistical tests are described in the paper (and are based on a chi-squared test).

Hope this helps !

Camille Gontier
  • 2,616
  • 6
  • 13
0

The answer to this question can be found in the following paper: Markov Chain Monte Carlo for generating ranked textual data

Assume one starts with an empirical series made by N consecutive observations of integers numbers taken in a finite set called RANK. They represent a discrete time of the stochastic process under analysis. Let us define the discrete-time stochastic process as $X=(X(t):t \in \mathbb{N})$ taking values in RANK and with $P$ the related probability law. $X$ is a Markov Chain of order one if one gets: $$ P(X(t+1)=i_{t+1}\vert X(t)=i_t)=P(X(t+1)=i_{t+1}\vert X(t)=i_t,\dots,X(1)=i_1,X(0)=i_0) $$ for each $t \in \mathbb{N}$ and $i_0,i_1,\dots,i_t,i_{t+1}\in\mathbf{RANK}$. One can write a less heavy relationship and say that $X$ is of order one when the following simplification of the above relationship is true: $$ P(X(t+1)=i_{t+1}\vert X(t)=i_t)=P(X(t+1)=i_{t+1}\vert X(t)=i_t,X(t-1)=i_{t-1}) $$ for each $t \in \mathbb{N}$ and $i_0,i_1,\dots,i_t,i_{t+1}\in\mathbf{RANK}$. So, the transition probabilities of orders one and two appearing in this last equation have been empirically estimated by looking at the consecutive empirical observations. Two steps has to be carried out. In the First step, one checks the validity of ​the last equation​; namely one tests that the empirical transition probabilities estimated by the original sample are associated with a Markov chain of order one. In the Second step, one shows that the obtained Markov chain of order one is a statistically sounding representation of the original sample, hence leading to the final statement of the Markovianity of the data. Steps

  1. Run 1000 simulations for by using the order one and two transition matrices. This step originates 2000 samples from Markov chains. The lengths of the series calculated from the first- and second-order transition matrices are different, they may depend on the length of the original observed sample, the object of the study. Then, the first- and second-order simulated series have been pairwise compared via the Kolmogorov–Smirnov (KS) test. Namely, the empirical distributions of each simulated series from the first-order transition matrix have to be compared with those of the simulated series coming from the second-order transition matrix. If the series can be considered as coming from the same distribution, one can conclude that there is no statistical difference between the first-order transition probabilities and the second-order ones. This outcome supports the validity of ​the above reported equation,​ so that the Markov chain obtained by using the empirical transition probabilities estimated from the original sample is of order one. Furthermore, one can also use a Wilcoxon-Mann–Whitney test to compare the simulated series based on transition probability of order one and two, between each other. In this way, one can check of the absence of compelling evidence that the simulated series data differ. So, one can conclude that the simulated series are obtained by the same data generating process. This finding strongly supports that the observed data are realisations of a first-order Markov chain.
  2. One has to check that the Markov chain of order one, if the above is all true, with states space in RANK and whose transition probabilities are estimated from the available observations, can be seen as the data generating process of the original sample. In so doing, one provides additional evidence that the evolutive process related to the formation of the target distribution has a Markovian structure. One compare, for example, 1000 of the simulated series to the original one, namely to the one under study. That can be done with a Chi-Square test or again, with the KS test. In addition, the main statistical indicators from the simulated series can be calculated (mean, st. dev., Kurt. and Skew.). If their empirical distributions are centred on the same parameters from the original series, this is a further proof of markovianity of order one, since the simulated series are generated from transition probabilities of order one, estimated from the observed / studied series.