5

I want to generate a standard normal distributed time-series. In addition the ACF of my timeseries should match a desired ACF. I have given lags 1 to 30 with the corresponding ACF-values. For further processing, the CDF of the timeseries should be the standard normal CDF.

When i try filtering a standard normal distributed timeseries with my ACF, the result does not have the desired attributes. Neither is the ACF like the desired ACF, nor is the CDF unchanged.

I tried the approach from jblood94 from this and this thread. But this approach only works for very specific ACF-values.

When i insert my desired ACF, this approach does not work.

Here is my desired ACF:

enter image description here

The model for my ACF:

lags   =  0:30
a      =  0.9999
b      = -0.07197 
acf    = a*exp(b.*lags)

I assume that the problem is the high correlation at the beginning.

The timeseries needs to be standard normal distributed, for the next steps in my program.

The following Matlab-Code includes the approach. But as you can see, it will only work, if we set a minimum of 45 lags. Why is this so? In addition, i have to adjust the initial STD of my samples. How can i calculate the initial STD, to be after the filtering 1?

%% init
clear;
close all;
clc;

set(groot, 'DefaultAxesLineWidth', 2); set(groot, 'DefaultLineLineWidth', 2); set(groot, 'DefaultAxesFontSize', 14);

%% define desired auto-correlation function lagsDesired = 0:45; % if lag is <45 the approach fails a = 0.999; b = -0.07197; acfDesired = a * exp(b .* lagsDesired);

%% generate normal distributed samples numSamples = 1e6; % sigma increases during filtering (How can i calculate the factor?) sigmaFactor = 0.35; x = sigmaFactor * randn(numSamples, 1);

%% predefine indices m = length(acfDesired); i1 = sequence((m-1):-1:1); i2 = sequence((m-1):-1:1, 'from', 2:m); i3 = cumsum((m-1):-1:1);

%% initial values for adjustment w = acfDesired; temp = cumsum(w(i1) .* w(i2)); a = [1, diff([0 temp(i3)])/sum(w.^2)];

%% iterative adjustment of the filter weights while (max(abs(acfDesired - a)) >= 2e-3) %sqrt(eps)) max(abs(acfDesired - a)) temp = cumsum(w(i1) .* w(i2)); a = [1, diff([0 temp(i3)])/sum(w.^2)]; w = w .* (acfDesired./a); end

%% filter normal samples with weights y = filter(w, 1, x);

%% transform normal to uniform samples via standard normal cdf uniRandCorr = normcdf(y);

%% plot figure; [Fy, xy] = ecdf(y); [Fn, xn] = ecdf(randn(numSamples, 1)); plot(xy, Fy); hold on; plot(xn, Fn);

l = legend({'ECDF of samples' , 'Desired ECDF'}, 'Location', 'best'); title(l, 'Legend'); title('CDF'); xlabel('Value'); ylabel('CDF'); grid on;

figure; [acf, lags] = autocorr(uniRandCorr, 'NumLags', 60); plot(lags, acf); hold on; plot(lagsDesired, acfDesired);

l = legend({'ACF of samples' , 'Desired ACF'}, 'Location', 'best'); title(l, 'Legend'); title('ACF'); xlabel('Lag'); ylabel('ACF'); grid on;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [seq] = sequence(nvec, varargin) % [seq] = sequence(nvec, varargin) % % DESCRIPTION: % R-Functions translated to MATLAB % Creates a vector of sequences % R: A Language and Environment for Statistical Computing Reference Index % Version 4.2.3 Page 505 ff % % EXAMPLE: % - % % INPUT: % nvec : Input vector % from : Starting value(s) % by : Increment of the sequence(s) % % OUTPUT: % seq : Generated sequence % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Parse inputs iParser = inputParser;

addRequired(iParser, 'nvec', ... @(x) isnumeric(x) && all(x>=0)); addOptional(iParser, 'from', ones(length(nvec), 1), ... @(x) isnumeric(x) && (isscalar(x) || length(x)==length(nvec))); addOptional(iParser, 'by', ones(length(nvec), 1), ... @(x) isnumeric(x) && (isscalar(x) || length(x)==length(nvec)));

parse(iParser, nvec, varargin{:});

if isscalar(iParser.Results.from) from = ones(length(nvec))*iParser.Results.from; else from = iParser.Results.from; end

if isscalar(iParser.Results.by) by = ones(length(nvec))*iParser.Results.by; else by = iParser.Results.by; end

seq = [];

for i=1:length(nvec) seq = [seq, linspace(from(i), from(i) + (by(i)*(nvec(i)-1)), nvec(i))]; end

end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This gives the following output:

ACF CDF

As you can see, the result is correct. BUT, it works only with min. 45 lags AND if i change the model of the acf I have to adapt this parameter. In addition i have no idea, how i can calculate the sigma of the gauss samples, so that the sigma AFTER the filtering is 1 (standard). I know, that the standarddeviation changes during filtering, but I have not found a formula for this yet.

Christian
  • 61
  • 8
  • The duplicate is about uniformly distributed data, but the answer is also applicable to normally distributed data. See also threads carrying both the "autocorrelation" and the "random-generation" tags. – Stephan Kolassa Mar 22 '23 at 17:13
  • 1
    @StephanKolassa However, Normal time series admit special methods not available in other cases, because linear combinations of jointly Normal variables are still Normal. In that sense this question differs from the other. – whuber Mar 22 '23 at 19:45
  • Christian, I can't figure out what you're trying to do, because although you indicate a "desired ACF," you consistently show two ACFs, and although you ask about "generating" data, you also refer to data. Could you start out by explaining exactly what your problem is? – whuber Mar 23 '23 at 13:50
  • @Christian, are you simply trying to generate a normally-distributed time series with an ACF of approximately a*exp(b.*lags) for lags <= 30 and approximately 0 for lags > 30? – jblood94 Mar 23 '23 at 14:04
  • @jblood94. Yes, I tried this. But this gives the same result as filtering with only lags <= 30. – Christian Mar 23 '23 at 15:34
  • @whuber: I have a real dataset from a sensor. For a simulation i need to generate the output from this sensor. I modeled the distribution of this sensor (nearly half-gaussian with heavy tails) and the autocorrelation (fist plot shows acf of real data and the modeled acf). In my simulation I want to reproduce approx. the same sensor output (see plot 2 and 3). My process need to be fast, so my current approach is: generate n standard normal distributed samples, add the acf to this timeseries, transform this to uniform samples and transform this finally to my desired distribution. ... – Christian Mar 23 '23 at 15:41
  • @whuber As you can see in the plots 2 and 3. I can generate timeseries (with the approach from jblood94) which is gaussian (I can transform this easy to uniform and then to my desired dist.) and which is autocorrelated. But only with the specific settings (num of lags, reduce sigma manually). If I change this settings, the approach fails. If I have now another sensor with different output behavior which I want to simulate, i have to manually adapt this parameters, which I definitely want to avoid! So my question is, how to estimate the initial sigma and how to automatically adapt the lags. – Christian Mar 23 '23 at 15:46

2 Answers2

3

Theoretical Backgorund

For the sake of simplicity let's assume $\sigma^2=\gamma(0)=1$, where $\gamma(h)$ represents the autocovariance at lag $h$. Let your time series be $\{X_i\}_{i=1}^n$. Consider the covariance matrix of your data

$$\begin{align} \Sigma_X=\begin{bmatrix}1 & \gamma(1) & \gamma(2) & \cdots & \gamma(n-1)\\\gamma(1) & 1 & \gamma(1) & \cdots & \gamma(n-2)\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & 1\end{bmatrix}\end{align}.$$

For a stationary process (I am assuming you would like your time series to be stationary) this covariance matrix will be positive semi-definite (see https://stat.ethz.ch/Teaching/buhlmann/Zeitreihen/acf.pdf, and Show that the autocovariance function of stationary process {${X_t}$} is positive definite). Now since $\Sigma_X$ is positive semi-definite and symmetric, it will have a symmetric square root such that $$\Sigma_X=\Sigma_X^{1/2}\Sigma_X^{1/2}.$$

Finally we can use the fact that for a vector of random variables $Z$, $\text{Cov}(AZ)=A\Sigma_ZA^T$. So here if $Z$ is a vector of normal random varaibles with $\Sigma_Z=I$ then $$\text{Cov}(\Sigma_X^{1/2}Z)=\Sigma_X.$$

Application in R

So in your case what you would do is

  1. Using the ACF you have calculated, create a $\Sigma_X$ matrix in R.
  2. Calculate $\Sigma_X^{1/2}$ (you could use the sqrtm() function in R).
  3. Generate $Z$ using rnorm() in R, and then multiply $Z$ by $\Sigma_X^{1/2}$ to get your time series with the desired covariance, and hence autocovariance, structure.
David Veitch
  • 1,685
  • We can always generate a $Z$ for a given covariance matrix, but this becomes problematic when the length of $Z$ is large. – Sextus Empiricus Mar 28 '23 at 17:51
  • @SextusEmpiricus yes you raise a good point. The computationally intensive step would be calculating the square root. Although with that said in R I am able to do SVD on a 1000x1000 matrix in only a few seconds, so I think for many problems this solution is workable. I also believe that since $\Sigma_X$ is a Toeplitz matrix there are ways to speed up the SVD step. – David Veitch Mar 28 '23 at 18:09
1
acf    = a*exp(b.*lags)

You can get this type of autocovariance function with an AR1 process $$X_{t} = \phi X_{t-1} + w_{t} $$

where the $w_t$ are independent and identical distributed as $N(0,\sigma^2)$

this gives an acf of the form:

$$\text{ACF}(\text{lag}) = \frac{\sigma^2}{1-\phi^2} \phi^{\text{lag}} $$

(see the question Finding the ACF of AR(1) process)

You can find $\phi$ and $\sigma$ from your $a$ and $b$ if you compare with your function and set $\text{lag} = 0$ and $\text{lag} = 1$

$$\frac{\sigma^2}{1-\phi^2} = a$$ $$\frac{\sigma^2}{1-\phi^2} \phi = a \exp(-b)$$

from which it follows that $\phi = -\exp(b)$ and $\sigma^2 = a[1-\exp(-2b)]$.

  • What stands for in this example? Could you give a code example? I manually set the acf of lag 0 to exactly 1. My problem is, that the acf model is not the same for all my sensor I want to simulate. Sometimes the acf model is described by acf = a*exp(b.*lags) + c*exp(d.*lags). Is it also possible to achieve this acf with an AR 1 process? – Christian Mar 25 '23 at 22:28
  • Or in general can i generate with an ar process timeseries with every kind of acf? (e.g. acf = 1./lags ) – Christian Mar 25 '23 at 22:58
  • @Christian $\phi$ is the parameter in the AR1 model. – Sextus Empiricus Mar 25 '23 at 23:08
  • @Christian if you have two independent pairs $X_0,X_t$ and $Y_0,Y_t$ then $$cov(X_0+Y_0,X_t+Y_t) = cov(X_0,X_t) + cov(Y_0,Y_t) + cov(X_0,Y_t) + cov(Y_0,X_t) = cov(X_0,X_t) + cov(Y_0,Y_t) $$ So you can make those distributions with more complicated acf functions by adding together simpler distributions. – Sextus Empiricus Mar 25 '23 at 23:13
  • like so: `clear close all clc

    Mdl = arima(1,0,0); Mdl.Constant = 0; Mdl.Variance = 1-(0.8^2); Mdl.AR = {0.8};

    Y = simulate(Mdl,1e3);

    Mdl = arima(1,0,0); Mdl.Constant = 0; Mdl.Variance = 1-(0.9^2); Mdl.AR = {0.9};

    X = simulate(Mdl,1e3);

    timeseries = X+Y;`

    – Christian Mar 26 '23 at 00:25
  • How do I calculate the if I have given for example desired acf values for lags 1-30 (which don't follow any model)? is this even possible? – Christian Mar 26 '23 at 00:28
  • @Christian I have updated the question. You can find general models by solving the Yule Walker equations. Potentially some model coefficients will be very small if the order of the autoregressive model that captures the autocovariance structure is very small. In the case where your acf is a sum of exponential terms you can expect the order to be equal to the number of terms (see Sum of autoregressive processes?). – Sextus Empiricus Mar 26 '23 at 09:34
  • so for a model with 2 e-functions the equation is simply as follows?$$ \frac{\sigma^2}{1-\phi^2} = 1; \frac{\sigma^2}{1-\phi^2} \phi = a \exp(1\cdot b) + c \exp(1\cdot d); \frac{\sigma^2}{1-\phi^2} \phi^2 = a \exp(2\cdot b) + c \exp(2\cdot d) $$ – Christian Mar 26 '23 at 10:43
  • @Christian with a sum of two exponential functions you need a model with two parameters. $X_{t} = \phi_1 X_{t-1} + \phi_2 X_{t-2} + w_{t} \qquad w_t \sim N(0,\sigma^2) $ – Sextus Empiricus Mar 26 '23 at 11:06
  • ok, then: $$\phi_0 = 1; \phi_1 = a \exp(1\cdot b) + c \exp(1\cdot d); \phi_2 = a \exp(2\cdot b) + c \exp(2\cdot d)$$ Is this correct? – Christian Mar 26 '23 at 11:48
  • The noise should be white noise, not just $N(0,1)$. – Richard Hardy Mar 26 '23 at 16:41
  • Is there a formula to calculate the coeffitients directly from the ACF values for up to lag p. For the p=2 i found an example. here – Christian Mar 26 '23 at 18:27