How can I compute autocorrelation values of the end-to-end vector?

Question

I obtained a list of end-to-end vectors $\vec{r}_{\text{end-to-end}}$ from a Monte Carlo simulation of polymer motion.

| r_end-X | r_end-Y | r_end-Z |
|---:|---:|---:|
| -177.236 | 100.309 | -130.930 |
| -184.354 | 88.047 | -117.760 |
| -172.577 | 87.168 | -117.745 |
| -197.651 | 103.270 | -124.953 |
| -190.053 | 104.223 | -128.100 |
| -187.985 | 102.387 | -127.593 |
| -190.839 | 91.210 | -118.643 |
| -193.851 | 98.069 | -113.333 |
| -177.084 | 83.960 | -116.667 |
| -178.312 | 92.759 | -128.782 |
| ... | ... | ... |

I want to compute the autocorrelation using the dot product `vector[i] dot vector[i+tau]` with normalization, then fit an exponential decay curve, and finally obtain the best-fit values (fitx, fity).

What formula should I use?

Here's how I adapted the autocorrelation function for the vector dataset:

Calculate the mean vector $\boldsymbol{\mu}$ by averaging each component of the vectors separately:

$$ \boldsymbol{\mu} = \left( \frac{1}{n}\sum_{i=0}^{n-1} x_i, \frac{1}{n}\sum_{i=0}^{n-1} y_i, \frac{1}{n}\sum_{i=0}^{n-1} z_i \right) $$
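In NumPy, the component-wise mean is a column mean over an `(n, 3)` array. The sketch below assumes the data are stored as whitespace-separated columns like the listing above. The file name `r_end.dat` is hypothetical. `np.loadtxt` skips lines starting with `#` by default.

```python
import numpy as np

def mean_vector(r):
    """Component-wise mean mu of an (n, 3) array of end-to-end vectors."""
    r = np.asarray(r, dtype=float)
    # axis=0 averages each column (x, y, z) separately, as in the formula for mu
    return r.mean(axis=0)

# Reading the full data set might look like this (hypothetical file name):
# r = np.loadtxt("r_end.dat")   # the '#' header line is skipped by default

# First three rows of the listing above, for illustration only
r = np.array([
    [-177.236, 100.309, -130.930],
    [-184.354,  88.047, -117.760],
    [-172.577,  87.168, -117.745],
])
mu = mean_vector(r)
print(mu)
```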

Calculate the autocorrelation function for the vector dataset using the dot product to get a scalar autocorrelation value for each lag $t$:

$$ R(t) = \frac{1}{(n - t) \cdot \sigma^2} \sum_{i=0}^{n-t-1} \left( \mathbf{X}_i - \boldsymbol{\mu} \right) \cdot \left( \mathbf{X}_{i+t} - \boldsymbol{\mu} \right) $$

where $\mathbf{X}_i$ is the vector at index $i$, and $\sigma^2$ is the variance of the magnitude squared of the end-to-end distance vectors. The variance in this case can be calculated as:

$$ \sigma^2 = \frac{1}{n} \sum_{i=0}^{n-1} \left( \| \mathbf{X}_i - \boldsymbol{\mu} \|^2 \right) - \left( \| \boldsymbol{\mu} \|^2 \right) $$

Here, $\| \mathbf{X}_i - \boldsymbol{\mu} \|^2$ is the squared magnitude of the vector difference $\mathbf{X}_i - \boldsymbol{\mu}$.

In the autocorrelation function, the dot product in the summands will give us a scalar value, as the dot product of two vectors is a scalar. This is appropriate since I am interested in the correlation of the scalar magnitudes of the end-to-end vectors.
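The sum in $R(t)$ can be sketched in NumPy as follows. This is a minimal sketch, and the function names are illustrative. It computes only the lag-averaged dot product

$$C(t) = \frac{1}{n-t}\sum_{i=0}^{n-t-1}(\mathbf{X}_i-\boldsymbol{\mu})\cdot(\mathbf{X}_{i+t}-\boldsymbol{\mu}).$$

The normalizing constant $\sigma^2$ is passed in as a parameter, because its definition is exactly what is in question here.

```python
import numpy as np

def lagged_dot_covariance(r, max_lag):
    """C(t) = 1/(n-t) * sum_{i=0}^{n-t-1} (X_i - mu) . (X_{i+t} - mu), for t = 0..max_lag."""
    x = np.asarray(r, dtype=float)
    n = x.shape[0]
    if not 0 <= max_lag < n:
        raise ValueError("need 0 <= max_lag < n")
    d = x - x.mean(axis=0)            # subtract the mean vector mu from every row
    c = np.empty(max_lag + 1)
    for t in range(max_lag + 1):
        # row-wise dot products of the n - t pairs (X_i - mu, X_{i+t} - mu), then their mean
        c[t] = np.einsum("ij,ij->i", d[: n - t], d[t:]).mean()
    return c

def autocorrelation(r, max_lag, sigma2):
    """R(t) = C(t) / sigma2, with sigma2 supplied by the caller."""
    return lagged_dot_covariance(r, max_lag) / sigma2

# Toy series whose direction flips at every step
r = np.array([[1.0, 0, 0], [-1.0, 0, 0], [1.0, 0, 0], [-1.0, 0, 0]])
print(lagged_dot_covariance(r, 3))
```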

Now, the correction to the formula is in how $\sigma^2$ is interpreted. In the scalar case, this is simply the variance of the dataset, but for vectors, you're dealing with magnitudes, so you need to find the average of the squared distances from the mean vector, as shown above.

However, I am not sure about this formula at all.

lightxbulb · Answer 1 · 2024-02-11T11:27:00.413

I guess it depends on how you define your autocorrelation function. I have no issue with how you estimate the mean: $$E[X_j] \approx \mu = \frac{1}{n}\sum_{i=1}^n X_i \in\mathbb{R}^3,$$ since this is supposedly a stationary process.

The most general form of the variance of a $d$-dimensional random vector $Y$ is the variance matrix $var(Y)= E[(Y-E[Y])(Y-E[Y])^T]$. $var(Y)$ is a symmetric positive (semi-)definite matrix, so it can be decomposed as $V\Lambda V^T$, with $V$ orthogonal and $\Lambda$ diagonal with non-negative entries. The variance along each eigenvector (column of $V$) is then the corresponding eigenvalue in $\Lambda$. Sometimes people call the determinant or the trace of this matrix the variance, in which case you get a scalar. Notably, the trace variance is equivalent to $$E[\|Y-E[Y]\|^2] = E[(Y-E[Y])^T(Y-E[Y])] = Tr(E[(Y-E[Y])(Y-E[Y])^T]) .$$ If you have two different random vectors $Y$ and $Z$, you can build the covariance matrix $cov(Y,Z) = E[(Y-E[Y])(Z-E[Z])^T]$. Scalar variants can again be derived from its determinant or trace.
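Here is a quick numerical check of the trace identity. It uses the sample variance matrix (normalized by $1/N$) of synthetic 3-D data. The data, the per-axis scales, and the seed are arbitrary and only for illustration.

```python
import numpy as np

def variance_matrix(y):
    """Sample estimate of var(Y) = E[(Y - E[Y])(Y - E[Y])^T] from an (N, d) array, 1/N normalisation."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean(axis=0)
    return dev.T @ dev / y.shape[0]

rng = np.random.default_rng(0)
# Synthetic 3-D samples with a different spread along each axis (illustration only)
y = rng.normal(size=(1000, 3)) * np.array([1.0, 2.0, 3.0])

V = variance_matrix(y)
eigvals, eigvecs = np.linalg.eigh(V)      # V = eigvecs @ diag(eigvals) @ eigvecs.T
trace_var = np.trace(V)                   # scalar "trace variance"
det_var = np.linalg.det(V)                # scalar "determinant variance"
# Sample estimate of E||Y - E[Y]||^2
mean_sq_norm = np.mean(np.sum((y - y.mean(axis=0)) ** 2, axis=1))

print(np.isclose(trace_var, mean_sq_norm))  # True: trace of var(Y) equals the mean squared deviation
```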

The Pearson correlation coefficient is defined as $\rho(Y,Z) = \frac{cov(Y,Z)}{\sqrt{var(Y)}\sqrt{var(Z)}}$. For vectors $Y$ and $Z$ this is ambiguous, however, since the expressions above are matrices. I would argue that $$\rho(Y,Z) = (var(Y))^{-1/2} cov(Y,Z) (var(Z))^{-1/2}$$ is a sensible generalization, where $(\cdot)^{-1/2}$ denotes the matrix inverse square root, because the left and right spaces are in some sense matched. This turns out to be irrelevant for your specific definition, but I'll treat the general case nonetheless.
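Below is a sketch of this matrix-valued correlation. It assumes both variance matrices are positive definite, so that the inverse square root exists. $A^{-1/2}$ is formed from the eigendecomposition $A = V\Lambda V^T$ mentioned above. For scalar ($d=1$) data the result reduces to the ordinary Pearson coefficient. The function names are illustrative.

```python
import numpy as np

def inv_sqrt_spd(a):
    """A^{-1/2} of a symmetric positive-definite matrix via A = V diag(w) V^T."""
    w, v = np.linalg.eigh(a)
    if np.any(w <= 0):
        raise ValueError("matrix must be positive definite")
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

def matrix_correlation(y, z):
    """rho(Y, Z) = var(Y)^{-1/2} cov(Y, Z) var(Z)^{-1/2} from paired (N, d) samples."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    n = y.shape[0]
    dy = y - y.mean(axis=0)
    dz = z - z.mean(axis=0)
    var_y = dy.T @ dy / n          # d x d variance matrix of Y
    var_z = dz.T @ dz / n          # d x d variance matrix of Z
    cov_yz = dy.T @ dz / n         # d x d cross-covariance E[(Y - E[Y])(Z - E[Z])^T]
    return inv_sqrt_spd(var_y) @ cov_yz @ inv_sqrt_spd(var_z)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = a + rng.normal(size=500)
# For d = 1 the result matches the ordinary Pearson coefficient
print(matrix_correlation(a[:, None], b[:, None])[0, 0], np.corrcoef(a, b)[0, 1])
```

Correlating a vector with itself gives the identity matrix. This is the "matched spaces" property that motivates the symmetric normalization.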

Now let $Y_t, Z_s$ be a parametrized family of random variables, then the covariance function is the covariance $cov_{YZ}(t,s) = cov(Y_t, Z_s)$. Note that this is a matrix for each $(t,s)$ if $Y_t$ and $Z_s$ are vectors. Similarly, you can generalize the Pearson correlation to a function of time $\rho_{YZ}(t,s) = \rho(Y_t, Z_s)$. The auto-covariance just takes $Y=Z$ so $cov_{YY}(t,s) = cov(Y_t, Y_s)$, and similarly for the Pearson auto-correlation $\rho_{YY}(t,s) = \rho(Y_t, Y_s)$. If you have a stationary process then you can make this a function only of the difference in time $\rho_{YY}(\tau) = \rho(Y_{t+\tau}, Y_t)$.

Now, given $n$ sampled times $t_1, \ldots, t_n$, you can compute the Pearson auto-correlation function at those times, $\rho_{YY}(t_j-t_i) = \rho(Y_{t_i}, Y_{t_j})$, which yields a matrix. Of course, $\rho$ was defined in terms of variance and covariance matrices, which are expectations. In general you would therefore have to construct estimators for them from samples $Y^{1}_{t_i}, \ldots, Y^{N}_{t_i}$, for example: $$var(Y_{t_i}) = E[(Y_{t_i}-E[Y_{t_i}])(Y_{t_i}-E[Y_{t_i}])^T] \approx \frac{1}{N} \sum_{k=1}^{N} (Y^k_{t_i} - \tilde{\mu}_i)(Y^k_{t_i} - \tilde{\mu}_i)^T,$$ where $\tilde{\mu}_i = \frac{1}{N}\sum_{k=1}^{N} Y^k_{t_i}$ is the sample mean at time $t_i$. You may want to modify the estimator, e.g. with Bessel's correction.
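This ensemble estimator can be sketched as follows. It assumes you have $N$ independent realizations (for example, independent simulation runs), stored as an array of shape `(N, T, d)`. The array layout and the name `ensemble_variance` are assumptions for illustration. `ddof=1` applies Bessel's correction.

```python
import numpy as np

def ensemble_variance(samples, ddof=0):
    """Per-time variance matrices from independent realisations.

    samples has shape (N, T, d): N realisations Y^k, T sampled times t_i, d components.
    Returns shape (T, d, d) with entry i equal to
    1/(N - ddof) * sum_k (Y^k_{t_i} - mu_i)(Y^k_{t_i} - mu_i)^T,
    where mu_i is the sample mean over k at time t_i; ddof=1 applies Bessel's correction.
    """
    s = np.asarray(samples, dtype=float)
    if s.ndim != 3:
        raise ValueError("samples must have shape (N, T, d)")
    n_real = s.shape[0]
    mu = s.mean(axis=0)               # tilde mu_i for every time, shape (T, d)
    dev = s - mu                      # broadcasting subtracts mu_i from each realisation
    # Sum of outer products over realisations k, separately for each time i
    return np.einsum("kia,kib->iab", dev, dev) / (n_real - ddof)

rng = np.random.default_rng(2)
samples = rng.normal(size=(200, 4, 3))   # synthetic: 200 realisations, 4 times, 3-D vectors
var_t = ensemble_variance(samples, ddof=1)
print(var_t.shape)  # (4, 3, 3)
```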

The estimator you wrote for $R(t)$ seems to aim at a simpler version of this, namely (with $\tau = t_j - t_i$): $$\rho_{XX}(\tau) = \frac{E[(X_{t_i}-E[X_{t_i}])^T(X_{t_j}-E[X_{t_j}])]}{\sqrt{var(X_{t_i})var(X_{t_j})}}.$$ But since you use the scalar trace version of the covariance, I would assume you want the scalar trace version of the variance too. This means $$var(X_{t_i}) = E[\|X_{t_i} - E[X_{t_i}]\|^2].$$ I came to this conclusion purely from the requirement that your covariance formulation be consistent with your variance formulation. The same goes for the determinant version. If you used the matrix version of the covariance, you would also use the matrix version of the variance.

Since you have a stationary process whose variance is independent of time, you can build the following estimator for the variance:

$$var(X_i) \approx \sigma^2 = \frac{1}{n} \sum_{i=1}^n \|X_i - \mu\|^2.$$
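Putting the trace-consistent numerator and this variance estimator together gives a sketch like the one below. It assumes a single, stationary, evenly sampled trajectory stored as an `(n, 3)` array, with the lag `tau` counted in samples. The synthetic AR(1) data are for illustration only. Because the numerator and denominator use the same trace convention, $\rho(0) = 1$ by construction.

```python
import numpy as np

def trace_autocorrelation(r, max_lag):
    """rho(tau) = C(tau) / sigma^2 for one stationary trajectory stored as an (n, 3) array.

    C(tau)  = 1/(n - tau) * sum_i (X_i - mu) . (X_{i+tau} - mu)
    sigma^2 = 1/n * sum_i ||X_i - mu||^2   (trace variance, no Bessel correction)
    """
    x = np.asarray(r, dtype=float)
    n = x.shape[0]
    if not 0 <= max_lag < n:
        raise ValueError("need 0 <= max_lag < n")
    d = x - x.mean(axis=0)              # X_i - mu
    sigma2 = np.sum(d * d) / n          # trace variance estimate
    c = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        # mean dot product over the n - tau available pairs at lag tau
        c[tau] = np.sum(d[: n - tau] * d[tau:]) / (n - tau)
    return c / sigma2

# Synthetic AR(1) vector series X_i = 0.9 X_{i-1} + noise, whose true rho(tau) is 0.9**tau
rng = np.random.default_rng(3)
phi = 0.9
r = np.zeros((5000, 3))
for i in range(1, len(r)):
    r[i] = phi * r[i - 1] + rng.normal(size=3)
rho = trace_autocorrelation(r, max_lag=20)
print(rho[:5])  # rho[0] == 1 by construction
```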

However, because you use the estimate $\mu$ in place of $E[X_i]$, you should apply Bessel's correction to get an unbiased estimate: $\sigma^2_B = \frac{n}{n-1} \sigma^2$. You wouldn't need this if you estimated the mean from a separate set of samples that you then discard. Additionally, Pearson's correlation is expressed in terms of standard deviations, and an unbiased estimator of those is even harder to obtain unless you know something about the distribution. In your case, however, the two standard deviations are equal, so their product is just the variance. It is therefore probably fine to estimate the variance once, with Bessel's correction.
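The Bessel factor and the exponential fit asked for in the question could be sketched as below. The fit is a log-linear least-squares fit of $\rho(\tau) \approx A\,e^{-\tau/\tau_c}$ with `np.polyfit`. It uses only the leading run of lags where the estimate is still positive, since the logarithm is undefined otherwise. This is one simple choice. A direct nonlinear fit, for example with `scipy.optimize.curve_fit`, is a common alternative that weights the points differently. The function names are hypothetical. Reading `(fitx, fity)` from the question as the fitted lags and the fitted curve values is also an assumption.

```python
import numpy as np

def bessel_corrected_variance(r):
    """sigma_B^2 = n/(n-1) * (1/n) sum ||X_i - mu||^2 = 1/(n-1) sum ||X_i - mu||^2."""
    x = np.asarray(r, dtype=float)
    d = x - x.mean(axis=0)
    return np.sum(d * d) / (x.shape[0] - 1)

def fit_exponential_decay(rho, max_fit_lag=None):
    """Fit rho(tau) ~ A * exp(-tau / tau_c) by linear least squares on log(rho).

    Only the leading run of strictly positive values is used.
    Returns (A, tau_c, fitx, fity) with fity = A * exp(-fitx / tau_c).
    """
    rho = np.asarray(rho, dtype=float)
    if max_fit_lag is not None:
        rho = rho[: max_fit_lag + 1]
    nonpos = np.flatnonzero(rho <= 0)
    stop = nonpos[0] if nonpos.size else rho.size   # first lag where rho <= 0
    if stop < 2:
        raise ValueError("need at least two leading positive values to fit")
    fitx = np.arange(stop, dtype=float)
    # polyfit with deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(fitx, np.log(rho[:stop]), 1)
    if slope >= 0:
        raise ValueError("data do not decay; an exponential fit is not meaningful")
    A = np.exp(intercept)
    tau_c = -1.0 / slope
    fity = A * np.exp(-fitx / tau_c)
    return A, tau_c, fitx, fity

tau = np.arange(30)
rho_demo = 0.9 * np.exp(-tau / 5.0)     # synthetic, noise-free decay for illustration
A, tau_c, fitx, fity = fit_exponential_decay(rho_demo)
print(A, tau_c)
```

If the lag-averaged dot products are divided by $\sigma^2_B$ instead of $\sigma^2$, the zero-lag value becomes $(n-1)/n$ rather than exactly 1. For long trajectories this difference is negligible.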

Your answer is good. However, I implemented the formula and found that it doesn't work. So, I can give you the bounty points, but I won't accept it as an answer. — user366312, Feb 12 '24 at 06:00
@user366312 How did you conclude that it doesn't work? You could expand your question with that information. Maybe I misunderstood something, or something else is broken. — lightxbulb, Feb 12 '24 at 06:04
