I obtained a list of $\overrightarrow{r}_{\text{end-to-end}}$ vectors from a Monte Carlo simulation of polymer movement:
# r_end-X r_end-Y r_end-Z
-177.236 100.309 -130.930
-184.354 88.047 -117.760
-172.577 87.168 -117.745
-197.651 103.270 -124.953
-190.053 104.223 -128.100
-187.985 102.387 -127.593
-190.839 91.210 -118.643
-193.851 98.069 -113.333
-177.084 83.960 -116.667
-178.312 92.759 -128.782
... ... ... ... ...
I want to compute the autocorrelation using the dot product vector[i] · vector[i+tau], normalize it, fit an exponential decay curve to the result, and finally obtain the best-fit curve values (fitx, fity).
What formula should I use?
Here's how I adapted the autocorrelation function for the vector dataset:
- Calculate the mean vector $\boldsymbol{\mu}$ by averaging each component of the vectors separately:
$$ \boldsymbol{\mu} = \left( \frac{1}{n}\sum_{i=0}^{n-1} x_i, \frac{1}{n}\sum_{i=0}^{n-1} y_i, \frac{1}{n}\sum_{i=0}^{n-1} z_i \right) $$
- Calculate the autocorrelation function for the vector dataset using the dot product to get a scalar autocorrelation value for each lag $t$:
$$ R(t) = \frac{1}{(n - t) \cdot \sigma^2} \sum_{i=0}^{n-t-1} \left( \mathbf{X}_i - \boldsymbol{\mu} \right) \cdot \left( \mathbf{X}_{i+t} - \boldsymbol{\mu} \right) $$
where $\mathbf{X}_i$ is the vector at index $i$, and $\sigma^2$ is the total variance of the end-to-end vectors about their mean, i.e. the mean squared distance from $\boldsymbol{\mu}$. Since the vectors are already centered, no further $\| \boldsymbol{\mu} \|^2$ term is subtracted:
$$ \sigma^2 = \frac{1}{n} \sum_{i=0}^{n-1} \| \mathbf{X}_i - \boldsymbol{\mu} \|^2 $$
Here, $\| \mathbf{X}_i - \boldsymbol{\mu} \|^2$ is the squared magnitude of the vector difference $\mathbf{X}_i - \boldsymbol{\mu}$.
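The two steps above can be sketched in Python with NumPy. This is only a minimal sketch, not a verified implementation: the function name `vector_autocorrelation` and the `max_lag` parameter are my own choices, and it assumes $\sigma^2$ is taken as the mean squared distance from $\boldsymbol{\mu}$, which makes $R(0) = 1$.

```python
import numpy as np

def vector_autocorrelation(vectors, max_lag):
    """Normalized autocorrelation R(t) of a vector series for t = 0..max_lag."""
    x = np.asarray(vectors, dtype=float)          # shape (n, 3)
    n = x.shape[0]
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < n")
    fluct = x - x.mean(axis=0)                    # X_i - mu, component-wise mean
    # Assumption: sigma^2 is the mean squared distance from the mean vector.
    sigma2 = np.mean(np.sum(fluct**2, axis=1))
    r = np.empty(max_lag + 1)
    for t in range(max_lag + 1):
        # Row-wise dot products (X_i - mu) . (X_{i+t} - mu), averaged over n - t pairs.
        dots = np.sum(fluct[: n - t] * fluct[t:], axis=1)
        r[t] = dots.mean() / sigma2
    return r

# The ten rows listed in the question (the full data set is much longer).
sample = np.array([
    [-177.236, 100.309, -130.930],
    [-184.354,  88.047, -117.760],
    [-172.577,  87.168, -117.745],
    [-197.651, 103.270, -124.953],
    [-190.053, 104.223, -128.100],
    [-187.985, 102.387, -127.593],
    [-190.839,  91.210, -118.643],
    [-193.851,  98.069, -113.333],
    [-177.084,  83.960, -116.667],
    [-178.312,  92.759, -128.782],
])
acf = vector_autocorrelation(sample, max_lag=5)   # acf[0] is 1 up to rounding
```

With only ten rows the values at larger lags are dominated by noise; the loop is meant for the full trajectory, where `max_lag` would be kept well below `n`.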
In the autocorrelation function, each summand is a dot product, so $R(t)$ is a single scalar for every lag, which is what I need for fitting a decay curve. Note that this measures how the full fluctuation vectors (direction and length together) decorrelate over time, rather than the correlation of the scalar magnitudes $\| \mathbf{X}_i \|$ alone.
The only change from the scalar formula is in how $\sigma^2$ is interpreted. In the scalar case it is simply the variance of the dataset; for vectors, I take the average of the squared distances from the mean vector, as shown above.
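As a sanity check on the normalization, assuming the goal is $R(0) = 1$: setting $t = 0$ in $R(t)$ gives

$$ R(0) = \frac{1}{n \cdot \sigma^2} \sum_{i=0}^{n-1} \| \mathbf{X}_i - \boldsymbol{\mu} \|^2 , $$

which equals 1 exactly when $\sigma^2$ is the mean squared distance from $\boldsymbol{\mu}$. Expanding the square relates this to the uncentered vectors:

$$ \frac{1}{n} \sum_{i=0}^{n-1} \| \mathbf{X}_i - \boldsymbol{\mu} \|^2 = \frac{1}{n} \sum_{i=0}^{n-1} \| \mathbf{X}_i \|^2 - \| \boldsymbol{\mu} \|^2 , $$

so a $\| \boldsymbol{\mu} \|^2$ term is subtracted only when starting from $\| \mathbf{X}_i \|^2$, not from the centered differences.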
However, I am not sure about this formula at all.
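For the fitting step, here is a minimal sketch of one option, assuming a single-exponential model $R(t) \approx e^{-t/\tau}$ with $R(0) = 1$. It uses a log-linear least-squares fit through the origin with NumPy only, which is a simplification I am swapping in; a nonlinear fit such as `scipy.optimize.curve_fit` would be the more usual choice. The function name `fit_exponential_decay` and the synthetic test curve are hypothetical and are not output from my simulation.

```python
import numpy as np

def fit_exponential_decay(lags, acf):
    """Estimate tau in acf ~ exp(-lag / tau) by least squares on log(acf).

    Only the leading run of strictly positive values is used, because the
    logarithm is undefined once the noisy tail reaches zero or below.
    """
    lags = np.asarray(lags, dtype=float)
    acf = np.asarray(acf, dtype=float)
    nonpos = np.flatnonzero(acf <= 0)
    stop = nonpos[0] if nonpos.size else acf.size
    if stop < 2:
        raise ValueError("need at least two positive points to fit")
    t, y = lags[:stop], np.log(acf[:stop])
    # Line through the origin: log R(t) = slope * t, with slope = -1 / tau.
    slope = np.sum(t * y) / np.sum(t * t)
    if slope >= 0:
        raise ValueError("data do not decay; cannot estimate tau")
    return -1.0 / slope

# Synthetic, made-up decay curve used only to exercise the function.
lags = np.arange(50)
acf_demo = np.exp(-lags / 8.0)
tau = fit_exponential_decay(lags, acf_demo)

# Best-fit curve on a finer grid, using the names from the question.
fitx = np.linspace(0, lags[-1], 200)
fity = np.exp(-fitx / tau)
```

Because the logarithm weights small values of $R(t)$ heavily, this kind of fit is sensitive to the noisy tail; restricting it to the first few correlation times is a common precaution.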