
I have two multidimensional datasets $X, Y$, each of dimension $m \times n$. Here $m$ is the number of successive measurements and $n$ is the number of features collected during each measurement. The $m$ measurements are independent of each other, but the $n$ features within a measurement are correlated.

My goal is to find the mutual information of $X$ and $Y$, which are also correlated with each other.

My approach is to use the quantile transform to map each of the $n$ columns to a normal distribution. Since this is a strictly monotone (hence invertible) transform of each marginal, the mutual information should not be lost. Is this assumption correct?
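
For illustration, here is a minimal sketch of this step, assuming scikit-learn's `QuantileTransformer`; the toy data generator and variable names are my own, standing in for the real datasets:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
m, n = 1000, 5

# Toy stand-ins for the real data: m independent rows, n correlated
# (and mutually correlated, via Z) non-Gaussian features.
Z = rng.standard_normal((m, n))
X = np.exp(Z + 0.5 * rng.standard_normal((m, n)))
Y = np.exp(Z + 0.5 * rng.standard_normal((m, n)))

# Map every column to a standard normal marginal via empirical quantiles.
qt = QuantileTransformer(output_distribution="normal", n_quantiles=m,
                         random_state=0)
X_gauss = qt.fit_transform(X)  # each column ~ N(0, 1), ranks unchanged
Y_gauss = qt.fit_transform(Y)  # fit_transform refits on Y independently
```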

In the paper "Mutual Information Is Copula Entropy" (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6077935), the authors prove that

$I(\mathbf{r}) = -H_c(\mathbf{r})$, where $\mathbf{r}$ is a vector of random variables.
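
If I understand the paper correctly, the identity follows from Sklar's theorem in a few lines: with $u_i = F_i(r_i)$ the marginal CDFs and $c$ the copula density, the joint density factors as $f(\mathbf{r}) = c(u_1,\dots,u_d) \prod_i f_i(r_i)$, so

$I(\mathbf{r}) = \int f(\mathbf{r}) \log \frac{f(\mathbf{r})}{\prod_i f_i(r_i)} \, d\mathbf{r} = \int_{[0,1]^d} c(\mathbf{u}) \log c(\mathbf{u}) \, d\mathbf{u} = -H_c(\mathbf{r}).$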

Can I use this to find $I(X,Y)$? The plan: first transform each of the $n$ columns of $X$ and $Y$ into a normal distribution by the quantile transform, then find the copula density of the concatenated columns of $X$ and $Y$, which form an $m \times 2n$ matrix in which each column is Gaussian:

$F_{X_1,\dots,X_n,Y_1,\dots,Y_n}(x_1,\dots,x_n,y_1,\dots,y_n) = C\big(F_{X_1}(x_1),\dots,F_{X_n}(x_n),F_{Y_1}(y_1),\dots,F_{Y_n}(y_n)\big)$
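
Continuing the sketch above, here is how I would form the pseudo-observations (approximate samples from the copula) of the concatenated data; since ranks are invariant under the monotone quantile transform, using the raw columns would give the same result:

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(data):
    """Map each column to (0, 1) via normalized ranks, i.e. evaluate the
    empirical marginal CDF at every sample."""
    m = data.shape[0]
    return rankdata(data, axis=0) / (m + 1)

# X_gauss, Y_gauss as produced in the first sketch.
XY = np.hstack([X_gauss, Y_gauss])  # m x 2n concatenated data
U = pseudo_observations(XY)         # approximate samples from the copula
```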

Then find

$I(X,Y) = -H_c(x_1,\dots,x_n,y_1,\dots,y_n)$?
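
To make this last step concrete, here is how I would estimate the copula entropy with a Kozachenko-Leonenko $k$-NN entropy estimator applied to the pseudo-observations `U` from the previous sketch (the estimator and the choice of $k$ are my own, not from the paper):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(samples, k=5):
    """Kozachenko-Leonenko k-NN estimate of differential entropy (nats)."""
    m, d = samples.shape
    tree = cKDTree(samples)
    # Distance to the k-th nearest neighbour; column 0 is the point itself.
    dist, _ = tree.query(samples, k=k + 1)
    eps = np.maximum(dist[:, -1], 1e-12)
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(m) - digamma(k) + log_ball + d * np.mean(np.log(eps))

H_c_hat = knn_entropy(U)  # copula entropy of the concatenated columns
I_hat = -H_c_hat          # mutual-information estimate via the identity
```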

I am wondering: since the copula transform is nonlinear, does it preserve the mutual information? I would also like to know whether this overall approach is correct. Am I not considering something?
