I have two multidimensional datasets $X, Y$, each of dimension $m \times n$: the $m$ rows are successive measurements and the $n$ columns are the features collected during each measurement. The $m$ measurements can be taken as independent of each other, but the $n$ features within a measurement are correlated.
My goal is to estimate the mutual information $I(X, Y)$; $X$ and $Y$ are themselves correlated with each other.
My approach is to use a quantile transform to map each of the $n$ columns to a normal distribution. Since this is a strictly monotone (and hence invertible) transform applied column-wise, the mutual information should not be lost. Is this assumption correct?
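A minimal sketch of what I mean by this step (toy data; I am assuming scikit-learn's `QuantileTransformer` here, but any rank-based Gaussianization of the columns would do):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
m, n = 1000, 5  # m measurements (rows), n correlated features (columns)

# Toy data: correlated Gaussian columns pushed through a cubic, so the
# marginals are non-Gaussian but the dependence between columns remains.
cov = np.full((n, n), 0.5) + 0.5 * np.eye(n)
X = rng.multivariate_normal(np.zeros(n), cov, size=m) ** 3

# Rank-based, strictly monotone, column-wise map to standard-normal marginals.
qt = QuantileTransformer(output_distribution="normal", n_quantiles=m)
X_gauss = qt.fit_transform(X)
```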
In the paper titled "Mutual Information Is Copula Entropy" (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6077935), the authors prove that
$I(\mathbf{r}) = -H_c(\mathbf{r})$, where $\mathbf{r}$ is a random vector and $H_c$ is the entropy of its copula density.
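As a sanity check in the simplest case: for a bivariate Gaussian pair with correlation $\rho$, the Gaussian copula has entropy $H_c = \frac{1}{2}\log(1-\rho^2)$, so the theorem gives $I = -\frac{1}{2}\log(1-\rho^2)$, which matches the well-known closed form for the mutual information of a Gaussian pair.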
Can I use this to find $I(X, Y)$? First I transform each of the $n$ columns of $X$ and $Y$ into a normal distribution by the quantile transform, and then estimate the copula of the concatenated columns of $X$ and $Y$, which form an $m \times 2n$ matrix in which every column is Gaussian:

$F_{X_1,\dots,X_n,Y_1,\dots,Y_n}(x_1,\dots,x_n,y_1,\dots,y_n) = C\big(F_{X_1}(x_1),\dots,F_{X_n}(x_n),F_{Y_1}(y_1),\dots,F_{Y_n}(y_n)\big)$
Then find

$I(X, Y) = -H_c(x_1,\dots,x_n,y_1,\dots,y_n)$?
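To make this concrete, here is a sketch of how I would compute the estimate under one extra (possibly too strong) assumption: that after Gaussianizing the marginals, the joint copula of the $2n$ columns is itself Gaussian, so the copula entropies reduce to log-determinants of correlation matrices. The helper name is mine; my understanding is that the copula entropy of the full $2n$-vector counts the dependence among all $2n$ columns, including the dependence inside $X$ and inside $Y$ alone, hence the within-block corrections below (part of what I am asking is whether this is right):

```python
import numpy as np

def gaussian_copula_block_mi(X_gauss, Y_gauss):
    """Estimate I(X; Y) assuming the joint copula of the Gaussianized
    columns is itself Gaussian. Gaussianizing each marginal does NOT
    guarantee this, so it is an assumption of this sketch."""
    n = X_gauss.shape[1]
    Z = np.hstack([X_gauss, Y_gauss])   # m x 2n, every column ~ N(0, 1)
    R = np.corrcoef(Z, rowvar=False)    # full 2n x 2n correlation matrix
    # For jointly Gaussian blocks:
    #   I(X; Y) = -1/2 * log( det R / (det R_X * det R_Y) )
    # The within-block determinants remove the dependence internal to X
    # and internal to Y, which the copula entropy of the concatenated
    # vector also counts.
    _, logdet_full = np.linalg.slogdet(R)
    _, logdet_x = np.linalg.slogdet(R[:n, :n])
    _, logdet_y = np.linalg.slogdet(R[n:, n:])
    return -0.5 * (logdet_full - logdet_x - logdet_y)  # nats
```

If the Gaussian-copula assumption is too strong, I suppose a nonparametric copula-entropy estimator (e.g. a k-nearest-neighbour entropy estimate on the rank-transformed data, which I believe is what the paper's authors propose) could replace the log-determinants.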
Since the copula transform is nonlinear, I am wondering whether it still preserves the mutual information. Is this overall approach correct? Am I not considering something?