
I am interested in computing the $R^2$ between a set of points $D_f = \{ (x,y)\}$, where $y = f(x)$, and a set of points $D' = \{(x',y') \}$ obtained by adding noise to $D_f$.

I don't think I can use: $$ R^2 = 1 - \frac{\sum_i (y'_i - y_i)^2}{\sum_i (y'_i - \bar{y'})^2} $$ because noise can be added to the $x$ coordinate as well.

More specifically, I want to compute the $R^2$ in order to compare it with MIC (Maximal Information Coefficient), as in "Detecting Novel Associations in Large Data Sets" by Reshef et al.

Simone

1 Answer


According to Kinney and Atwal in "Equitability, mutual information, and the maximal information coefficient", the $R^2$ is computed as $r(D_f,D'_f)^2$, where $r$ is the Pearson correlation coefficient and $D'_f$ denotes the noisy set $D'$ from the question.

According to Wikipedia, $r^2 = R^2$ (with $R^2$ as defined in the question above) in a least squares regression analysis with an intercept. Given that all the points are generated according to a function $f$, I think the result might hold here as well. In other words, it holds if $f$ is what we would obtain via regression analysis.
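To make the condition above concrete, here is a small numerical sketch. It compares the $R^2$ formula from the question with the squared Pearson correlation in two cases: predictions from an ordinary least squares line, and predictions from a fixed function that was not fitted. The data, the noise level, and the helper names `r_squared_formula` and `pearson_squared` are my own assumptions for illustration, not anything taken from the papers.

```python
import numpy as np

def r_squared_formula(y_obs, y_pred):
    """R^2 = 1 - SS_res / SS_tot, as written in the question."""
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

def pearson_squared(a, b):
    """Squared Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2

# Hypothetical data: a noisy straight line (assumed only for this illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y_obs = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)

# Case 1: predictions from an ordinary least squares fit with intercept.
slope, intercept = np.polyfit(x, y_obs, 1)
y_fit = slope * x + intercept

# Case 2: predictions from a fixed, unfitted function (deliberately mis-specified).
y_fixed = 3.0 * x

print("OLS fit:  formula =", r_squared_formula(y_obs, y_fit),
      " r^2 =", pearson_squared(y_obs, y_fit))
print("fixed f:  formula =", r_squared_formula(y_obs, y_fixed),
      " r^2 =", pearson_squared(y_obs, y_fixed))
```

In the first case the two quantities agree up to floating point error. In the second case they need not, because the Pearson correlation is invariant to shifting and rescaling the predictions while the formula from the question is not.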

I had a doubt about the effect of noise added to the $x$ coordinate. Noise added to $x$ changes $y'$ through the generation process of $D'_f$: indeed, $y' = f(x + \eta_x) + \eta_y$, where $\eta_x$ and $\eta_y$ are the added noise terms, so the $x$ noise is already reflected in $y'$.
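A minimal sketch of this generation process and of the resulting $r^2$, under assumptions of my own: Gaussian noise for both $\eta_x$ and $\eta_y$, a sine function for $f$, and the helper name `noisy_r_squared`. None of these choices come from the papers; they only illustrate how the $x$ noise enters through $y'$.

```python
import numpy as np

def noisy_r_squared(f, x, sigma_x, sigma_y, rng):
    """Squared Pearson correlation between y = f(x) and
    y' = f(x + eta_x) + eta_y, with Gaussian noise (an assumption)."""
    y = f(x)
    eta_x = rng.normal(scale=sigma_x, size=x.shape)  # noise on the x coordinate
    eta_y = rng.normal(scale=sigma_y, size=x.shape)  # noise on the y coordinate
    y_noisy = f(x + eta_x) + eta_y
    return np.corrcoef(y, y_noisy)[0, 1] ** 2

def f(x):
    """Hypothetical choice of f, used only for illustration."""
    return np.sin(2.0 * np.pi * x)

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=1000)

r2_clean = noisy_r_squared(f, x, sigma_x=0.0, sigma_y=0.0, rng=rng)
r2_x_only = noisy_r_squared(f, x, sigma_x=0.1, sigma_y=0.0, rng=rng)
r2_both = noisy_r_squared(f, x, sigma_x=0.1, sigma_y=0.3, rng=rng)

print(r2_clean, r2_x_only, r2_both)
```

With no noise the value is 1 up to rounding. Noise on $x$ alone already lowers it, which is consistent with the point that $\eta_x$ affects $r^2$ only through $y'$.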

Simone