Online r2 calculation does not match sklearn r2 calculation (python)

Question

I am trying to replicate sklearn's linear regression coefficients and r2 score with an online calculation (so that it updates with each additional data point). This is the code I am starting with:

import numpy as np


class SimpleLinearRegressor():
    def __init__(self):
        # running sums: n, sum(x), sum(y), sum(x*x), sum(x*y)
        self.dots = np.zeros(5)
        self.intercept = None
        self.slope = None

        self.tss = 0
        self.rss = 0
        self.r2 = 0

        self.count = 0
        self.y_sum = 0
        self.y_avg = 0

    def update(self, x: np.ndarray, y: np.ndarray):
        self.dots += np.array(
            [
                x.shape[0],
                x.sum(),
                y.sum(),
                np.dot(x, x),
                np.dot(x, y),
            ]
        )
        size, sum_x, sum_y, sum_xx, sum_xy = self.dots
        det = size * sum_xx - sum_x ** 2

        # running mean of y over the points seen so far
        # (identical to the original when each call passes a single point)
        self.count += y.shape[0]
        self.y_sum += y.sum()
        self.y_avg = self.y_sum / self.count

        if det > 1e-10:  # determinant may be zero initially
            self.intercept = (sum_xx * sum_y - sum_xy * sum_x) / det
            self.slope = (sum_xy * size - sum_x * sum_y) / det

            # tss and rss are accumulated with the mean and coefficients
            # available at the time of this update
            self.tss += ((y - self.y_avg) ** 2).sum()

            resid = y - (self.intercept + (x * self.slope))
            self.rss += (resid ** 2).sum()
            self.r2 = 1 - (self.rss / self.tss)

So far the coefficients are spot on. However, the r2 calculation is consistently higher than sklearn's r2 calculation. Here is a comparison (calculating at each new point of the data):

Here's the original data and table:

0: line with some slight noise
score: sklearn r2 score
coef: sklearn coef
coef_incr: online coef
score_incr: online r2

Thank you

It is the average of the true y variables (dependent variables). — sam chakerian, Jan 12 '23 at 08:09
I’ve not yet figured out everything, but my discussion here might lead you somewhere. I think it comes down to you and sklearn not quite agreeing on what $R^2$ should be. I’ll see what sense I can make out of this, and I hope to post an answer. (But if you figure it out, please do post a self-answer!) — Dave, Mar 07 '23 at 17:54
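
One possible reading of the discrepancy, offered as a sketch rather than a confirmed diagnosis: sklearn's r2 score compares every point against the final fitted line and the mean of all y values, whereas the update in the question adds each point's squared residual and squared deviation using the coefficients and running mean available when that point arrived, and never revisits them. Assuming that is the cause, both sums can be recomputed exactly at each step by also tracking sum(y*y). The class name OnlineSimpleRegression and the attribute sums are hypothetical, not from the original code, and the batch check uses np.polyfit rather than sklearn to stay self-contained.

```python
import numpy as np


class OnlineSimpleRegression:
    """Hypothetical sketch: simple linear regression with R^2 from running sums."""

    def __init__(self):
        # running sums: n, sum(x), sum(y), sum(x*x), sum(x*y), sum(y*y)
        self.sums = np.zeros(6)
        self.intercept = None
        self.slope = None
        self.r2 = None

    def update(self, x, y):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        y = np.atleast_1d(np.asarray(y, dtype=float))
        self.sums += np.array(
            [x.size, x.sum(), y.sum(), np.dot(x, x), np.dot(x, y), np.dot(y, y)]
        )
        n, sx, sy, sxx, sxy, syy = self.sums

        det = n * sxx - sx ** 2
        if det <= 1e-10:  # fewer than two distinct x values so far
            return

        a = (sxx * sy - sxy * sx) / det  # intercept
        b = (sxy * n - sx * sy) / det    # slope
        self.intercept, self.slope = a, b

        # RSS of the current line over all points seen so far, obtained by
        # expanding sum((y - a - b*x)**2) in terms of the running sums
        rss = syy - 2 * a * sy - 2 * b * sxy + n * a * a + 2 * a * b * sx + b * b * sxx
        # TSS around the mean of all points seen so far
        tss = syy - sy ** 2 / n
        if tss > 0:
            self.r2 = 1 - rss / tss


# Illustrative data (an assumption): a noisy line, fed one point at a time
rng = np.random.default_rng(0)
xs = np.arange(20, dtype=float)
ys = 2.0 * xs + 1.0 + rng.normal(scale=0.5, size=xs.size)

model = OnlineSimpleRegression()
for xi, yi in zip(xs, ys):
    model.update(xi, yi)

# Batch reference using the usual definition R^2 = 1 - RSS / TSS
slope, intercept = np.polyfit(xs, ys, 1)
resid = ys - (intercept + slope * xs)
batch_r2 = 1 - (resid ** 2).sum() / ((ys - ys.mean()) ** 2).sum()
print(model.r2, batch_r2)
```

The expanded RSS subtracts large, nearly equal sums, so it can lose precision when the values are large or the fit is nearly perfect; centering the data or using a Welford-style update would likely be more robust in that case.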
