
I am looking to create a confidence ellipse for my X and Y variables in order to identify potential outliers.

I'm new to this area so my understanding and use of this method may be wrong (please advise if it is).

I am using the Kaggle house prices dataset (https://www.kaggle.com/c/house-prices-advanced-regression-techniques) and am trying to plot a confidence ellipse for SalePrice and GrLivArea.

Although this is quite a specific example, my question is more of a general one.

According to Wikipedia: "the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables."

The size of the confidence ellipse is set by a quantile of the chi-squared distribution with 2 degrees of freedom. However, my two variables are correlated and therefore not independent. Am I still able to use this chi-squared distribution to generate a confidence ellipse?

Sean
  • 634
  • 1
    Andrew Gelman has some thoughts on outlier detection that are worth reading: https://statmodeling.stat.columbia.edu/2014/06/02/hate-stepwise-regression/ – Dave Aug 12 '20 at 21:23
  • 1
    The chi-square confidence ellipse does indeed allow for correlation. You can get a chi-square distribution from correlated variables, and this is automatically done in the ellipse by use of Mahalanobis distance. So no worries. – BigBendRegion Aug 19 '20 at 12:02
  • See https://stats.stackexchange.com/questions/67422/volume-of-the-95-confidence-ellipsoid, https://stats.stackexchange.com/questions/573021/confidence-intervals-for-more-than-one-variable, https://stats.stackexchange.com/questions/8504/help-in-drawing-confidence-ellipse – kjetil b halvorsen Dec 14 '22 at 19:47
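
The point made in the comments about Mahalanobis distance can be checked numerically. Below is a minimal sketch, not a definitive implementation. It assumes the two variables are jointly normal (which may not hold for SalePrice and GrLivArea as they stand) and uses simulated data with correlation 0.8 in place of the Kaggle columns. All function names (`simulate`, `mean_and_cov`, `mahalanobis_sq`, `chi2_quantile_2df`) are made up for illustration. For 2 degrees of freedom the chi-squared CDF is 1 - exp(-x/2), so the quantile has a closed form and no statistics library is needed.

```python
import math
import random


def chi2_quantile_2df(p):
    """Quantile of the chi-squared distribution with 2 degrees of freedom.

    For 2 df the CDF is 1 - exp(-x / 2), so it can be inverted by hand.
    """
    return -2.0 * math.log(1.0 - p)


def simulate(n, rho, seed=0):
    """Draw n points from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


def mean_and_cov(xs, ys):
    """Sample means and the entries (sxx, sxy, syy) of the 2x2 covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(x, y, mean, cov):
    """Squared Mahalanobis distance of (x, y), using the explicit 2x2 inverse."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy ** 2
    dx, dy = x - mx, y - my
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Simulated, strongly correlated data standing in for the real columns.
xs, ys = simulate(20000, rho=0.8)
mean, cov = mean_and_cov(xs, ys)

threshold = chi2_quantile_2df(0.95)
d2 = [mahalanobis_sq(x, y, mean, cov) for x, y in zip(xs, ys)]

# Points with d2 > threshold fall outside the 95% ellipse (candidate outliers).
inside = sum(d <= threshold for d in d2) / len(d2)
print(f"threshold = {threshold:.3f}, fraction inside = {inside:.3f}")
```

The covariance term `sxy` in the distance is what accounts for the correlation: multiplying by the inverse covariance matrix decorrelates and rescales the two variables, so the squared distance behaves like a sum of two independent squared standard normals even though the raw variables are correlated. Under the normality assumption, roughly 95% of the simulated points should land inside the ellipse.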

0 Answers