For centering and scaling a variable (e.g. prior to a regression, or to a visualization), the standard procedure, of course, is to subtract the mean then divide by the standard deviation.
But is it considered preferable to use the population standard deviation (i.e. dividing by n) or the sample standard deviation (dividing by n-1)? Does it depend on the use case?
Interestingly, the standard R and Python functions make different choices here: Python's sklearn.preprocessing.scale() uses the population standard deviation (dividing by n), while R's scale() uses the sample standard deviation (dividing by n-1).
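To make the difference concrete, here is a minimal NumPy sketch (the data are illustrative; NumPy's `ddof` argument selects the divisor, with `ddof=0` giving the population SD and `ddof=1` the sample SD):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Population SD: divide the sum of squared deviations by n (ddof=0).
# This matches sklearn.preprocessing.scale().
pop_sd = np.std(x, ddof=0)    # sqrt(2) ~= 1.4142

# Sample SD: divide by n - 1 (ddof=1).
# This matches R's sd(), and therefore R's scale().
samp_sd = np.std(x, ddof=1)   # sqrt(2.5) ~= 1.5811

# The two standardizations differ only by the constant factor sqrt((n-1)/n).
z_pop = (x - x.mean()) / pop_sd
z_samp = (x - x.mean()) / samp_sd
```

Note that since the two z-scores differ only by a common constant factor, any downstream method that is invariant to rescaling a variable (e.g. correlations, or the fit of an OLS regression) is unaffected by the choice.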
(NOTE: there's a prior question here, but it pertains to a very specific psychological method, and the one answer isn't actually substantiated by anything.)
Don't confuse the population's variance, with df=N, with the sample's variance, with df=n. Neither is an estimate of the parameter: the 1st is the parameter itself, the 2nd is a pure statistic. We may use the sample's variance as the estimate of the population one, but it is biased; it is called the maximum likelihood estimate of the variance. – ttnphns Nov 28 '16 at 08:16
The unbiased estimate of the population variance has df=n-1. That "sample" or unbiased estimate shouldn't be confused with the above sample's variance. – ttnphns Nov 28 '16 at 08:16
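The bias the comments describe can be checked with a small simulation sketch (the normal distribution, true variance, seed, and sample size below are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0   # assumed true population variance
n = 5          # small sample size, where the bias is most visible
reps = 100_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

# MLE of the variance divides by n (ddof=0); its expectation is
# sigma2 * (n-1)/n, i.e. it is biased low.
mle = samples.var(axis=1, ddof=0).mean()

# The unbiased estimator divides by n - 1 (ddof=1); its expectation
# is sigma2.
unbiased = samples.var(axis=1, ddof=1).mean()

# mle averages near sigma2 * (n-1)/n = 3.2; unbiased averages near 4.0
```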