Questions tagged [standardization]

Usually refers to "z-standardization", which is shifting and rescaling data to ensure that they have zero mean and unit variance. Other "standardizations" are possible, too.

Specifically, when $(x_i), i=1, \ldots, n$ is a batch of data, its mean is $m=(\sum_i x_i)/n$ and its variance is $s^2 = v=(\sum_i(x_i-m)^2)/\nu$ where $\nu$ is either $n$ or $n-1$ (choices vary with application). Standardization replaces each $x_i$ with $z_i = (x_i-m)/s$.
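As a minimal sketch of this definition, the following Python computes $m$, $v$, and $z_i$ directly from the formulas. The function name `standardize` and the `ddof` parameter (0 for $\nu = n$, 1 for $\nu = n-1$) are illustrative assumptions, not part of the tag description.

```python
import math

def standardize(xs, ddof=0):
    """Return the z-scores (x_i - m) / s for a batch of numbers.

    ddof=0 divides the sum of squares by n (nu = n);
    ddof=1 divides it by n - 1 (nu = n - 1).
    """
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    m = sum(xs) / n                                  # mean
    v = sum((x - m) ** 2 for x in xs) / (n - ddof)   # variance
    if v == 0:
        raise ValueError("all values are equal, so the variance is zero")
    s = math.sqrt(v)                                 # standard deviation
    return [(x - m) / s for x in xs]

# Hypothetical example data with mean 5 and (nu = n) variance 4.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)
print(z)
```

The standardized values have zero mean, and their variance is 1 provided it is computed with the same $\nu$ that was used to standardize.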

Do not confuse standardization with or .

840 questions
21
votes
4 answers

What's the difference between standardization and studentization?

Is it that in standardization variance is known while in studentization it is not known and therefore estimated? Thank you.
58485362
  • 211
  • 1
  • 2
  • 3
12
votes
2 answers

What is the reasoning behind standardization (dividing by standard deviation)?

Why does dividing a dataset by sigma make the sample variance equal to 1? Assuming a zero mean for simplicity. What's the intuition behind this? Dividing by the range (max-min) makes intuitive sense. But standard deviation does not.
alwayscurious
  • 443
  • 3
  • 10
4
votes
2 answers

For what kind of features will standardization be helpful?

I have found that for some datasets, mean removal and variance scaling help to fit a better model to the data, while for other datasets they do not help. For what kind of data will standardization be helpful? Are there some guidelines for applying this?
Chauhan
  • 45
  • 6
3
votes
1 answer

Does standardizing the covariates make the design matrix orthogonal

In many applications, it is common to standardize the covariates. This means to center each covariate at its mean and divide by its standard deviation. Mathematically, how does this affect the design matrix $X$? Suppose $X_s$ is the standardized…
Adrian
  • 2,869
  • 5
  • 32
  • 53
3
votes
0 answers

When not to use standardization on variables

Can someone give me a counterexample of when we should not use standardization on variables? I understand what standardization is, but I do not see why we need to standardize variables. I searched many sites but can't find an…
Khan Saab
  • 225
  • 1
  • 4
  • 12
3
votes
1 answer

How to standardize a data-set

I have a data-set consisting of N p-dimensional observations (all quantitative variables). I want to apply a hierarchical clustering algorithm to those data. As explained on page 505 in Elements of Statistical Learning, when using weighted average…
3
votes
1 answer

How to standardize data

I have the test scores of two groups, say A, and B. And the former consists of 186 individuals whereas the latter only has 100. The test scores range from 1 to 12, and because group A has more people, it obviously has a total higher score than group…
Adrian
  • 2,869
  • 5
  • 32
  • 53
2
votes
0 answers

What are the pros and cons of standardizing variable in presence of an interaction?

I ask this question because, while reading about the benefits of standardizing explanatory variables or not, I read good but contrasting opinions about standardizing when there are interactions in the model. Some talk about how problems of collinearity…
2
votes
1 answer

Different possible parameter combinations to obtain a standardized generalized hyperbolic distribution?

Consider the generalized hyperbolic distribution given by (from wikipedia): So I now wanted to derive the standardized version, so mean zero variance one. I wanted to do the following: Set the mean to zero and the variance equal to one. Rewrite…
Stat Tistician
  • 2,321
  • 5
  • 34
  • 57
2
votes
3 answers

centering and scaling (standardizing) a variable: use population or sample standard deviation?

For centering and scaling a variable (e.g. prior to a regression, or to a visualization), the standard procedure, of course, is to subtract the mean then divide by the standard deviation. But is it considered preferable to use the population…
2
votes
1 answer

create indices by standardising variables

I have 257 variables that influence altruistic behaviour in humans. Each can be assigned to one of the following 3 criteria: internal, external or selection. I check the influence of each individual variable on altruism; further, I need to check the…
2
votes
0 answers

When to standardize data

I have a big dataset of sensor data (body temperature) acquired on a lot of people and each person is assessed for several days. person1 = [temp_day1, temp_day2, .... , temp_dayn] During each day the temperature is assessed 24 times (1 every…
gabboshow
  • 673
2
votes
3 answers

What is a good paper or book to understand standardization and normalization of data with different units of measurement?

I am dealing with data with different units of measurement for NYC neighborhoods and I am trying to build a composite score with it. For example, I have total population by neighborhood, mean income and children as a percentage of the population.…
Manuel Q
  • 153
  • 8
2
votes
0 answers

standardization to obtain unit variance

I've come across some papers where certain forecast errors are standardized to have unit variance. Unfortunately that's the only information they provide and I have no idea how to obtain/calculate their results. Assuming I have a vector of 3…
Gritti
  • 161
1
vote
1 answer

Confused about standardization

I read a lot on the web, but I am still not sure whether I understood completely when we standardize the data (so that it is zero mean unit variance). So, let's say that I have a set of genes and expression levels of those genes among different…
user5054
  • 1,549