Which is better to normalize data?

Question

I saw in Coursera's machine learning class that it is possible to normalize data in two ways:

data = (data - mean) / (max(data) - min(data))

or you can divide by the standard deviation instead, computed with the Octave function std():

data = (data - mean) / std(data)
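
To make the two formulas concrete, here is a minimal Octave sketch that applies each one column by column. The matrix X and its values are made up for illustration, and the variable names (mu, span, sigma, X_range, X_std) are my own. It assumes Octave's automatic broadcasting, so a row vector can be subtracted from every row of a matrix.

```octave
% Hypothetical example data: one row per house, columns = [size in m^2, rooms].
X = [20 2; 150 5; 400 8; 1000 20];

mu    = mean(X);           % column means (1 x 2 row vector)
span  = max(X) - min(X);   % column ranges, max minus min (1 x 2)
sigma = std(X);            % column sample standard deviations (1 x 2)

% First approach: subtract the mean, divide by the range.
X_range = (X - mu) ./ span;

% Second approach: subtract the mean, divide by the standard deviation.
X_std = (X - mu) ./ sigma;
```

In both results every column has mean 0. With the first approach the difference between each column's maximum and minimum is exactly 1, while with the second approach each column has a standard deviation of 1.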

Which normalization is better for a matrix containing house size (20 m² to 1000 m²), number of rooms (2 to 20), and house price (10000 to 15000000)? And why is it better? I'm using linear regression to predict a house's price, with size and number of rooms as the input features. However, when I try to plot the data using the plot function in Octave, it gives an error saying that the values are too high. If I normalize my data using either approach, I can plot it. So, which is better? When should I use std(), and when should I use the other approach?

Could you explain why you want to "normalize" this matrix? What are you going to do with the matrix that you think requires some such operation? — whuber, Mar 18 '16 at 18:53
std() in your notation doesn't change the data; it just calculates a summary measure. The first operation, in the usual interpretation of the equation, just changes the scale and flushes out the units of measurement. Otherwise it does nothing fundamental to affect any regression. — Nick Cox, Mar 18 '16 at 19:37
I'm sorry, I edited my question. Now std() does something to the data. — Pasdf, Mar 19 '16 at 02:30
I agree with Nick Cox. The purpose of normalization before using linear regression is to make the ranges of the two variables (size and number of rooms) similar. The first normalization guarantees that, while the second doesn't. — hopflink, Mar 19 '16 at 03:16
