I'm following along with a "learn R" course, and we ran the following multiple linear regression:
regressor = lm(formula = Profit ~ ., data = training_set)
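To be clear about what the `.` does here, a minimal sketch with made-up data (not the course's dataset): `Profit ~ .` just tells `lm` to regress `Profit` on every other column in the data frame.

```r
# Toy data frame, purely illustrative: two predictor columns plus Profit.
training_set <- data.frame(
  Profit = c(10, 20, 31, 39, 52),
  Spend1 = c(1, 2, 3, 4, 5),
  Spend2 = c(2, 1, 4, 3, 6)
)

# "Profit ~ ." expands to "Profit ~ Spend1 + Spend2" before fitting,
# so these two calls build the identical model:
r1 <- lm(formula = Profit ~ ., data = training_set)
r2 <- lm(formula = Profit ~ Spend1 + Spend2, data = training_set)
all.equal(coef(r1), coef(r2))  # TRUE
```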
Then a bit later, we added some columns to a different data set:
dataset$Level2 = dataset$Level^2
dataset$Level3 = dataset$Level^3
dataset$Level4 = dataset$Level^4
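A sketch of what `lm` sees after those assignments, again with made-up numbers: `Level2` through `Level4` are just ordinary numeric columns by the time the formula is evaluated, so `Profit ~ .` expands to `Profit ~ Level + Level2 + Level3 + Level4`.

```r
# Illustrative data only; the course's dataset will differ.
dataset <- data.frame(Level = 1:6,
                      Profit = c(2, 5, 11, 22, 40, 70))
dataset$Level2 <- dataset$Level^2
dataset$Level3 <- dataset$Level^3
dataset$Level4 <- dataset$Level^4

# lm fits one coefficient per column; it has no idea these columns
# happen to be powers of each other.
regressor <- lm(formula = Profit ~ ., data = dataset)
names(coef(regressor))
# "(Intercept)" "Level" "Level2" "Level3" "Level4"
```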
And ran the identical line:

regressor = lm(formula = Profit ~ ., data = training_set)
And somehow now lm knows not to fit a straight multiple-regression line?
The original data set is modified before it's even passed into lm, so unless lm is somehow interpreting my data to figure out which model to use, I don't see how it knows to do this...?
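If it helps, a hedged sketch of the answer implied by the question: lm always solves the same linear least-squares problem, min ||y - Xb||^2, where "linear" means linear in the coefficients b, not in the original variable. Only the design matrix X changes. The toy x and y below are made up for illustration.

```r
x <- 1:6
y <- c(2, 5, 11, 22, 40, 70)

straight <- lm(y ~ x)           # X has columns 1, x        -> straight line in x
curved   <- lm(y ~ x + I(x^2))  # X has columns 1, x, x^2   -> parabola in x

# Same fitting machinery both times; the "curve" is still a flat
# least-squares plane, just in (x, x^2) space:
head(model.matrix(curved))
```

Plotting `predict(curved)` against `x` bends only because the `x^2` column bends; lm never chose a different method.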
Comments:

…lm needs to make for this. – Chris Haug Oct 16 '20 at 16:29

…(the x term, not the x^2). But in a simple or multiple linear regression it's a best-fitting "straight line", solved either by gradient descent or analytically, right? So how does it know, in this case, to use some other method to produce a model whose function generates a curved line instead of a straight one? I can only imagine it is using some line-fitting strategy that isn't used in a simple or multiple linear regression? – Tallboy Oct 16 '20 at 16:31

…the lm formula "solving" for the slope and y-intercept of the regression line using some method? For certain dataset inputs the line is straight, as if it were solving y = mx + b, and other times the output line is not straight (when the data includes x^2 + x^3, etc.); I'm wondering what mechanism makes this happen? Why are some model lines straight (even with multiple independent variables, as in a "multiple linear regression"), while a curved line is output for other datasets? – Tallboy Oct 16 '20 at 16:47