I am trying to understand the kernel trick as a prerequisite for methods like kernel ridge regression (KRR) and Gaussian process regression (GPR), and I think I am getting confused over some very basic questions. I have read in various resources (like https://doi.org/10.1016/B978-0-323-90049-2.00009-3) that one motivation behind using kernel methods is to enable a linear model to fit a non-linear relation between the input and output variables. To achieve this, one can either explicitly transform the input vector into a higher-dimensional feature vector (e.g., a second-order polynomial mapping to fit a quadratic relation) or use the kernel trick to avoid this explicit transformation.
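To check my understanding of that last point, here is a small sketch I wrote myself (not from the reference above): it compares an explicit second-order polynomial feature map for a scalar input with the corresponding polynomial kernel. If I understand correctly, the kernel should reproduce the inner product of the mapped features without ever forming them:

```python
import numpy as np

# Explicit degree-2 feature map for scalar x: phi(x) = [1, sqrt(2)*x, x**2].
# Its inner product phi(x).phi(z) = 1 + 2*x*z + (x*z)**2 = (1 + x*z)**2,
# which is exactly the degree-2 polynomial kernel.
def phi(x):
    return np.array([1.0, np.sqrt(2) * x, x ** 2])

def poly_kernel(x, z):
    return (1.0 + x * z) ** 2

x, z = 0.7, -1.3
print(phi(x) @ phi(z))    # explicit transformation, then inner product
print(poly_kernel(x, z))  # kernel trick: same value, no explicit map
```

Both lines print the same number, which is what I take "avoiding the explicit transformation" to mean.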
My first problem is: Is polynomial regression equivalent to multiple linear regression with the input data transformed by a polynomial mapping? I ask because I keep getting caught up on the phrase "fitting a non-linear function with a linear model."
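Here is what I mean in code (again my own sketch, using scikit-learn): fitting a degree-2 polynomial directly and running an ordinary linear regression on polynomially expanded inputs appear to give the same coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.1, size=50)

# "Polynomial regression": least-squares fit of a degree-2 polynomial.
coeffs_polyfit = np.polyfit(x, y, deg=2)  # returns highest degree first

# Multiple linear regression on the explicitly mapped inputs [x, x**2].
X_mapped = PolynomialFeatures(degree=2, include_bias=False).fit_transform(
    x.reshape(-1, 1)
)
lin = LinearRegression().fit(X_mapped, y)

print(coeffs_polyfit)                   # approx. [0.5, -2.0, 1.0]
print(lin.coef_[::-1], lin.intercept_)  # same coefficients and intercept
```

If these really are the same fit, then "polynomial regression" is just linear regression in a transformed input space, which is the source of my confusion below.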
Moreover, if these two approaches are equivalent, this should mean that polynomial regression is a linear model. What, then, would be examples of non-linear models?
My second problem is: If there is inherent non-linearity in the observed data, why use a linear model at all? I have the feeling that I don't quite understand what is meant by "linear"/"non-linear" and "statistical model"/"function" in this particular context.