A. Linear models
Let's say that $\mathbf{y}=(y_1,\dots,y_n)$ is a vector of observable random variables, and that each $y_i$ is accompanied by a vector $\mathbf{u}_i=(u_{i1},\dots,u_{im})$ of explanatory variables. The random vector $\mathbf{y}$ is modeled by specifying a function of $\mathbf{u}$, say $\delta(\cdot)$, and by assuming that the random deviations $y_i-\delta(\mathbf{u}_i)$ have a joint distribution with certain specified characteristics.
Now let $x_{ij}=\delta_j(\mathbf{u}_i)$, e.g. $x_{i1}=1$ ($\delta_1(\cdot)$ is a constant function), $x_{i2}=u_{i1}$, $x_{i3}=u_{i2}$, $x_{i4}=u_{i1}u_{i2}$, $x_{i5}=u_{i3}^2$, $x_{i6}=\sqrt{u_{i4}}$ etc.
In a linear model we express $\delta(\mathbf{u})$ in the form
$$\delta(\mathbf{u}_i)=\beta_1\delta_1(\mathbf{u}_i)+\beta_2\delta_2(\mathbf{u}_i)+\cdots+\beta_p\delta_p(\mathbf{u}_i)$$
so that, stacking the $n$ observations, $\big(\delta(\mathbf{u}_1),\dots,\delta(\mathbf{u}_n)\big)'=\mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}=\{x_{ij}\}$ is the $n\times p$ matrix with entries $x_{ij}=\delta_j(\mathbf{u}_i)$.
That is, in a linear model we have a linear combination of functions of the explanatory variables (possibly nonlinear functions, as in the examples above), and $\boldsymbol{\beta}$ is just the vector of coefficients: the model is linear in the parameters, not necessarily in the $u$'s.
See Harville, Linear Models and the Relevant Distributions and Matrix Algebra, CRC Press, 2018, pages 123-125.
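As a minimal sketch in Python/NumPy (the data and the four explanatory variables are purely hypothetical, chosen to match the example above), the design matrix $\mathbf{X}$ is built one column per function $\delta_j(\cdot)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # number of observations
U = rng.uniform(0.1, 2.0, size=(n, 4))    # hypothetical explanatory variables u1..u4

# Each column of X is a (possibly nonlinear) function delta_j of the u's:
X = np.column_stack([
    np.ones(n),          # delta_1: constant, x_i1 = 1
    U[:, 0],             # delta_2: x_i2 = u_i1
    U[:, 1],             # delta_3: x_i3 = u_i2
    U[:, 0] * U[:, 1],   # delta_4: x_i4 = u_i1 * u_i2
    U[:, 2] ** 2,        # delta_5: x_i5 = u_i3^2
    np.sqrt(U[:, 3]),    # delta_6: x_i6 = sqrt(u_i4)
])
print(X.shape)           # (100, 6): n rows, p = 6 columns
```

The model remains linear because $\delta(\mathbf{u}_i)$ is a linear combination of these columns, however nonlinear each $\delta_j$ is.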
B. Least squares
Given the linear model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$, you can estimate $\boldsymbol{\beta}$ by ordinary least squares:
$$\hat{\boldsymbol{\beta}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\tag{1}$$
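Continuing the sketch above (the true coefficients and the noise level are hypothetical, used only to simulate $\mathbf{y}$), (1) can be computed by solving the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}}=\mathbf{X}'\mathbf{y}$ instead of inverting $\mathbf{X}'\mathbf{X}$ explicitly:

```python
beta_true = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.7])  # hypothetical coefficients
sigma = 0.5
y = X @ beta_true + rng.normal(0.0, sigma, size=n)       # y = X beta + eps

# beta_hat = (X'X)^{-1} X'y, obtained by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# numerically safer equivalent: np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)
```

Note that $\hat{\boldsymbol{\beta}}$ is a fixed matrix, $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, applied to $\mathbf{y}$, i.e. a linear function of $\mathbf{y}$.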
You may say that linearity in the parameters, i.e. a linear model, implies that $\hat{\boldsymbol{\beta}}$ can be expressed as a linear function of $\mathbf{y}$, but not that every linear unbiased function of $\mathbf{y}$ is BLUE (see below).
You may say that nonlinearity in the parameters implies that the estimates cannot be expressed as such a linear function of the dependent variables, because the closed-form ordinary least squares solution (1) requires a linear model.
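For contrast, a sketch of a model that is nonlinear in the parameters, say $y_i=\beta_1 e^{\beta_2 u_{i1}}+\varepsilon_i$ (a hypothetical example, reusing `U`, `n` and `rng` from the first sketch): no choice of columns $x_{ij}=\delta_j(\mathbf{u}_i)$ turns it into $\mathbf{X}\boldsymbol{\beta}$, so the least-squares estimates are not a linear function of $\mathbf{y}$ and have to be found iteratively, e.g. with SciPy's `curve_fit`:

```python
from scipy.optimize import curve_fit

def nonlinear_model(u1, b1, b2):
    # nonlinear in b1, b2: not expressible as a linear combination of functions of u1
    return b1 * np.exp(b2 * u1)

u1 = U[:, 0]
y_nl = nonlinear_model(u1, 1.5, 0.8) + rng.normal(0.0, 0.2, size=n)  # simulated data

# iterative least squares; there is no fixed matrix C such that the estimates equal C @ y_nl
(b1_hat, b2_hat), _ = curve_fit(nonlinear_model, u1, y_nl, p0=[1.0, 1.0])
print(b1_hat, b2_hat)
```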
C. Gauss-Markov theorem
The Gauss-Markov theorem states that, given a linear model, letting $\boldsymbol{\varepsilon}=\mathbf{y}-\mathbf{X}\boldsymbol{\beta}$ denote the vector of random deviations $y_i-\delta(\mathbf{u}_i)$, if $E(\boldsymbol{\varepsilon})=\mathbf{0}$ and $\text{var}(\boldsymbol{\varepsilon})=\sigma^2\mathbf{I}$, then the OLS estimator (1) is the best linear unbiased estimator (BLUE).
To keep it simple, we could say that a linear estimator has the form $\mathbf{Cy}$, where the matrix $\mathbf{C}$ is a function of $\mathbf{X}$ (not of $\mathbf{y}$). Hence a linear estimator is a linear function of the random vector $\mathbf{y}$. Note that, since a (linear) estimator is a (linear) function of a random vector, it is itself a random vector.
The theorem states that (1) is the best linear unbiased estimator, i.e. that its variance is no larger than that of any other linear unbiased estimator of $\boldsymbol{\beta}$ (it is estimators, not parameters, that can be BLUE). For example, if $\mathbf{C}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ then $\hat{\boldsymbol{\beta}}=\mathbf{Cy}$ is BLUE, while if $\tilde{\mathbf{C}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'+\mathbf{D}$ with $\mathbf{D}\ne\mathbf{0}$ then $\tilde{\boldsymbol{\beta}}=\tilde{\mathbf{C}}\mathbf{y}$ is not BLUE even if it is unbiased.${}^1$
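A sketch of this comparison (reusing `X`, `sigma`, `n` and `rng` from the sketches above; $\mathbf{D}$ is a hypothetical perturbation built so that $\mathbf{DX}=\mathbf{0}$, which keeps $\tilde{\boldsymbol{\beta}}$ unbiased, see footnote 1): the covariance matrix of $\tilde{\boldsymbol{\beta}}$ exceeds that of $\hat{\boldsymbol{\beta}}$ by the positive semidefinite matrix $\sigma^2\mathbf{DD}'$, because the cross terms vanish when $\mathbf{DX}=\mathbf{0}$.

```python
p = X.shape[1]

# Residual-maker matrix M = I - X (X'X)^{-1} X', which satisfies M X = 0
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical perturbation D with D X = 0, so C~ X = I and beta_tilde stays unbiased
D = rng.normal(size=(p, n)) @ M
print(np.allclose(D @ X, 0))                     # True

C = np.linalg.solve(X.T @ X, X.T)                # C = (X'X)^{-1} X'
var_beta_hat   = sigma**2 * C @ C.T              # = sigma^2 (X'X)^{-1}
var_beta_tilde = sigma**2 * (C + D) @ (C + D).T  # = sigma^2 [(X'X)^{-1} + D D']

# The excess variance sigma^2 D D' is positive semidefinite
print(np.all(np.linalg.eigvalsh(var_beta_tilde - var_beta_hat) >= -1e-8))  # True
```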
When you think of linearity in parameters, the random vector $\mathbf{y}$ is modeled as a linear combination of the $x$'s, i.e. of functions of the explanatory variables, and the parameters are just the (unknown, to be estimated) coefficients of that linear combination.
Linearity in parameters and linear estimators are related but very different matters.
${}^1$. $\hat{\boldsymbol{\beta}}$ is unbiased because $E[\hat{\boldsymbol{\beta}}]=E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}]=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{y}]=(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\boldsymbol{\beta}=\boldsymbol{\beta}$, i.e. because $\mathbf{CX}=\mathbf{I}$. $\tilde{\boldsymbol{\beta}}$ is unbiased if $\tilde{\mathbf{C}}\mathbf{X}=\mathbf{I}$, i.e. if $\mathbf{DX}=\mathbf{0}$.