I think that this article answers your question. This kind of model aims at minimising squared deviations (the objective: give the optimisation an incentive to produce a model with better prediction accuracy), with a constraint (penalty) applied to the norm of the estimated parameters:
$$\min_{\beta\in\mathbb{R}^k} \frac{1}{n}\vert\vert Y-X\beta\vert\vert^2_2+\lambda \vert\vert\beta\vert\vert ^p_p$$
with the norm being defined as a function of $p$ as:
$$\vert\vert\beta\vert\vert_p=\left(\sum_{j=1}^{k}\vert\beta_j\vert^p\right)^\frac{1}{p}$$
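To make the formulas concrete, here is a minimal sketch of both quantities in Python/NumPy (the names `p_norm` and `penalised_objective` are just illustrative, not from any library):

```python
import numpy as np

def p_norm(beta, p):
    # (sum_j |beta_j|^p)^(1/p)
    return np.sum(np.abs(beta) ** p) ** (1.0 / p)

def penalised_objective(beta, X, Y, lam, p):
    # (1/n) * ||Y - X beta||_2^2  +  lambda * ||beta||_p^p
    n = len(Y)
    fit = np.sum((Y - X @ beta) ** 2) / n
    penalty = lam * np.sum(np.abs(beta) ** p)
    return fit + penalty
```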
Applying this kind of penalty to the norm of the parameters shrinks the estimated coefficients, which highlights the most important regressors and, for some values of $p$ (such as $p=1$), can set the coefficients of the least important regressors exactly to zero, i.e. exclude those features from the model.
Note that LASSO corresponds to $p=1$ and Ridge regression corresponds to $p=2$, i.e. they belong to the same family of models but use norms with different definitions, which explains why their results are not exactly the same even though they share the same general purpose.
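If it helps to see that difference in practice, here is a small sketch with scikit-learn (the simulated data and the `alpha` value are arbitrary choices of mine; scikit-learn's exact scaling of the objective differs slightly from the formula above, but the behaviour is the same):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# only the first two regressors actually matter in this simulated data
Y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.5).fit(X, Y)   # p = 1 penalty
ridge = Ridge(alpha=0.5).fit(X, Y)   # p = 2 penalty

print(lasso.coef_)  # some coefficients are exactly 0 (features excluded)
print(ridge.coef_)  # coefficients are shrunk but typically all non-zero
```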
Explanation of the norm with $p=2$:

Start with a vector $b=(2, 1)$ and draw it in the plane. By definition, it has 2 elements, i.e. 2 coordinates in the plane. Computing the norm with $p=2$ requires squaring each component: $2^2=4$ and $1^2=1$. Summing these transformed coordinates gives 5. Applying the inverse of the function you used to transform the components (here, the square root) gives the norm, in this case $\sqrt{5}$. You will recognise the Pythagorean theorem: the norm with $p=2$ is simply the length of the vector in traditional (Euclidean) geometry. With $p=1$, you would instead sum the absolute values of the components, giving $2+1=3$.
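The same computation in Python, just to check the numbers:

```python
import numpy as np

b = np.array([2, 1])

l2 = np.sqrt(np.sum(b ** 2))  # sqrt(4 + 1) = sqrt(5) ≈ 2.236
l1 = np.sum(np.abs(b))        # 2 + 1 = 3

# np.linalg.norm gives the same results
print(l2, np.linalg.norm(b, 2))
print(l1, np.linalg.norm(b, 1))
```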