One option is Lambert W random variables, which come in a skewed and a heavy-tailed variety, parameterized as $f(y \mid \mu_x, \sigma_x, \gamma)$ and $f(y \mid \mu_x, \sigma_x, \delta_l, \delta_r, \alpha)$, respectively. (Disclaimer: I am the author of these, so I am biased on whether they are interpretable or not; I find them much more interpretable than an asinh() function ;) .)
As you care about 3rd and 4th moments, the double heavy-tailed Lambert W x Gaussian distribution (with Tukey's h / hh as a special case) might be useful to look at. It arises as a non-linear transformation of a $N(\mu_x, \sigma_x^2)$ random variable $X$ to (setting $\alpha = 1$ for simplicity)
$$
Y = \mu_x + \sigma_x \cdot \left( U \exp\left(\frac{\delta}{2} \cdot U^{2}\right) \right), \quad U := \frac{X - \mu_x}{\sigma_x} \sim N(0, 1)
$$
It can be extended to a skewed version by allowing $\delta$ to differ between the left side ($X < \mu_x$) and the right side ($X > \mu_x$); hence $\delta \rightarrow (\delta_l, \delta_r)$. Clearly, $Y \sim N(\mu_x, \sigma_x^2)$ if $\delta = 0$.
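To make the transformation concrete, here is a minimal Python sketch of the $\alpha = 1$ case with separate left and right tail parameters. The function names `lambert_w_heavy_tail` and `sample_y` are made up for this illustration and are not taken from any package; it uses only the standard library.

```python
import math
import random


def lambert_w_heavy_tail(x, mu_x, sigma_x, delta_l, delta_r):
    """Map a Gaussian value x to its heavy-tailed counterpart y (alpha = 1).

    delta_l applies when x lies below mu_x, delta_r when it lies at or above.
    """
    u = (x - mu_x) / sigma_x                 # standardized latent value U
    delta = delta_l if u < 0 else delta_r    # side-specific tail parameter
    return mu_x + sigma_x * u * math.exp(0.5 * delta * u * u)


def sample_y(n, mu_x=0.0, sigma_x=1.0, delta_l=0.0, delta_r=0.0, seed=0):
    """Draw n values of Y by transforming simulated Gaussian draws of X."""
    rng = random.Random(seed)
    return [
        lambert_w_heavy_tail(rng.gauss(mu_x, sigma_x), mu_x, sigma_x, delta_l, delta_r)
        for _ in range(n)
    ]


# Heavier left tail than right tail (illustrative values only).
ys = sample_y(10_000, delta_l=0.3, delta_r=0.1)
```

With `delta_l = delta_r = 0` the exponential factor equals 1 and the function returns `x` unchanged, which is the Gaussian special case mentioned above. For positive values, points far from `mu_x` get pushed further out, more strongly on the side with the larger parameter.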
The interpretation is that there is a latent process $X$ that is Gaussian; however, we only observe and measure a more extreme, skewed / heavy-tailed version of it through $Y$. As an example, take the stock market: you could think of $X$ as "news" occurring in the world (Gaussian), which we can only observe and measure through the lens of collective market actions. As we know, people freak out over unlikely events (adding heavy tails), and they react more extremely to negative news than to positive news (adding skewness). This collective response is captured by the $\delta_l$ and $\delta_r$ parameters, which push events far from the mean even further away (generating heavy tails). Obviously, this should not be taken as a literal explanation of the market, but as one possible interpretation (see Table 4 & Figure 7 for an illustration on SP500 returns).
The distribution of $Y$, $f(y \mid \mu_x, \sigma_x, \delta_l, \delta_r)$, has the properties you request in 1. & 2. (set $\delta \equiv 0$) and 3. (see Eq. (23) here). Re 4.: I assume you mean that you want to exclude pathological cases that are theoretically interesting but practically useless. For that, the applications in the original papers, as well as several posts here that illustrate it with simulations and real-world examples (How to transform data to normality?, How to transform leptokurtic distribution to normality?, Transformations to approximate normality with high kurtosis data), should suffice.
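The "transform to normality" use in those posts amounts to inverting the map from $X$ to $Y$. For $\alpha = 1$ and $\delta > 0$, solving $z = u \exp(\delta u^2 / 2)$ for $u$ gives $u = \operatorname{sign}(z) \sqrt{W(\delta z^2) / \delta}$, where $W$ is the principal branch of the Lambert W function. Below is a rough sketch of that inversion, assuming the parameters are already known (in practice they have to be estimated from the data). The names `lambert_w0` and `gaussianize` are made up for this example, and the Newton iteration is just one simple way to evaluate $W$ with the standard library.

```python
import math


def lambert_w0(a, tol=1e-14, max_iter=100):
    """Principal branch of Lambert W for a >= 0: solves w * exp(w) = a."""
    if a < 0:
        raise ValueError("this sketch only handles a >= 0")
    w = math.log1p(a)  # starting value at or above the root for a >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - a) / (ew * (w + 1.0))  # Newton step
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def gaussianize(y, mu_x, sigma_x, delta_l, delta_r):
    """Recover the latent Gaussian value x from y (alpha = 1, deltas >= 0)."""
    z = (y - mu_x) / sigma_x
    delta = delta_l if z < 0 else delta_r   # z has the same sign as u
    if delta == 0.0:
        return y                            # no tail transformation on this side
    u = math.copysign(math.sqrt(lambert_w0(delta * z * z) / delta), z)
    return mu_x + sigma_x * u


# Round trip on one point: build y from x = 2 with delta_r = 0.2, then invert.
y = 2.0 * math.exp(0.5 * 0.2 * 2.0 ** 2)
x_back = gaussianize(y, mu_x=0.0, sigma_x=1.0, delta_l=0.1, delta_r=0.2)
```

Because the sign of $Y - \mu_x$ matches the sign of $X - \mu_x$, the side-specific parameter can be chosen from the observed value directly.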