One method to examine "robustness" is to conduct a simulation analysis.
Sometimes we have a statistical model and its estimators for some type of data, and we want to see whether the estimators are "robust" to violations of some underlying assumption in the model. For example, our estimators might be MLEs formed from an assumed parametric model that specifies a particular distributional family for the data. Typically, what we do is generate a range of simulated datasets with known properties (some of which use the assumed model and some of which depart from it) and then compute the estimates from each method on each simulated dataset. By generating a large number of simulations, we can examine the distribution of the estimation error under each simulation model. In particular, we can see the distribution of the estimation error when the model is correctly specified (which is often something we can derive analytically anyway) and when the model is misspecified.
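To make the procedure concrete, here is a minimal sketch in Python of the generic simulation loop, assuming the data-generating process and the estimator can each be written as a plain function (the names `simulate_errors`, `dgp` and `estimator` are illustrative, not from any particular library):

```python
import numpy as np

def simulate_errors(dgp, estimator, true_value, n, K, seed=0):
    """Generate K datasets of size n from the data-generating process `dgp`,
    apply `estimator` to each, and return the K estimation errors
    (estimate minus true value)."""
    rng = np.random.default_rng(seed)
    errors = np.empty(K)
    for k in range(K):
        data = dgp(n, rng)                       # one simulated dataset with known properties
        errors[k] = estimator(data) - true_value
    return errors

# e.g., errors of the sample mean under a correctly specified normal model
errs = simulate_errors(lambda n, rng: rng.normal(0.0, 1.0, n), np.mean,
                       true_value=0.0, n=100, K=10_000)
```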
There is no single metric that defines "robustness" of an estimation method, but there are some obvious ways to summarise it. One candidate for quantifying robustness would be to compare the typical size of the estimation error under a misspecified model to that under the correctly specified model, via the ratio of root-mean-square errors. For example, suppose your model assumes a fixed parameter value $\phi = \phi_0$, but you can embed it in a generalised model with $\phi$ left free, so that other values of $\phi$ represent misspecification. Suppose you generate $K$ simulations under the correctly specified model and under an alternative model with $\phi = \phi_1$ (using some fixed number of simulated data points $n$), and let $\epsilon_{k,n,0}$ and $\epsilon_{k,n,1}$ denote the estimation errors in the $k$th simulation from these two models, respectively. Then we could quantify robustness by:
$$R_{n} = \sqrt{\frac{\sum_{k=1}^K \epsilon_{k,n,1}^2}{\sum_{k=1}^K \epsilon_{k,n,0}^2}},$$
with a lower value indicating greater robustness and a higher value indicating greater sensitivity to the specification of $\phi$ (at the simulation sample size $n$). This ratio depends on many details of your simulation analysis: the number of simulated data points, the number of simulations, and the particular models used in the comparison. Nevertheless, if you are willing to compare cases like this, it gives a reasonable idea of how robust the estimation method is to misspecification.
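In code, this ratio is a one-liner applied to the two vectors of simulated errors (building on the hypothetical `simulate_errors` sketch above):

```python
import numpy as np

def robustness_ratio(errors_misspec, errors_correct):
    """The ratio R_n defined above: square root of the ratio of summed squared
    errors under the misspecified and correctly specified simulation models."""
    errors_misspec = np.asarray(errors_misspec)
    errors_correct = np.asarray(errors_correct)
    return np.sqrt(np.sum(errors_misspec**2) / np.sum(errors_correct**2))
```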
Now, obviously you can extend this to a comparison of multiple models and estimation methods if you wish. To do so, you would simply compute the parameter estimates under each of the different models and look at the distribution of estimation errors in each case. We would typically expect estimation methods that assume a particular parametric model form (e.g., MLEs) to perform well when the model is correctly specified, with their performance deteriorating under an alternative model that constitutes a misspecification. Nonparametric estimators will often perform less well than the MLE under the first model and better under the second. (I won't make any comment here about the performance of estimation in GMMs, save to say that I echo the comment expressing scepticism about their robustness.)
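As a hypothetical illustration of such a comparison, the sketch below pits the sample mean (the MLE for the location parameter under the normal model) against the sample median (a nonparametric location estimator) when the data are either standard normal (correctly specified) or Student-t with 3 degrees of freedom (a heavy-tailed misspecification); it reuses the `simulate_errors` and `robustness_ratio` functions sketched above:

```python
import numpy as np

dgps = {
    "normal (correct)":  lambda n, rng: rng.normal(0.0, 1.0, n),
    "t3 (misspecified)": lambda n, rng: rng.standard_t(3, n),
}
estimators = {"mean (MLE under normality)": np.mean,
              "median (nonparametric)":     np.median}

for est_name, est in estimators.items():
    # simulate estimation errors under each data-generating process
    errs = {name: simulate_errors(dgp, est, true_value=0.0, n=100, K=10_000)
            for name, dgp in dgps.items()}
    rmse = {name: np.sqrt(np.mean(e**2)) for name, e in errs.items()}
    R_n = robustness_ratio(errs["t3 (misspecified)"], errs["normal (correct)"])
    print(f"{est_name}: RMSE = {rmse}, R_n = {R_n:.3f}")
```

We would expect the mean to have the smaller RMSE under the normal model but the larger $R_n$, reflecting its greater sensitivity to the heavy-tailed misspecification.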
An example: Suppose I am interested in examining the Gaussian linear regression model estimated by OLS. This model assumes a normal distribution for the error terms, which has a fixed kurtosis (the normal distribution is mesokurtic irrespective of the parameter values). Suppose I want to know whether the estimators in the model are sensitive to misspecification of the kurtosis of the error term. To test this, I would form a new regression model using a different error distribution that allows variable kurtosis (e.g., the generalised error distribution), and I would generate a large set of simulated datasets from this model with varying values of the kurtosis parameter: some with low kurtosis, some mesokurtic, and some with high kurtosis. In each case, I would know how I generated the simulated data, so I would know the true value of the kurtosis in the simulation model and the true values of the regression parameters.
After generating the simulations, I would use OLS to estimate the regression parameters and compare the estimates to the true parameters to compute the estimation errors. I would run this a large number of times (say, a million times for each simulation model), giving me simulated estimation errors under each simulation model. I could then look at the resulting distributions of the estimation error and see whether the error gets large when the kurtosis in the model is heavily misspecified (i.e., far from a mesokurtic distribution). If the estimators perform well even when the kurtosis is misspecified, then I would say that OLS is "robust" to misspecification of the kurtosis of the error distribution; conversely, if the estimators perform badly when the kurtosis is misspecified, then I would say that OLS is "sensitive" to that misspecification. If I wanted to back this up with a single quantification of the "robustness", I would probably look at the ratio of root-mean-square estimation errors (i.e., the quantity $R_n$ above) comparing the misspecified and correctly specified models.
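Here is a sketch of that experiment in Python, assuming the generalised error distribution as implemented in `scipy.stats.gennorm` (shape $\beta = 2$ recovers the normal; $\beta < 2$ is leptokurtic, $\beta > 2$ is platykurtic). The errors are rescaled to unit variance so that only the kurtosis varies across simulation models, and the number of simulations is kept modest here for speed:

```python
import numpy as np
from scipy import stats, special

def ols_slope_errors(beta_shape, n=100, K=10_000, true_slope=2.0, seed=0):
    """Simulate K regression datasets with gennorm(beta_shape) errors,
    fit OLS each time, and return the slope estimation errors."""
    rng = np.random.default_rng(seed)
    # choose the scale so the error variance is 1 for every beta_shape,
    # isolating the effect of kurtosis
    scale = np.sqrt(special.gamma(1 / beta_shape) / special.gamma(3 / beta_shape))
    errors = np.empty(K)
    for k in range(K):
        x = rng.uniform(-1.0, 1.0, n)
        eps = stats.gennorm.rvs(beta_shape, scale=scale, size=n, random_state=rng)
        y = 1.0 + true_slope * x + eps
        X = np.column_stack([np.ones(n), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        errors[k] = coef[1] - true_slope   # estimation error for the slope
    return errors

errs_correct = ols_slope_errors(beta_shape=2.0)   # mesokurtic (normal) errors
errs_heavy   = ols_slope_errors(beta_shape=1.0)   # leptokurtic (Laplace) errors
print("R_n:", robustness_ratio(errs_heavy, errs_correct))
```

A value of $R_n$ close to one would then suggest that OLS slope estimation is fairly insensitive to the kurtosis of the error distribution at this sample size, while a much larger value would point to sensitivity.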