First, I don't think the Arthur-Merlin protocol has to enter the model -- it sounds from the motivation like you just want to produce problem instances quickly so that any algorithm for solving them is slow. In other words, if we could prove that Arthur can produce a hard problem, then there seems to be no need for Merlin to verify that the problem produced is hard.
Very related is this question: Can we generate in polynomial time a language that is not decidable in polynomial time? The answer turns out to be yes iff unary P is not equal to unary NP.
That was for deterministically generating an instance. For randomly generating one, if one-way functions $f$ exist, then we can just pick a uniformly random $x$ and present the problem "here is $f$ and $f(x)$, find $x$".
Hmm, actually, if we are not constrained, we can generate very hard problems: For a problem of size $n$, we ask whether the $n$th Turing Machine halts on a blank input. Or if we have randomness, we pick a number $i$ in $\{1,\dots,n\}$ uniformly at random and ask whether the $i$th Turing Machine halts. So perhaps it makes sense to limit the kinds of problems we want to generate to only be so hard, e.g. in NP.