How did authors of statistical hypothesis tests come up with their statistics?
There are numerous ways to identify test statistics, depending on circumstances. A key step is to identify the alternatives you regard as important to detect, and then to seek a statistic with good power against those alternatives under some plausible set of assumptions.
If your hypothesis relates to a population mean (let's keep it simple and consider a one-sample test), a statistic based on the sample mean is an obvious choice, since it will tend to behave differently under the null than under the alternative. However, if you're looking at shift alternatives in a Laplace / double-exponential family ($\text{DExp}(\mu,\tau)$), something based on the sample median would be a better choice for a test of a shift in mean than something based on the sample mean (the sample median is the maximum-likelihood estimator of the location parameter there).
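A small Monte Carlo sketch can make this concrete. The sample size, shift and repetition count below are illustrative assumptions; the simulation estimates one-sided 5% critical values for the mean and the median under the null, then compares their power under a location shift in a Laplace population.

```python
import numpy as np

rng = np.random.default_rng(0)
n, shift, reps = 30, 0.5, 20_000

# Sampling distributions of the two statistics under H0: mu = 0
null_means = rng.laplace(0, 1, (reps, n)).mean(axis=1)
null_meds = np.median(rng.laplace(0, 1, (reps, n)), axis=1)

# One-sided 5% critical values estimated from the null simulations
crit_mean = np.quantile(null_means, 0.95)
crit_med = np.quantile(null_meds, 0.95)

# Power under the shift alternative H1: mu = shift
alt_means = rng.laplace(shift, 1, (reps, n)).mean(axis=1)
alt_meds = np.median(rng.laplace(shift, 1, (reps, n)), axis=1)

print("power (mean-based):  ", (alt_means > crit_mean).mean())
print("power (median-based):", (alt_meds > crit_med).mean())
```

With these settings the median-based test rejects noticeably more often, consistent with the median's greater efficiency at the Laplace.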
If you have a specific parametric model (based on some particular distribution-family), it's common to at least consider a likelihood ratio test, since they have a number of attractive properties for large samples.
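As a hedged sketch of the likelihood-ratio recipe, here is the test of $H_0: \lambda = 1$ for an exponential sample (the true rate, sample size and null value are all illustrative assumptions). The statistic $-2\log\Lambda$ is referred to its large-sample $\chi^2_1$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 1.5, size=50)  # true rate 1.5, so H0 is false

lam0 = 1.0              # null value of the rate
lam_hat = 1 / x.mean()  # MLE of the rate

def loglik(lam, x):
    # Exponential log-likelihood: n*log(lambda) - lambda * sum(x)
    return len(x) * np.log(lam) - lam * x.sum()

lr = 2 * (loglik(lam_hat, x) - loglik(lam0, x))  # -2 log Lambda
p = stats.chi2.sf(lr, df=1)                      # large-sample p-value
print(f"LR statistic = {lr:.2f}, p = {p:.4f}")
```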
In many situations where you're trying to design a test from scratch, a test statistic will be based on a pivotal quantity. The test statistic in a one-sample t-test is a pivotal quantity, as are the statistics in many other tests you may have seen before.
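The defining property of a pivotal quantity is that its null distribution does not depend on the unknown parameters, which is what lets you tabulate critical values once and for all. A simulation sketch (the parameter pairs below are arbitrary assumptions) shows the one-sample t statistic has the same $t_{n-1}$ distribution whatever the true mean and standard deviation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 10, 50_000

def simulated_t(mu, sigma):
    # One-sample t statistics computed from normal samples with the true
    # mean subtracted, so H0 holds in every replication
    x = rng.normal(mu, sigma, (reps, n))
    return (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# The simulated upper 2.5% point matches t_{n-1} for very different (mu, sigma)
q_theory = stats.t.ppf(0.975, df=n - 1)
for mu, sigma in [(0, 1), (100, 25)]:
    q_sim = np.quantile(simulated_t(mu, sigma), 0.975)
    print(f"mu={mu}, sigma={sigma}: simulated {q_sim:.3f} vs theory {q_theory:.3f}")
```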
Given a specific problem, is it always obvious what the ideal (if this is definable on objective grounds at all) statistic ought to be?
Not at all. Consider a test of general normality against an omnibus alternative, for example. There are many ways to measure deviation from normality (dozens of such tests have been proposed), and at typical sample sizes, none of them is most powerful against every alternative.
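You can see the variety directly: several standard omnibus normality tests applied to the same non-normal sample can return quite different p-values, because each weighs departures from normality differently (the heavy-tailed $t_5$ sample below is an illustrative assumption).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Symmetric but heavier-tailed than the normal
x = rng.standard_t(df=5, size=100)

print("Shapiro-Wilk:   p =", stats.shapiro(x).pvalue)
print("D'Agostino K^2: p =", stats.normaltest(x).pvalue)
print("Jarque-Bera:    p =", stats.jarque_bera(x).pvalue)
```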
In trying to design a test for a situation like that, a certain amount of creativity is called for in coming up with a choice that will have good power against the kinds of alternatives you're most interested in picking up.
It seems those two requirements listed in step 2 above are too broad and many different statistics could be devised to test the same hypotheses.
Indeed. If you make some parametric assumption (assume the data are drawn from some distribution family and then make your hypothesis relate to one or more of its parameters), then there may be a best-possible test for all such situations (specifically, a uniformly most powerful test). But even then, if your parametric assumption is more of a rough guess, a desire for some robustness to that assumption may change things quite a bit.
For example (again taking a one-sample test of location shift, to keep it simple), if I am sampling from a normal population then a t-test is best. But suppose I think the population may not be exactly normal and, on top of that, there may be a small amount of contamination by some other process with a moderately heavy tail. Then something more robust (perhaps even a rank-based alternative such as the signed-rank test) may tend to perform better across a variety of such situations.
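A Monte Carlo sketch of that comparison, under an assumed contamination model (the sample size, shift, contamination rate and $t_2$ contaminating distribution are all illustrative choices, not a canonical setup):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, shift, reps, eps = 30, 0.5, 2000, 0.1

def sample(mu):
    # Mostly N(mu, 1), with probability eps replaced by a heavy-tailed
    # t_2 draw centred at mu (the contaminating process)
    contam = rng.random(n) < eps
    x = rng.normal(mu, 1, n)
    x[contam] = mu + rng.standard_t(df=2, size=contam.sum())
    return x

rej_t = rej_w = 0
for _ in range(reps):
    x = sample(shift)
    rej_t += stats.ttest_1samp(x, 0).pvalue < 0.05
    rej_w += stats.wilcoxon(x).pvalue < 0.05

print("power, t-test:     ", rej_t / reps)
print("power, signed-rank:", rej_w / reps)
```

Under heavier contamination (larger `eps`, heavier tails) the gap in favour of the signed-rank test tends to widen, while under exact normality the t-test retains a small edge.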