There are several different forms of the Wilcoxon-Mann-Whitney statistic.
Consider this definition: $W$ = "sum of ranks in sample 2".
Under the null hypothesis, the ranks in sample 2 will be $n_2$ random draws (without replacement) from the pool of ranks, which are the numbers from 1 to $n_1+n_2$.
So $W$ is the sum of $n_2$ values, each of which has expected value equal to the mean of the pool, $(1+n_1+n_2)/2$. Hence (by linearity of expectation) $E(W) = \frac{n_2(n_1+n_2+1)}{2}$.
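The argument can be checked by brute force: under the null every set of $n_2$ ranks drawn from $1,\dots,n_1+n_2$ is equally likely, so averaging $W$ over all of them must reproduce the formula. This is a minimal Python sketch assuming no ties; the function names are hypothetical, and the sizes 4 and 7 are only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def exact_mean_rank_sum(n1: int, n2: int) -> Fraction:
    """Mean of W = sum of ranks in sample 2, averaged over every equally
    likely way of drawing n2 ranks (without replacement) from 1..n1+n2."""
    pool = range(1, n1 + n2 + 1)
    sums = [sum(draw) for draw in combinations(pool, n2)]
    return Fraction(sum(sums), len(sums))


def formula_mean_rank_sum(n1: int, n2: int) -> Fraction:
    """E(W) = n2 (n1 + n2 + 1) / 2."""
    return Fraction(n2 * (n1 + n2 + 1), 2)


# Brute-force check of the formula for all small sample sizes.
for n1 in range(1, 6):
    for n2 in range(1, 6):
        assert exact_mean_rank_sum(n1, n2) == formula_mean_rank_sum(n1, n2)

print(exact_mean_rank_sum(4, 7))  # 42
```

Exact fractions are used so the comparison with the formula is an equality test rather than a floating-point tolerance check.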
So the $W$ whose expectation you were given is "the sum of ranks in sample 2".
By the same argument, the mean of the sum of ranks in sample 1 is $\frac{n_1(n_1+n_2+1)}{2}$.
The mean is perhaps more easily remembered as $\frac{n_W(n+1)}{2}$, where $n_W$ is the sample size for whichever group you summed the ranks in, and $n$ is the sum of the two sample sizes.
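A quick sanity check on the $n_W(n+1)/2$ form: the two rank sums always add up to $1+2+\dots+n = n(n+1)/2$, so the two group means must too. A minimal Python sketch; `rank_sum_mean` is a hypothetical name and the sizes 4 and 7 are illustrative.

```python
from fractions import Fraction


def rank_sum_mean(n_w: int, n: int) -> Fraction:
    """Null mean n_W (n + 1) / 2 of the rank sum for a group of size n_w,
    where n is the combined sample size."""
    return Fraction(n_w * (n + 1), 2)


n1, n2 = 4, 7  # illustrative sizes only
n = n1 + n2
mean1 = rank_sum_mean(n1, n)
mean2 = rank_sum_mean(n2, n)

# Both rank sums together use every rank 1..n exactly once.
assert mean1 + mean2 == Fraction(n * (n + 1), 2)
print(mean1, mean2)  # 24 42
```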
The formula for $\sigma$, $\sqrt{n_1 n_2 (n_1+n_2+1)/12}$ when there are no ties, is symmetric in $n_1$ and $n_2$, so it doesn't matter how the labelling goes for it.
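The symmetry can also be checked by enumeration: since the two rank sums add to a constant, their null variances are equal, and both match $n_1 n_2 (n_1+n_2+1)/12$. A minimal Python sketch assuming no ties; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_var_rank_sum(n_w: int, n_other: int) -> Fraction:
    """Null variance of the rank sum of a group of size n_w, found by
    enumerating every equally likely set of ranks (no ties)."""
    pool = range(1, n_w + n_other + 1)
    sums = [sum(draw) for draw in combinations(pool, n_w)]
    mean = Fraction(sum(sums), len(sums))
    return sum((s - mean) ** 2 for s in sums) / len(sums)


def formula_var_rank_sum(n1: int, n2: int) -> Fraction:
    """sigma^2 = n1 n2 (n1 + n2 + 1) / 12, symmetric in n1 and n2."""
    return Fraction(n1 * n2 * (n1 + n2 + 1), 12)


for n1 in range(1, 6):
    for n2 in range(1, 6):
        # Same variance whichever group's ranks are summed.
        assert exact_var_rank_sum(n1, n2) == exact_var_rank_sum(n2, n1)
        assert exact_var_rank_sum(n1, n2) == formula_var_rank_sum(n1, n2)

print(exact_var_rank_sum(4, 7))  # 28
```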
However, with sample sizes as small as 4 and 7 I wouldn't use the asymptotic (normal) approximation, because its accuracy isn't very good in the far tails.

Good tables easily go up that far, for example*, and decent packages will compute the exact distribution of the statistic.
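Computing the exact null distribution amounts to counting: each of the $\binom{n}{n_W}$ possible sets of ranks for the group is equally likely ($\binom{11}{4} = 330$ here). A minimal Python sketch, assuming no ties and samples small enough to enumerate; the function names are hypothetical. Among packages, recent SciPy versions offer `scipy.stats.mannwhitneyu(x, y, method="exact")`, which works with the equivalent $U$ statistic, $U = W - n_W(n_W+1)/2$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_null_counts(n_w: int, n_other: int) -> Counter:
    """How many of the C(n, n_w) equally likely rank sets give each rank sum."""
    pool = range(1, n_w + n_other + 1)
    return Counter(sum(draw) for draw in combinations(pool, n_w))


def exact_lower_p(w: int, n_w: int, n_other: int) -> Fraction:
    """Exact one-tailed p-value P(W <= w) under the null (no ties)."""
    counts = exact_null_counts(n_w, n_other)
    total = sum(counts.values())
    return Fraction(sum(c for s, c in counts.items() if s <= w), total)


# Lower tail of the rank sum for a group of 4 when the other group has 7.
# The smallest possible sum is 1 + 2 + 3 + 4 = 10, with probability 1/330.
for w in range(10, 17):
    print(w, exact_lower_p(w, 4, 7))
```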
For example, if you were doing a one-tailed test (with the alternative being that sample 1 was typically smaller) at the 5% level and you got a sum of ranks in sample 1 of 15, the normal approximation (without continuity correction) gives a $p$-value of about 0.044 and would say 'reject', while the exact $p$-value is $18/330 \approx 0.055$, so the correct action would be to reject only when the sum of ranks was 14 or less.
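To see where the exact and normal-approximation rejection regions differ, the sketch below tabulates both one-tailed $p$-values for the smallest possible rank sums in sample 1. It is a minimal Python sketch assuming sample 1 has 4 observations and sample 2 has 7, no ties, and the lower-tail alternative; the names and the `alpha` value are illustrative.

```python
import math
from fractions import Fraction
from itertools import combinations

N1, N2 = 4, 7  # assumed: sample 1 has 4 observations, sample 2 has 7


def exact_lower_p(w: int, n1: int = N1, n2: int = N2) -> Fraction:
    """Exact P(W1 <= w), W1 = sum of ranks in sample 1 (no ties)."""
    sums = [sum(d) for d in combinations(range(1, n1 + n2 + 1), n1)]
    return Fraction(sum(1 for s in sums if s <= w), len(sums))


def normal_lower_p(w: int, n1: int = N1, n2: int = N2) -> float:
    """Normal approximation to P(W1 <= w), without continuity correction."""
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal cdf


alpha = 0.05  # one-tailed level; change it to compare other rejection regions


def decision(p: float) -> str:
    return "reject" if p <= alpha else "don't reject"


for w in range(10, 17):
    exact, approx = float(exact_lower_p(w)), normal_lower_p(w)
    print(f"W1={w}: exact p={exact:.4f} ({decision(exact)}), "
          f"normal p={approx:.4f} ({decision(approx)})")
```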
* If you don't have any tables, the probabilities for the last few values in the tail can readily be computed by hand, and beyond those the normal approximation should be adequate; for these sample sizes, the continuity correction dramatically improves the normal approximation to the cdf at $W=14$ and higher (at $W=13$ the two are about equally good, and for 12 and below the correction makes it worse).
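The comparison in the footnote can be reproduced by putting the exact lower-tail cdf next to the normal approximation with and without the continuity correction (evaluating the normal cdf at $W + \tfrac12$ instead of $W$). A minimal Python sketch, assuming the rank sum is taken over the group of 4, the other group has 7, and there are no ties; it prints which approximation is closer at each $W$.

```python
import math
from fractions import Fraction
from itertools import combinations

N1, N2 = 4, 7  # assumed: W is the rank sum of the group of 4
MEAN = N1 * (N1 + N2 + 1) / 2                 # 24
SD = math.sqrt(N1 * N2 * (N1 + N2 + 1) / 12)  # sqrt(28)


def phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def exact_cdf(w: int) -> float:
    """Exact P(W <= w) by enumerating all equally likely rank sets."""
    sums = [sum(d) for d in combinations(range(1, N1 + N2 + 1), N1)]
    return float(Fraction(sum(1 for s in sums if s <= w), len(sums)))


def normal_cdf(w: int, continuity: bool) -> float:
    # W only takes integer values, so the corrected version evaluates
    # the normal cdf half a unit above w.
    return phi((w + (0.5 if continuity else 0.0) - MEAN) / SD)


for w in range(10, 17):
    ex = exact_cdf(w)
    plain, cc = normal_cdf(w, False), normal_cdf(w, True)
    better = "corrected" if abs(cc - ex) < abs(plain - ex) else "uncorrected"
    print(f"W={w}: exact={ex:.4f} plain={plain:.4f} cc={cc:.4f} closer: {better}")
```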