(via Mann-Whitney, or Kruskal-Wallis),
While the Mann-Whitney test statistic does correspond to a measure of $P(X>Y)$, once you go beyond two samples Kruskal-Wallis doesn't quite do that in general. Three pairwise Mann-Whitney tests are sensitive to (can detect) non-transitive dominance (where $X>Y$, $Y>Z$, $Z>X$; "pairwise dominance" in this sense is a non-transitive relation), but Kruskal-Wallis does not detect that cyclical relationship.
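A quick way to see this is with the classic intransitive dice (a sketch of my own, not part of the original question): each die beats the next with probability 20/36, yet all three samples have identical rank sums, so Kruskal-Wallis sees nothing at all.

# classic intransitive dice: A beats B, B beats C, C beats A, each with prob 20/36
A <- c(2,2,4,4,9,9); B <- c(1,1,6,6,8,8); C <- c(3,3,5,5,7,7)
pwin <- function(x, y) mean(outer(x, y, ">"))  # sample estimate of P(X > Y)
c(pwin(A,B), pwin(B,C), pwin(C,A))             # each 0.556: a dominance cycle
kruskal.test(list(A, B, C))                    # equal rank sums: statistic is 0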
A and B also would not be normal, but the t-test is valid per CLT (say sample size 30 or 40)
$n=40$ gives no guarantee that the test statistic has a distribution very close to a t-distribution, and certainly nothing in the central limit theorem suggests that will be the case at that specific sample size. This is not especially important, since we can test for a difference in population means without needing a t-statistic to have a t-distribution. Besides tests based on other parametric assumptions (i.e. other than normality of the parent distributions), there are also nonparametric tests of means.
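As one concrete instance (my own sketch, not something from the question): a permutation test of the difference in sample means is exact for the sharp null of identical distributions and is commonly used as a nonparametric test of a mean difference; it needs no t-distribution anywhere.

# Monte-Carlo permutation test for a difference in means; assumes only
# exchangeability of the combined sample under the null
perm_test_mean <- function(x, y, B = 10000) {
  obs <- mean(x) - mean(y)
  z <- c(x, y); n <- length(x)
  sims <- replicate(B, {
    i <- sample(length(z), n)           # random relabelling of the data
    mean(z[i]) - mean(z[-i])
  })
  (sum(abs(sims) >= abs(obs)) + 1) / (B + 1)  # two-sided p-value
}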
For the moment, however, let's agree to take as given that the situation is such that the t-statistic is approximately t-distributed with the usual degrees of freedom, even though that won't always be the case.
I was wondering if it is possible to have 2 samples (A and B), where A is stochastically dominant over B [...] but where conversely B has a significantly higher mean than A (via 2-sample Welch test).
Yes. You can have populations where $P(X>Y)>0.5$ and yet $\mu_Y>\mu_X$. With large enough samples you can have high power to detect both effects.
It's not just means and $P(X>Y)$ that this can happen with; you can do this with almost any pair of non-equivalent tests. You could, for example, have the Mann-Whitney go in the opposite direction to a difference in medians if you wanted.
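A tiny illustration of that last possibility (again my own construction, separate from the examples below):

x <- c(1, 10, 10); y <- c(9, 9, 11)
median(x) - median(y)    # +1: the x median is higher
mean(outer(x, y, ">"))   # 4/9: yet in most (x_i, y_j) pairs the y value is higher

At this size nothing is significant, of course; the replication trick described below takes care of that.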
Does such a counter-example exist,
Do you mean as data sets? Counterexamples to the idea that they should go in the same direction are easy to construct.
Finding very small samples for which the mean difference is in one direction and the Mann-Whitney "difference" is in the opposite direction isn't difficult; of course with small samples the p-values won't be small.
Attaining significance is then just a matter of accumulating the same sorts of differences enough times to make the standard errors of those differences small.
Here's one I just constructed in R:
# 1. create a pair of small samples where the Mann-Whitney and
# t-test differences go in opposite directions
x <- c(-39, -41, 9, 10, 11, 50) # mean is 0
y <- -x # here the means are equal but x slightly dominates y (20/36)
x <- x-6 # here we move x so the means differ in the opposite direction
# without shifting enough to change the ordering
# 2. create many copies of each sample (pushing p-values down)
#    ... but with a very small amount of noise added (to remove ties)
rp <- 40
x1 <- rep(x,each=rp)+rnorm(rp*length(x),0,.02)
y1 <- rep(y,each=rp)+rnorm(rp*length(y),0,.02)
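To complete the exercise you'd then run both tests on the enlarged samples (output not shown, per the footnote; both p-values should land between 0.03 and 0.04, give or take the noise):

# 3. run both tests on the enlarged samples
t.test(x1, y1, var.equal=TRUE)  # ordinary equal-variance t-test (see below)
wilcox.test(x1, y1)             # Mann-Whitney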
The large x-sample still dominates the y-sample 55% of the time (the same proportion as in the small samples), and the large x-sample still has a mean about 6 smaller than the y-sample (again as in the small samples).
Except now these samples are big enough that each of the differences is significant at the 5% level (both p-values are between 0.03 and 0.04). I used the ordinary equal-variance t-test for the exercise*, rather than the Welch t-test; not that it matters in this case, since the sample variances are effectively identical and the two tests coincide. Of course you can use the same strategy to attain even smaller p-values.
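Both of those claims are easy to check directly on the enlarged samples (exact values wobble slightly with the added noise):

mean(outer(x1, y1, ">"))  # pairwise dominance: about 20/36 = 0.556
mean(x1) - mean(y1)       # difference in means: about -6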
Example 2
X sample (sorted, n=27):
28.3, 28.4, 28.9, 29.6, 30.2, 30.4, 30.6, 31.1, 31.3, 107.8,
107.9, 108, 108.3, 108.4, 108.5, 108.8, 109.1, 109.2, 109.4,
109.5, 110, 110.2, 110.7, 110.8, 110.9, 111.1, 111.8
Y sample (sorted, n=22):
88.4, 88.9, 89.1, 90.9, 97.9, 98.5, 98.9, 99.2, 99.3,
99.5, 100.1, 100.6, 100.7, 100.9, 101.8, 102.4, 102.8, 102.9,
103.4, 104.2, 105.4, 107.7
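For anyone wanting to reproduce the tests, here are the same values as R vectors, with the pairwise dominance proportion computed directly:

x <- c(28.3, 28.4, 28.9, 29.6, 30.2, 30.4, 30.6, 31.1, 31.3, 107.8,
       107.9, 108, 108.3, 108.4, 108.5, 108.8, 109.1, 109.2, 109.4,
       109.5, 110, 110.2, 110.7, 110.8, 110.9, 111.1, 111.8)
y <- c(88.4, 88.9, 89.1, 90.9, 97.9, 98.5, 98.9, 99.2, 99.3,
       99.5, 100.1, 100.6, 100.7, 100.9, 101.8, 102.4, 102.8, 102.9,
       103.4, 104.2, 105.4, 107.7)
mean(outer(x, y, ">"))  # proportion of (x_i, y_j) pairs with x > y: exactly 2/3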
Fully 2/3 of the $(X_i,Y_j)$ pairs have the X value exceeding the Y value, but the mean of the X's is well below the mean of the Y's. Both the Welch t-test and the Mann-Whitney test are significant at the 5% level (two-tailed), but they're detecting effects in opposite directions.
> t.test(x,y)
Welch Two Sample t-test
data: x and y
t = -2.1901, df = 27.243, p-value = 0.03725
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-31.597020 -1.036313
sample estimates:
mean of x mean of y
82.93333 99.25000
> wilcox.test(x,y,conf.int=TRUE)
Wilcoxon rank sum exact test
data: x and y
W = 396, p-value = 0.04704
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.2 9.1
sample estimates:
difference in location
6.7
Here are confidence intervals for each test; the Mann-Whitney (MW, in blue) has A "higher" than B and the Welch t-test (tW, in red) has A "lower" than B:

[Figure: the two 95% confidence intervals; the MW interval for the location shift lies entirely above zero, while the tW interval for the difference in means lies entirely below it]
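A rough sketch of one way to draw such a figure (my own code, not the original):

ci_t  <- t.test(x, y)$conf.int                        # Welch interval
ci_mw <- wilcox.test(x, y, conf.int=TRUE)$conf.int    # Mann-Whitney interval
plot(0, 0, type="n", xlim=range(ci_t, ci_mw), ylim=c(0.5, 2.5),
     yaxt="n", xlab="difference (x - y)", ylab="")
axis(2, at=1:2, labels=c("tW", "MW"), las=1)
segments(ci_t[1], 1, ci_t[2], 1, col="red", lwd=3)    # t interval below zero
segments(ci_mw[1], 2, ci_mw[2], 2, col="blue", lwd=3) # MW interval above zero
abline(v=0, lty=2)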
* The specific data, statistics and p-values are not shown for the first example. It has 240 observations per sample, so I don't find it interesting in itself; the important part is understanding the construction method, with which you can make as many examples as you like. I was also worried people would focus too much on the specific appearance of this lone case and, either consciously or unconsciously, generalize features of that single example (i.e. assume that all cases would be of this form), as if its specific features characterized the situation. For example, I had some concern that they might conclude that the opposite directions of skewness in the first example were a necessary condition. The second, smaller example has quite different features, which helps with that concern.