As @harold says, storing to memory is already covered by MMX movd, or pshufw+movd to extract just the high float.
The one thing you can't do is turn a 3dNow! float into an x87 80-bit float without a store/reload.
What might have been useful is a version of EMMS that expands a 32-bit float into an 80-bit x87 long double in st0, along with setting the FPU back into x87 mode instead of MMX mode¹. Or maybe even do that for multiple mm registers into multiple x87 registers?
i.e. it would be a shortcut for movd dword [esp], mm0 / emms / fld dword [esp] to set up for further scalar FP after a SIMD reduction.
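For concreteness, a minimal NASM-syntax sketch of that round trip (assuming a 3dNow! horizontal add with pfacc just finished a reduction, and that it's fine to reserve 4 bytes of stack):

    pfacc   mm0, mm0            ; both halves of mm0 = low float + high float
    sub     esp, 4              ; scratch space for the store/reload
    movd    dword [esp], mm0    ; store the 32-bit float result
    femms                       ; leave MMX/3dNow! mode (3dNow!'s fast emms)
    fld     dword [esp]         ; reload it as an 80-bit value in st0
    add     esp, 4
    ; ... further scalar x87 FP on st0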
Remember that these are IEEE754 floats; you normally don't want them in integer registers unless you're picking apart their bit-fields (e.g. for an exp or log implementation), but you can do that with MMX shift/mask instructions.
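As a hedged sketch of that kind of bit-field work (exp_mask is just an illustrative name for a constant, not anything standard), extracting the biased exponent of each packed float never has to leave the MMX registers:

    section .data
    align 8
    exp_mask:   dd 0xFF, 0xFF   ; per-lane mask for the 8 exponent bits

    section .text
    movq    mm1, mm0            ; copy the two packed floats
    psrld   mm1, 23             ; shift each 32-bit lane's exponent down to bit 0
    pand    mm1, [exp_mask]     ; mm1 = biased exponent of each lane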
But movd and fld are cheap, so they didn't bother making a special instruction just to save the reload latency. Also, it might have been slow to implement as a single instruction. Even though x86 is not a RISC ISA, having one really complex instruction is often slower than multiple simpler instructions (especially before decoding to multiple uops was fully a thing). e.g. Intel's sysenter and AMD's syscall instructions, which replace int 0x80 for system calls, require additional instructions before/after to save more state, but are still faster overall.
3dNow!'s femms leaves the MMX/3dNow! register contents undefined, only marking the x87 tag word as empty, without preserving the MMX register contents as x87 register contents the way emms does. See http://refspecs.linuxbase.org/AMD-3Dnow.pdf for an official AMD manual. IDK if AMD's microarchitectures just dropped the register-renaming info or what, but probably making store / femms / x87-load the fast way saves a lot of transistors.
Or perhaps even FEMMS is still somewhat slow, so they didn't want to encourage coders to leave and re-enter MMX/3dNow! mode very often.
Fun fact: 3dNow! PREFETCHW (prefetch with write intent) is still used, and has its own CPUID feature bit.
See my answer on What is the effect of second argument in _builtin_prefetch()?
Intel CPUs soon added support for decoding it as a NOP (so software like 64-bit Windows can use it without checking), but Broadwell and later actually prefetch with an RFO to get the cache line in MESI Exclusive state, rather than Shared, so it can flip to Modified without additional off-core traffic.
The CPUID feature bit indicates that it really will prefetch.
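If you're writing asm by hand rather than using the GCC builtin, a rough sketch of the feature check (I believe the flag is CPUID leaf 0x80000001, ECX bit 8, the 3DNowPrefetch / PRFCHW bit; production code should also verify that the extended leaf exists first):

    mov     eax, 0x80000001
    cpuid
    test    ecx, 1 << 8         ; PRFCHW / 3DNowPrefetch feature bit
    jz      no_prefetchw

    prefetchw [edi]             ; we intend to write this line, so ask for it
                                ; in Exclusive rather than Shared state
    no_prefetchw: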
Footnote 1:
Remember that the MMX registers alias the x87 registers, so no new OS support was needed to save/restore architectural state on context switches. It wasn't until SSE that we got new architectural state. So it wasn't until CPUs with both SSE2 and 3dNow! that converting a 3dNow! float to an SSE2 double could make sense without switching back to x87 mode. And you could do that with movq2dq xmm0, mm0 + cvtps2pd xmm0, xmm0.
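A quick sketch of that path (still MMX mode the whole time; you'd only need emms / femms before later x87 code, not for the SSE2 part):

    movq2dq  xmm0, mm0          ; copy the 64-bit MMX register into an XMM register
    cvtps2pd xmm0, xmm0         ; widen the two packed floats to two doubles
    ; xmm0 now holds both values as doubles, no store/reload needed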
They could have had a float->double conversion within an mm register, but the fld / fst hardware was only designed for float or double -> 80-bit and 80-bit -> float or double. And the use-case for that is limited; if you're using 3dNow!, just stick to float.