The _mm_load_ps() SSE intrinsic is defined as an aligned load: it raises an exception if the address is not 16-byte aligned. However, Visual Studio appears to generate an unaligned load instead.
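For reference, the contract in question can be sketched in C: `_mm_load_ps` is only valid on a 16-byte-aligned pointer, while `_mm_loadu_ps` is the intrinsic intended for arbitrary addresses. The helper names `hsum4`, `sum_aligned` and `sum_unaligned` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <xmmintrin.h>

/* __m128 is declared with 16-byte alignment by MSVC, GCC and Clang. */
static_assert(alignof(__m128) == 16, "__m128 should be 16-byte aligned");

/* Hypothetical helper: add the four float lanes of v. */
static float hsum4(__m128 v)
{
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);            /* aligned store into aligned scratch */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Precondition: p is 16-byte aligned. A misaligned p is undefined behaviour;
   it only faults if the compiler emits an alignment-checking instruction
   such as movaps. */
static float sum_aligned(const float *p)
{
    return hsum4(_mm_load_ps(p));
}

/* Valid for any float pointer; this is the load that maps to movups. */
static float sum_unaligned(const float *p)
{
    return hsum4(_mm_loadu_ps(p));
}
```

With `alignas(16) float buf[8]`, `sum_aligned(buf)` and `sum_aligned(buf + 4)` are valid, whereas `buf + 1` may only be passed to `sum_unaligned`.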
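Because not all compilers behave the same way, substituting an unaligned load hides bugs: code that passes a misaligned pointer to _mm_load_ps() runs fine when built with Visual Studio but can crash when built with a compiler that does emit movaps. It would be nice to be able to turn the actual aligned operations on, even though the performance penalty that movups used to carry no longer seems to exist.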
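In the meantime, one way to get the bug-catching behaviour back in debug builds is to route aligned loads through a wrapper that checks the address explicitly. This is a minimal sketch under that assumption; `is_aligned16` and `load_ps_checked` are hypothetical names, and the check uses the standard `assert` macro, so it disappears when `NDEBUG` is defined.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical stand-in for _mm_load_ps: asserts the alignment precondition
   that movaps would enforce in hardware, then performs the load. Because the
   check is explicit, it fires whether the compiler emits movaps or movups. */
static inline __m128 load_ps_checked(const float *p)
{
    assert(is_aligned16(p) && "_mm_load_ps requires a 16-byte-aligned address");
    return _mm_load_ps(p);
}
```

In a debug build, `load_ps_checked(buf + 1)` on a 16-byte-aligned `buf` fails the assertion before any load happens; with `NDEBUG` defined, the wrapper reduces to a plain `_mm_load_ps` call.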
In other words, this code:
__m128 p1 = _mm_load_ps(data);
currently produces:
movups xmm0,xmmword ptr [eax]
Expected result:
movaps xmm0,xmmword ptr [eax]
(Microsoft asked me to post this question here.)