Why does SSE set (_mm_set_ps) reverse the order of arguments?

Question

I recently noticed that

__m128 m = _mm_set_ps(0,1,2,3);

puts the four floats in reverse order when viewed as a float array:

float* p = (float*)(&m);
// p[0] == 3
// p[1] == 2
// p[2] == 1
// p[3] == 0

The same happens with a union { __m128 m; float a[4]; }.

Why do SSE operations use this ordering? It's not a big deal but slightly confusing.

And a follow-up question:

When accessing elements in the array by index, should one access them in the order 0..3 or 3..0?

Related: [Convention for displaying vector registers](https://stackoverflow.com/q/41351087). The only thing that's "reversed" is the arg order of `_mm_set` intrinsics; everything else is normal little-endian; don't access your elements backwards unless that's easier for some other reason. Also, aliasing a `float*` onto a `__m128` is not well-defined behaviour across compilers (strict-aliasing violation); see [print a \_\_m128i variable](https://stackoverflow.com/a/46752535) — Peter Cordes, Mar 29 '22 at 07:47

score 8 · Answer 1 · edited Mar 29 '22 at 07:33


Depending on what you'd like to do, you can use either _mm_set_ps or _mm_setr_ps.

__m128 _mm_setr_ps(float z, float y, float x, float w)
Sets the four SP FP values to the four inputs in reverse order.

edited Mar 29 '22 at 07:33

phuclv


answered May 02 '11 at 19:42

echo


score 7 · Answer 2 · answered Mar 08 '11 at 20:37


Isn't that consistent with the little-endian nature of x86 hardware, i.e. the same way it stores the bytes of a long long?

answered Mar 08 '11 at 20:37

Bo Persson


Yes, it's just normal little-endian ordering. For SSE (and SIMD programming generally) the actual order of the elements doesn't matter much, *except* when you start changing the width of elements (packing/unpacking, etc.) or doing anything that accesses specific elements (permutations, insert/extract, etc.). – Paul R Mar 08 '11 at 21:18

score 7 · Accepted Answer · answered Mar 08 '11 at 23:20

It's just a convention; they had to pick some order, and it really doesn't matter what the order is as long as everyone follows it. Intel happens to like little-endianness.

As far as accessing by index goes... the best thing is to try to avoid doing it. Nothing kills vector performance like element-wise accesses. If you must, try to set things up so that the indexing matches the hardware vector lanes; that's what most vector programmers (in my experience) will expect.

Intel doesn’t follow their own order. You can test this with, e.g., the _mm_extract_ps() intrinsic, which accepts an integer index. — Soonts, May 21 '18 at 23:22

