SSE: convert __m128 to float

Question

I have the following piece of C code:

__m128 pSrc1 = _mm_set1_ps(4.0f);
__m128 pDest;
__m128 m1, m2, m3;
int i;
for (i = 0; i < 100; i++) {
    m1 = _mm_mul_ps(pSrc1, pSrc1);
    m2 = _mm_mul_ps(pSrc1, pSrc1);
    m3 = _mm_add_ps(m1, m2);
    pDest = _mm_add_ps(m3, m3);
}

float *arrq = (float*) pDest;

Everything up to the end of the for loop works. What I am trying to do now is convert the __m128 value back to float. Since it stores 4 floats, I thought I could simply cast it to float*. What am I doing wrong? (This is just test code, so don't worry about what it computes.) I have tried basically every conversion I could think of. Thanks for your help.

Anteru · Accepted Answer · 2013-01-17T05:38:39.740

11

You'll need to use _mm_store_ps to copy it back into a float array. Code:

// _mm_store_ps requires result to be 16-byte aligned
float result[4];
_mm_store_ps(result, pDest);

// If result is not guaranteed to be 16-byte aligned, use _mm_storeu_ps instead.
// On modern CPUs it is just as fast as _mm_store_ps when result happens
// to be 16-byte aligned, and it also works in all other cases.
_mm_storeu_ps(result, pDest);
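
For reference, here is a minimal sketch of both stores as compilable C. The helper names `compute_dest`, `store_unaligned`, and `store_aligned` are made up for illustration and are not part of any library; `compute_dest` simply runs the loop body from the question once.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_store_ps, _mm_storeu_ps */

/* Hypothetical helper: the loop body from the question, run once. */
static __m128 compute_dest(void)
{
    __m128 pSrc1 = _mm_set1_ps(4.0f);
    __m128 m1 = _mm_mul_ps(pSrc1, pSrc1);  /* 16 in every lane */
    __m128 m2 = _mm_mul_ps(pSrc1, pSrc1);  /* 16 in every lane */
    __m128 m3 = _mm_add_ps(m1, m2);        /* 32 in every lane */
    return _mm_add_ps(m3, m3);             /* 64 in every lane */
}

/* Hypothetical helper: works for any destination holding 4 floats. */
static void store_unaligned(float out[4], __m128 v)
{
    _mm_storeu_ps(out, v);
}

/* Hypothetical helper: the destination must be 16-byte aligned. */
static void store_aligned(float out[4], __m128 v)
{
    assert(((uintptr_t)out % 16) == 0);
    _mm_store_ps(out, v);
}
```

A caller that wants the aligned variant could declare the buffer as `_Alignas(16) float result[4];` (this assumes a C11 compiler; older GCC versions would use `__attribute__((aligned(16)))` instead) and then call `store_aligned(result, compute_dest());`.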

edited Jan 17 '13 at 05:38

answered Jan 16 '13 at 20:51


Thanks a lot. That was quite easy. I am new to the field, so sorry for the stupid question – Jan 16 '13 at 20:53

[Watch out with stack variables though, `result` should be 16-byte aligned.](http://stackoverflow.com/questions/841433/gcc-attribute-alignedx-explanation) – user7116 Jan 16 '13 at 20:57

Aaron D. Marasco · Answer 2 · 2013-10-02T11:28:40.460

3

I believe casting works if you cast properly. I don't have the code in front of me, but I'm pretty sure this worked for me:

float *arrq = reinterpret_cast<float*>(&pDest);

Note that this uses a C++ cast, which states explicitly what you are doing, and that it converts the address of pDest (&pDest), not the value itself, into a float pointer.
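
Since the question is in C, where reinterpret_cast is not available, the same idea can be sketched with a plain cast on the address. The helper names `lanes_view` and `lanes_copy` below are hypothetical. The in-place view relies on the compiler allowing a __m128 to be read through a float pointer, which I believe the common compilers permit but have not verified everywhere. The memcpy variant makes a copy and avoids that assumption.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE: __m128 */

/* Hypothetical helper: view the four lanes in place, without copying.
   The returned pointer is only valid while *v is alive. */
static float *lanes_view(__m128 *v)
{
    return (float *)v;
}

/* Hypothetical helper: copy the four lanes out with memcpy.
   sizeof *v is 16 bytes, i.e. exactly 4 floats. */
static void lanes_copy(float out[4], const __m128 *v)
{
    assert(out != NULL && v != NULL);
    memcpy(out, v, sizeof *v);
}
```

With these, `float *arrq = lanes_view(&pDest);` is the C counterpart of the line above, and `arrq[0]` refers to the lowest lane.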

edited Oct 02 '13 at 11:28

answered Oct 02 '13 at 11:10


This is indeed the way to go if you want to avoid needless copying. Also many C++ coders should learn to use C++ casting. Though it's cumbersome to write (well, not really with a good editor and completion), it improves readability. – St0fF Aug 25 '16 at 10:17
