
What is the best way to check whether an AVX __m256 vector (8 floats) contains any inf? I tried

__m256 X=_mm256_set1_ps(1.0f/0.0f);
_mm256_cmp_ps(X,X,_CMP_EQ_OQ);

but this compares true, because inf == inf. This comparison only detects NaN (NaN compares unequal to itself). So one way is to check for X != NaN && 0*X == NaN:

__m256 Y=_mm256_mul_ps(X,_mm256_setzero_ps());   // 0*X=nan if X=inf
_mm256_andnot_ps(_mm256_cmp_ps(Y,Y,_CMP_EQ_OQ),
                 _mm256_cmp_ps(X,X,_CMP_EQ_OQ));

However, this appears somewhat lengthy. Is there a faster way?
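It may help to see what the expression above computes in each lane. This is a scalar sketch, not from the original post; the helper names are hypothetical, and it assumes IEEE-754 floats and no -ffast-math (which lets the compiler assume NaN and inf never occur):

```cpp
#include <cmath>
#include <limits>

// Per-lane model of the question's test: X is infinite iff X is not NaN but 0*X is NaN.
bool lane_is_inf_by_zero_mul(float x) {
    float y = x * 0.0f;             // 0*inf = NaN, 0*finite = +/-0, 0*NaN = NaN
    bool x_ordered = (x == x);      // per lane: _mm256_cmp_ps(X, X, _CMP_EQ_OQ)
    bool y_ordered = (y == y);      // per lane: _mm256_cmp_ps(Y, Y, _CMP_EQ_OQ)
    return !y_ordered && x_ordered; // per lane: _mm256_andnot_ps(a, b) = (~a) & b
}

// Cross-check the trick against std::isinf on a few special values.
bool trick_matches_isinf() {
    const float inf = std::numeric_limits<float>::infinity();
    const float vals[] = {inf, -inf, 0.0f, -0.0f, 1.0f, -2.5f,
                          std::numeric_limits<float>::max(),
                          std::numeric_limits<float>::denorm_min(),
                          std::numeric_limits<float>::quiet_NaN()};
    for (float v : vals)
        if (lane_is_inf_by_zero_mul(v) != std::isinf(v)) return false;
    return true;
}
```

This shows why the check needs two compares plus an andnot: the multiply turns inf into NaN, but it also leaves an existing NaN as NaN, so the X == X compare is needed to exclude inputs that were already NaN.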

Walter

3 Answers


If you want to check if a vector has any infinities:

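#include <immintrin.h>
#include <limits>

bool has_infinity(__m256 x){
    const __m256 SIGN_MASK = _mm256_set1_ps(-0.0f);   // only the sign bit set
    const __m256 INF = _mm256_set1_ps(std::numeric_limits<float>::infinity());

    x = _mm256_andnot_ps(SIGN_MASK, x);       // clear the sign bits: |x|
    x = _mm256_cmp_ps(x, INF, _CMP_EQ_OQ);    // all-ones in lanes where |x| == inf
    return _mm256_movemask_ps(x) != 0;        // true if any lane matched
}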

If you want a vector mask of the values that are infinity:

#include <immintrin.h>
#include <limits>

__m256 is_infinity(__m256 x){
    const __m256 SIGN_MASK = _mm256_set1_ps(-0.0f);   // only the sign bit set
    const __m256 INF = _mm256_set1_ps(std::numeric_limits<float>::infinity());

    x = _mm256_andnot_ps(SIGN_MASK, x);       // clear the sign bits: |x|
    x = _mm256_cmp_ps(x, INF, _CMP_EQ_OQ);    // all-ones in lanes where |x| == inf, else zero
    return x;
}
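A usage sketch for checking 8 floats loaded from memory (not part of the answer; the name any_inf8 is hypothetical). It compiles the answer's AVX sequence when __AVX__ is defined (for example with -mavx) and otherwise falls back to a scalar loop with the same per-lane logic, assuming IEEE-754 binary32 floats:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#if defined(__AVX__)
#include <immintrin.h>
#endif

static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 binary32 floats");

// Returns true if any of the 8 floats at p is +inf or -inf.
bool any_inf8(const float* p) {
#if defined(__AVX__)
    const __m256 SIGN_MASK = _mm256_set1_ps(-0.0f);
    const __m256 INF = _mm256_set1_ps(std::numeric_limits<float>::infinity());
    __m256 x = _mm256_loadu_ps(p);            // unaligned load of 8 floats
    x = _mm256_andnot_ps(SIGN_MASK, x);       // |x|
    x = _mm256_cmp_ps(x, INF, _CMP_EQ_OQ);    // all-ones where |x| == inf
    return _mm256_movemask_ps(x) != 0;        // one bit per lane (its sign bit)
#else
    // Scalar fallback: a lane is +/-inf iff exponent bits are all ones and the mantissa is zero.
    for (int i = 0; i < 8; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, p + i, sizeof bits);
        if ((bits & 0x7FFFFFFFu) == 0x7F800000u) return true;
    }
    return false;
#endif
}
```

Both paths treat NaN lanes as "not infinite", because _CMP_EQ_OQ is an ordered compare and NaN has a nonzero mantissa.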
Mysticial
  • Your `has_infinity` function seems like the old way of doing things before `ptest`. `vmovmskps` also requires `test` whereas `vptest` sets the `RFLAGS` register and does not need `test`. – Z boson Jul 03 '15 at 07:40
  • Based on Peter's comment on my answer, it seems `ptest` is not a big advantage over `movmsk` anyway. – Z boson Jul 06 '15 at 09:04
  • This [thread](https://software.intel.com/en-us/forums/intel-isa-extensions/topic/293050) from this [answer](http://stackoverflow.com/a/32637106/2542702) may interest you. – Z boson Sep 23 '15 at 10:05

I think a better solution is to use vptest rather than vmovmskps.

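#include <immintrin.h>
#include <limits>

bool has_infinity(const __m256 &x) {
    __m256 s   = _mm256_andnot_ps(_mm256_set1_ps(-0.0f), x);   // |x|
    __m256 cmp = _mm256_cmp_ps(s, _mm256_set1_ps(std::numeric_limits<float>::infinity()), _CMP_EQ_OQ);
    __m256i cmpi = _mm256_castps_si256(cmp);
    return !_mm256_testz_si256(cmpi, cmpi);   // vptest: returns 1 iff cmpi is all zero
}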

The intrinsic _mm256_castps_si256 is there only to satisfy the compiler's type checking. Intel's documentation says: "This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency."

vptest is superior to vmovmskps because it sets the zero flag directly, while vmovmskps does not. With vmovmskps the compiler has to emit a separate test instruction to set the zero flag.
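To see why !_mm256_testz_si256(cmpi, cmpi) means "at least one lane matched", here is a scalar model of vptest's zero-flag result on a 256-bit value. It is a sketch; Bits256, testz_model and lane_mask are hypothetical names, not intrinsics:

```cpp
#include <cstdint>

// A 256-bit register modeled as four 64-bit words (lane 0 in the low half of w[0]).
struct Bits256 { std::uint64_t w[4]; };

// Model of _mm256_testz_si256(a, b): 1 if (a & b) is all zero (vptest's ZF), else 0.
int testz_model(const Bits256& a, const Bits256& b) {
    std::uint64_t acc = 0;
    for (int i = 0; i < 4; ++i) acc |= a.w[i] & b.w[i];
    return acc == 0;
}

// Model of a cmpps result: 32-bit lane i is all-ones if flag[i] is set, else zero.
Bits256 lane_mask(const bool (&flag)[8]) {
    Bits256 r{};
    for (int i = 0; i < 8; ++i)
        if (flag[i]) r.w[i / 2] |= std::uint64_t{0xFFFFFFFFu} << (32 * (i % 2));
    return r;
}

// Testing a mask against itself asks "is any bit set?", i.e. "did any lane match?".
bool any_lane_set(const Bits256& cmp) { return !testz_model(cmp, cmp); }
```

Because every compare lane is either all-ones or all-zeros, "any bit set" and "any lane matched" are the same question here.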

Z boson
  • `movmsk` vs. `ptest` is not as obvious as you'd think. `test` can fuse with a `jcc`, and `ptest` is 2 uops. I think `ptest` is lower latency, though. – Peter Cordes Jul 03 '15 at 15:27
  • @PeterCordes, good points! I guess that's why Paul R found my solution using `ptest` to be only a bit faster than `movmsk` rather than a lot faster. – Z boson Jul 06 '15 at 09:02

I had an idea, but it ends up only helping if you want to test for ALL elements being infinite. Oops.

With AVX2, you can test whether all elements are infinity with PTEST. I got the idea of using XOR to compare for equality from EOF's comment on another question, and I used it in my answer there. I thought I would be able to write a shorter test-for-any-inf this way, but of course PXOR only works as a test for all 256 bits being equal.

#include <immintrin.h>
#include <limits>

// _mm256_xor_si256 needs AVX2 (_mm256_xor_ps would do the same job with plain AVX).
bool all_infinity(__m256 x){
    const __m256i SIGN_MASK = _mm256_set1_epi32(0x7FFFFFFF);  // -0.0f inverted
    const __m256 INF = _mm256_set1_ps(std::numeric_limits<float>::infinity());

    // other than the sign bits, the XOR result is all-zero only if all the bits match
    __m256i diff = _mm256_xor_si256(_mm256_castps_si256(x), _mm256_castps_si256(INF));
    return _mm256_testz_si256(diff, SIGN_MASK); // flags are ready to branch on directly
}
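The ALL-versus-ANY limitation is easier to see in a scalar model of all_infinity. This is a sketch, not the answer's code; the name all_infinity_model is hypothetical and IEEE-754 binary32 is assumed. XOR-ing each lane with the +inf pattern and OR-ing the non-sign bits of all lanes together mirrors what XOR plus vptest computes over the whole register:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 binary32 floats");

// Scalar model of all_infinity: true only if EVERY lane is +inf or -inf.
bool all_infinity_model(const float (&v)[8]) {
    std::uint32_t acc = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);     // read the float's bit pattern
        acc |= (bits ^ 0x7F800000u) & 0x7FFFFFFFu;  // nonzero if this lane differs from inf, ignoring sign
    }
    return acc == 0;  // corresponds to vptest's ZF on the whole register
}
```

A single non-infinite lane leaves nonzero bits in acc, which is why this construction can only answer "are all lanes infinite?".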

With AVX512, there's __mmask8 _mm512_fpclass_pd_mask(__m512d a, int imm8) (vfpclasspd; see Intel's intrinsics guide). Its output is a mask register, and I haven't looked into testing or branching on a value there. But you can test for any/all of +/- zero, +/- inf, quiet/signaling NaN, denormal, and negative.
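For __m256 float vectors, the corresponding intrinsic is _mm256_fpclass_ps_mask, which needs AVX-512DQ and AVX-512VL. A hedged sketch (the function name any_inf8_fpclass is hypothetical) that uses it when those features are enabled at compile time, with the category bits 0x08 (+Inf) and 0x10 (-Inf), and falls back to a portable scalar loop otherwise:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#if defined(__AVX512DQ__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 binary32 floats");

// Returns true if any of the 8 floats at p is +inf or -inf.
bool any_inf8_fpclass(const float* p) {
#if defined(__AVX512DQ__) && defined(__AVX512VL__)
    // imm8 category bits: 0x08 = positive infinity, 0x10 = negative infinity.
    __mmask8 m = _mm256_fpclass_ps_mask(_mm256_loadu_ps(p), 0x08 | 0x10);
    return m != 0;  // one mask bit per lane
#else
    // Fallback when AVX-512DQ/VL is not enabled at compile time.
    for (int i = 0; i < 8; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, p + i, sizeof bits);
        if ((bits & 0x7FFFFFFFu) == 0x7F800000u) return true;
    }
    return false;
#endif
}
```

Here the classification replaces both the sign-clearing andnot and the compare, so the whole check is one instruction plus a mask test.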

Peter Cordes
  • I did something similar here [sse-testing-equality-between-two-m128i-variables](http://stackoverflow.com/questions/26880863/sse-testing-equality-between-two-m128i-variables/26883316#26883316). – Z boson Jul 03 '15 at 07:41
  • I think you can come up with a `has_infinity` using `vptest`. – Z boson Jul 03 '15 at 07:44
  • By itself? I don't think so, because all you could do is test if any of the bit-pattern that means `inf` is present. What you need is to test that ALL the bits are set. I think you need something like `pcmpeq` or `cmpps` to turn a matches-exact-bit-pattern into an all-set or none-set, if you want your condition to be true when one element matches your pattern, but other elements can be anything. – Peter Cordes Jul 03 '15 at 15:22
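The point in the last comment can be checked with a scalar model. The names are hypothetical and IEEE-754 binary32 is assumed. Testing the inf bit pattern directly, as in !_mm256_testz_si256(x, set1(0x7F800000)), only asks whether any of those bits is set in any lane, so an ordinary 1.0f already triggers it. Doing a per-lane exact match first, as cmpps or pcmpeqd would, gives the intended answer:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 binary32 floats");

// Model of !testz(x, inf_pattern): "is ANY bit of 0x7F800000 set in ANY lane?" (the approach that fails).
bool any_inf_bit_set(const float (&v)[8]) {
    std::uint32_t acc = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);
        acc |= bits & 0x7F800000u;
    }
    return acc != 0;
}

// What is needed instead: an exact per-lane match first, then a whole-vector test.
bool any_lane_is_inf(const float (&v)[8]) {
    for (int i = 0; i < 8; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);
        if ((bits & 0x7FFFFFFFu) == 0x7F800000u) return true;
    }
    return false;
}
```

1.0f is 0x3F800000, which shares exponent bits with the inf pattern, so the first function reports a false positive for it.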