On an x86 platform, I have a `__m128` variable that holds four `uint32_t` values. I want to use SIMD to quickly count how many of those values are zero (or non-zero). Both SSE and AVX are acceptable. How can I do this?
-
Which platform? X86? Which extensions are allowed? Which SSE version? AVX? – Sebastian Feb 08 '22 at 08:28
-
`fast_horizontal_sum(_mm_cmpeq_epi32(var, _mm_setzero_si128()))` https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-sse-vector-sum-or-other-reduction – Aki Suihkonen Feb 08 '22 at 08:37
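
The comment above can be sketched roughly as follows. `hsum_epi32` and `count_zero_lanes_hsum` are hypothetical names standing in for `fast_horizontal_sum`, which the comment does not define; the sketch assumes SSE2 only and an `__m128i` input (a `__m128` can be reinterpreted with `_mm_castps_si128`). Each zero lane compares to all-ones, i.e. -1, so the negated lane sum is the number of zeros.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: adds the four 32-bit lanes of v. */
static inline int32_t hsum_epi32(__m128i v) {
    /* Swap the 64-bit halves and add: lanes 0 and 1 now hold v0+v2 and v1+v3. */
    __m128i hi64 = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum64 = _mm_add_epi32(v, hi64);
    /* Swap adjacent lanes and add: lane 0 now holds the full sum. */
    __m128i hi32 = _mm_shuffle_epi32(sum64, _MM_SHUFFLE(2, 3, 0, 1));
    __m128i sum32 = _mm_add_epi32(sum64, hi32);
    return _mm_cvtsi128_si32(sum32);
}

/* Counts the lanes of v that are equal to zero. */
static inline int count_zero_lanes_hsum(__m128i v) {
    /* Zero lanes become 0xFFFFFFFF (-1), all other lanes become 0. */
    __m128i is_zero = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return -hsum_epi32(is_zero);
}
```

The number of non-zero lanes is then `4 - count_zero_lanes_hsum(v)`.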
-
Do you want to perform this operation once or multiple times? I.e., do you have more than one `__m128` and want to count the overall sum? Then there could be additional optimizations. – Sebastian Feb 08 '22 at 08:43
-
If you have many vectors, you can use `totals -= cmp(v, 0)` like in [How to count character occurrences using SIMD](https://stackoverflow.com/q/54541129), but without the complication of such narrow elements that overflow quickly. You'll probably want to use `__m128i` with `_mm_cmpeq_epi32` for `uint32_t` data; `_mm_cmpeq_ps(__m128, __m128)` would find != 0.0 for bit-patterns that represent NaN. – Peter Cordes Feb 08 '22 at 10:01
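
A minimal sketch of the `totals -= cmp(v, 0)` idea for an array of `uint32_t`, assuming SSE2. The function name `count_zeros_u32` and the scalar tail loop are illustrative assumptions, not part of the comment. Subtracting the compare result (-1 per zero lane) increments a per-lane counter, so the horizontal reduction happens only once, after the loop.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the zero elements in data[0..n).
   Assumes fewer than 2^32 vectors, so the 32-bit lane counters cannot wrap. */
static size_t count_zeros_u32(const uint32_t *data, size_t n) {
    const __m128i zero = _mm_setzero_si128();
    __m128i totals = _mm_setzero_si128();

    /* Main loop: four elements per iteration, unaligned loads. */
    for (size_t i = 0; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        /* cmpeq yields -1 in zero lanes; subtracting it adds 1 to that lane. */
        totals = _mm_sub_epi32(totals, _mm_cmpeq_epi32(v, zero));
    }

    /* Reduce the four lane counters once, outside the loop. */
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, totals);
    size_t count = (size_t)lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the last n % 4 elements. */
    for (size_t i = n & ~(size_t)3; i < n; ++i) {
        count += (data[i] == 0);
    }
    return count;
}
```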
-
If it is indeed just one register, something like `zeros = _mm_popcnt_u32(_mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(var, _mm_setzero_si128()))))` should work (or `nonzeros = 4 - zeros`). – chtz Feb 08 '22 at 10:45
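
Unrolled into a function, the one-liner above might look like the sketch below. The name `count_zero_lanes` is hypothetical. As an assumption for portability, the sketch uses the GCC/Clang builtin `__builtin_popcount` in place of `_mm_popcnt_u32`, so that it builds without `-mpopcnt`; when POPCNT is enabled, the intrinsic from the comment can be substituted directly.

```c
#include <emmintrin.h>  /* SSE2 */

/* Counts the 32-bit lanes of v that are equal to zero. */
static inline int count_zero_lanes(__m128i v) {
    /* Zero lanes become all-ones, other lanes become all-zeros. */
    __m128i is_zero = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    /* movemask_ps collects the sign bit of each lane into a 4-bit mask. */
    int mask = _mm_movemask_ps(_mm_castsi128_ps(is_zero));
    /* Stand-in for _mm_popcnt_u32 (assumption: GCC or Clang). */
    return __builtin_popcount((unsigned)mask);
}
```

Because the mask has only four bits, a 16-entry lookup table would also work on CPUs without POPCNT.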
-
In fact, in my application I will perform this operation on two different `__m128` values and compare the results immediately. – dy66 Feb 08 '22 at 12:51
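
For the two-register case described here, one possible sketch is shown below. The function name `zero_count_cmp` and its three-way return convention are assumptions made for illustration; the source does not say what kind of comparison is needed. It takes `__m128` inputs as in the question and reinterprets them as integers, and it again uses `__builtin_popcount` (GCC/Clang) as a stand-in for `_mm_popcnt_u32`.

```c
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: returns 1 if a has more zero lanes than b,
   -1 if it has fewer, and 0 if both have the same number. */
static inline int zero_count_cmp(__m128 a, __m128 b) {
    const __m128i zero = _mm_setzero_si128();
    /* Integer compare on the raw bits, so NaN or -0.0 bit patterns
       are treated as ordinary non-zero integers. */
    __m128i a_zero = _mm_cmpeq_epi32(_mm_castps_si128(a), zero);
    __m128i b_zero = _mm_cmpeq_epi32(_mm_castps_si128(b), zero);
    int za = __builtin_popcount((unsigned)_mm_movemask_ps(_mm_castsi128_ps(a_zero)));
    int zb = __builtin_popcount((unsigned)_mm_movemask_ps(_mm_castsi128_ps(b_zero)));
    return (za > zb) - (za < zb);
}
```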