I have a __m256i value that is a byte mask. I basically want 0xFF to map to 1, 0xFFFF to 2, and so on. No mask bytes are set once a 0 byte is reached, so I was thinking lzcnt would solve my problem. The code below seems to work, but I was wondering if there's a better way.

auto cnt = 
    ((64-__builtin_ia32_lzcnt_u64(mask[0]))>>3) + 
    ((64-__builtin_ia32_lzcnt_u64(mask[1]))>>3) + 
    ((64-__builtin_ia32_lzcnt_u64(mask[2]))>>3) + 
    ((64-__builtin_ia32_lzcnt_u64(mask[3]))>>3) ;
Eric Stotch
  • Use `_mm256_movemask_epi8` to get the high bit of each byte element into one integer, then `lzcnt` / `tzcnt` or `popcnt` that, depending on what you want to know. – Peter Cordes Sep 05 '21 at 00:14
  • Maybe related: [How to count character occurrences using SIMD](https://stackoverflow.com/q/54541129), or [Searching for the key using SIMD](https://stackoverflow.com/q/67227171) / [Find the first instance of a character using simd](https://stackoverflow.com/q/40915243) for searching for the first `1` bit in a movemask result = position of first vector match. Or maybe [Using \_\_builtin\_popcount or other intrinsics to process the result of a \_mm256\_movemask\_pd compare bitmap?](https://stackoverflow.com/q/52700868) – Peter Cordes Sep 05 '21 at 00:21
  • @PeterCordes it's been a while; I COMPLETELY forgot about that function. That solved it. IDK if I should delete this question since there doesn't appear to be a self-close option. -Edit- OK, that solves it. Marked as dupe – Eric Stotch Sep 05 '21 at 00:24
  • @PeterCordes: IDK if you'll see this. Apparently my next question is too short. I made a mistake and need to clear out all the bits after I see 00. What instruction would that be? Maybe I'll think of a good way to make the question longer so I can post it – Eric Stotch Sep 05 '21 at 01:39
  • Yes, notifications do work so I saw this. Probably ask a new question if you can't find a duplicate in the [tag:bit-manipulation] or [tag:avx] tags, not clear whether you're talking about the vector or the movemask result. If you mean `bzhi` after finding a bit position, or if you should load a sliding window onto an array of 0, 0,0, -1, -1, -1, ... bytes for a vector (like in [Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? Or not using that insn at all](https://stackoverflow.com/q/34306933)). Or what exactly you mean by "clear out all the – Peter Cordes Sep 05 '21 at 02:02
  • @PeterCordes: If you can't guess what I'm saying then no one can. I wrote the question. I feel like there's a simpler way to say it, but I guess it isn't obvious to me: https://stackoverflow.com/questions/69060150/set-mask-to-0-once-0-seen – Eric Stotch Sep 05 '21 at 02:17
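
For reference, the `_mm256_movemask_epi8` suggestion from the comments could look roughly like the sketch below. It is not code from the thread: the helper name `count_low_ff_bytes` is hypothetical, it takes a pointer to 32 mask bytes rather than a `__m256i` so it can be called from code built without `-mavx2`, and it assumes GCC or Clang on an x86-64 CPU with AVX2.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns how many consecutive 0xFF bytes the 32-byte
// mask at p starts with, counting from the lowest byte.
// Assumes GCC/Clang and a CPU with AVX2; the target attribute lets this
// compile even when the rest of the file is built without -mavx2.
__attribute__((target("avx2")))
inline int count_low_ff_bytes(const void* p) {
    assert(p != nullptr);
    __m256i mask = _mm256_loadu_si256(static_cast<const __m256i*>(p));

    // Bit i of `bits` is the high bit of byte i of the vector.
    std::uint32_t bits =
        static_cast<std::uint32_t>(_mm256_movemask_epi8(mask));

    // Widen to 64 bits before inverting: the upper 32 bits become ones,
    // so the argument is never zero and the result is at most 32.
    return __builtin_ctzll(~static_cast<std::uint64_t>(bits));
}
```

Counting trailing zeros of the inverted bitmap gives the position of the first clear byte. If no byte is set after the first zero, as the question states, `__builtin_popcount(bits)` would give the same number.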
