10

For code related to this question, I need to compute the following as fast as possible:

Given a 32 bit integer i, compute the position of the n-th least significant set bit. Both n and the result should be 0-indexed.

For example, given the number i = 110101101012 and n = 4, the desired number is 7 as the fourth set bit is at position 7: 11010110101.

Using the pdep instruction from the BMI2 instruction set extension for x86 and the commonly available __builtin_ctz() intrinsic function, this can be computed easily:

j = _pdep_u32(1 << n, i);
return (__builtin_ctz(j));

However, many computers do not have the pdep instruction (and it's slow on AMD CPUs that do have it), rendering this approach non-portable. You can also compute such bit positions without pdep like this:

j = i;
for (k = 0; k < n; k++)
    j &= j - 1;

return (__builtin_ctz(j));

However, this is pretty slow.

I am targeting computers that provide at least __builtin_popcount() and __builtin_ctz(). How can I find such bit positions faster?

Peter Cordes
  • 286,368
  • 41
  • 520
  • 731
fuz
  • 82,933
  • 24
  • 182
  • 332

0 Answers0