input: uint32 random_val;
unsigned total_bits_set = 0;
unsigned i = 32;
do
    total_bits_set += (random_val >> --i) & 1;
while ( --i );
output: total_bits_set
Now just unroll so the total number of loop iterations is below 16 and you get perfect branch prediction on a Pentium 4. Or take the code-size hit and unroll all 32 iterations. OK, so how many bonus points do I get? :-)
whoops, I meant: ...
while ( i );
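For reference, here is a minimal C sketch of the loop with that correction applied. As first posted, the loop decrements i twice per pass, so it only looks at the odd-numbered bit positions. The function names count_bits and count_bits_unrolled4 and the unroll factor of 4 are assumptions made for illustration, not something from the post. The second function shows one way the suggested unrolling might look, with 8 passes instead of 32. Whether that actually helps branch prediction on a given CPU is the poster's claim and would need measuring.

```c
#include <stdint.h>

/* Corrected loop: one decrement per pass, stop once i reaches 0.
   Visits bit positions 31 down to 0, i.e. all 32 bits. */
unsigned count_bits(uint32_t random_val)
{
    unsigned total_bits_set = 0;
    unsigned i = 32;
    do
        total_bits_set += (random_val >> --i) & 1;
    while (i);
    return total_bits_set;
}

/* Hypothetical 4x unrolling (the factor is an assumption):
   each pass handles four adjacent bits, so the loop branch
   is evaluated 8 times instead of 32. */
unsigned count_bits_unrolled4(uint32_t random_val)
{
    unsigned total_bits_set = 0;
    unsigned i = 32;
    do {
        i -= 4;
        total_bits_set += (random_val >> (i + 3)) & 1;
        total_bits_set += (random_val >> (i + 2)) & 1;
        total_bits_set += (random_val >> (i + 1)) & 1;
        total_bits_set += (random_val >> i) & 1;
    } while (i);
    return total_bits_set;
}
```

Both functions should return the same count for any 32-bit input. Fully unrolling all 32 steps would remove the loop branch entirely, at the cost of code size.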
No problem:
:-)