The experiment is on 32-bit x86 Linux.
I am doing some static binary instrumentation work, and I am trying to insert the following instructions at the beginning of every basic block:
    BB_23:
        push %eax
        movl index,%eax
        movl $0x80823d0,buf(,%eax,0x4)   # record this block in buf[index]
        add $0x1,%eax
        cmp $0x400000,%eax
        jle BB_23_stub
        movl $0x0,%eax                   # wrap the index back to 0
    BB_23_stub:
        movl %eax,index
        pop %eax
Note that I need to use the cmp instruction, which modifies the flags. To guarantee that the flags are restored to their original values, I use pushf and popf to save and restore them on the stack.
Then it becomes this:
    BB_23:
        push %eax
        pushf
        movl index,%eax
        movl $0x17,buf(,%eax,0x4)
        add $0x1,%eax
        cmp $0x400000,%eax
        jle BB_23_stub
        movl $0x0,%eax
    BB_23_stub:
        movl %eax,index
        popf
        pop %eax
I tested the performance with and without pushf and popf (using gzip and bzip as benchmarks). To my surprise, the performance penalty can be as much as 3 times higher with the pushf and popf instructions!
However, without pushf and popf, the compression results of gzip and bzip are incorrect.
So here is my question:
Why are pushf and popf so slow? Am I using them correctly?
I cannot afford the performance penalty introduced by pushf and popf. Is there any way to avoid the high overhead while keeping the correct semantics (basically, preserving the value of the flags)?
Am I clear enough? Could anyone give me some help?
1) `pushf` might wreak havoc on the instruction pipeline, since it needs all flags valid, while most other instructions don't care about the flags. In the same way, the pipeline might get delayed by `popf` if an instruction that needs a flag follows. 2) I'd replace your `add`-`cmp`-`jle`-`mov` combo with `inc %eax`, `and $0x3fffff, %eax`, which should speed up the code a bit since it avoids a branch. This won't help you with flags, however; I don't see a way to do this without touching flags.

Replacing `inc %eax` with `lea eax, [eax+1]` (sorry, Intel syntax, I don't really like AT&T syntax and don't know how to translate it right now) will avoid changing the flags like `inc` does. Now if I could just figure out how to do the `and` without changing flags, you could get rid of those pesky `pushf` and `popf` instructions ... – Guntram Blohm Jul 14 '15 at 18:54

... `0x400000` and wrapping around there. If you can afford to do it the other way round, you could misuse the `loop` instruction, which doesn't change flags. Initialize your index to `0x400000`, use `ecx` instead of `eax`, and to decrement and re-init on zero, use `loop forward`, `mov $0x400000, %ecx`, `forward: movl %ecx, index`. Consider the `loop` a decrement and jump if not zero. – Guntram Blohm Jul 14 '15 at 19:16
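
Putting the `loop` suggestion from the last comment together, the per-block stub might look roughly like the sketch below. It is untested against the actual instrumentation tool and rests on several assumptions: `index` is initialized to `0x400000` instead of 0, `buf` holds at least `0x400000` four-byte slots, it is acceptable for the buffer to fill from the top down, and the `buf-4` displacement (used so that index values 1..0x400000 land in slots 0..0x3fffff) fits how `buf` is laid out. The register changes from `%eax` to `%ecx` because `loop` is hard-wired to `%ecx`.

```asm
BB_23:
        push %ecx                      # save the scratch register (push does not touch EFLAGS)
        movl index, %ecx               # %ecx = remaining count, always in 1..0x400000
        movl $0x17, buf-4(,%ecx,4)     # store the block id into slot (%ecx - 1)
        loop BB_23_stub                # %ecx -= 1; jump if %ecx != 0; EFLAGS unchanged
        movl $0x400000, %ecx           # count reached 0: wrap back to the top
BB_23_stub:
        movl %ecx, index               # write the updated count back
        pop %ecx                       # restore the scratch register
```

None of `push`, `mov`, `loop`, and `pop` write the arithmetic flags, so the `pushf`/`popf` pair is no longer needed. Two caveats: the original sequence uses `jle`, so it lets the index reach `0x400000` itself before wrapping, whereas this sketch uses exactly `0x400000` slots; and `loop` is microcoded and fairly slow on many Intel CPUs, so it is worth measuring before assuming it is a win.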