I'm currently working on an obfuscator for assembled x86 code (working directly on the raw bytes).

To do that I first need to build a simple parser that "understands" the bytes. I'm building my own opcode database, mostly with the help of this website: https://defuse.ca/online-x86-assembler.htm

Now my question: Some bytes can be interpreted in two ways, for example (intel syntax):

1. f3 00 00                repz add BYTE PTR [eax],al
2. f3                      repz

My idea was to loop through the bytes and handle each instruction individually, but when I reach the byte '0xf3' I have two ways of interpreting it.

I know there are working x86 disassemblers out there, so how do I know which case this is?

Nate Eldredge
  • Both ways are invalid instructions, so I'm not sure why it matters. A `rep` prefix has to be followed by one of the specific instructions for which it's defined, and `add` isn't one of them. – Nate Eldredge Sep 06 '21 at 18:21
  • Related: [How does an instruction decoder tell the difference between a prefix and a primary opcode?](https://stackoverflow.com/q/68898858) – Peter Cordes Sep 06 '21 at 19:09
  • Also related: "mandatory prefixes" as part of encoding instructions like SSE2 `movdqa`: [Combining prefixes in SSE](https://stackoverflow.com/q/2404364) – Peter Cordes Sep 06 '21 at 19:13

1 Answer

Prefixes, including the repz prefix, are not meaningful without a subsequent instruction. The subsequent instruction may incorporate the prefix (repz nop is pause), change its meaning (repz is xrelease if used before some interlocked instructions), or the prefix may simply be invalid for it.
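As a rough illustration of this, a parser can first consume any leading prefix bytes and only then look at the opcode, so `f3` is never treated as an instruction on its own. The following is a minimal sketch, not a real decoder: the names `LEGACY_PREFIXES`, `split_prefixes` and `describe` are made up for this example, and the only special case it knows is repz + nop = pause.

```python
# Legacy one-byte prefixes (32-bit mode). Names are just labels for this sketch.
LEGACY_PREFIXES = {
    0xF0: "lock", 0xF2: "repnz", 0xF3: "repz",
    0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x26: "es", 0x64: "fs", 0x65: "gs",
    0x66: "opsize", 0x67: "addrsize",
}


def split_prefixes(code, offset=0):
    """Return (prefix_names, opcode_offset) for the instruction at `offset`."""
    prefixes = []
    pos = offset
    # Keep consuming bytes for as long as they are prefixes.
    while pos < len(code) and code[pos] in LEGACY_PREFIXES:
        prefixes.append(LEGACY_PREFIXES[code[pos]])
        pos += 1
    return prefixes, pos


def describe(code):
    """Very rough description of the first instruction in `code`."""
    prefixes, pos = split_prefixes(code)
    if pos == len(code):
        # A prefix with nothing after it is not a complete instruction.
        return "incomplete: prefixes with no opcode"
    if prefixes == ["repz"] and code[pos] == 0x90:
        # The opcode "incorporates" the prefix: f3 90 is pause.
        return "pause"
    return " ".join(prefixes + ["opcode 0x%02x" % code[pos]])


print(describe(bytes([0xF3, 0x00, 0x00])))
print(describe(bytes([0xF3, 0x90])))
print(describe(bytes([0xF3])))
```

A real database would then have to decide, per opcode, whether the collected prefixes are part of the encoding, change the meaning, or are invalid.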

The decoding is always unambiguous, otherwise the CPU could not execute instructions. It can be ambiguous only if you don't know the exact byte offset at which to begin decoding (as x86 instructions have variable length).
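To make the point about the start offset concrete, here is a small sketch that decodes the same five bytes from two different offsets. It assumes 32-bit mode and uses a deliberately tiny, hypothetical opcode table (`OPCODES`, only `nop` and `mov eax, imm32`); `decode_from` is an invented helper, not part of any library.

```python
# Hypothetical mini-table: opcode byte -> (mnemonic, length in bytes incl. opcode).
OPCODES = {0x90: ("nop", 1), 0xB8: ("mov eax, imm32", 5)}
REPZ = 0xF3


def decode_from(code, offset):
    """Linearly decode `code` starting at `offset`; returns [(offset, mnemonic)]."""
    out = []
    pos = offset
    while pos < len(code):
        start = pos
        has_repz = False
        # Consume any repz prefixes before looking at the opcode.
        while pos < len(code) and code[pos] == REPZ:
            has_repz = True
            pos += 1
        if pos >= len(code) or code[pos] not in OPCODES:
            out.append((start, "(unknown)"))
            break
        name, length = OPCODES[code[pos]]
        if pos + length > len(code):
            out.append((start, "(truncated)"))
            break
        if has_repz and name == "nop":
            name = "pause"  # f3 90 is pause
        elif has_repz:
            name = "repz " + name
        out.append((start, name))
        pos += length
    return out


code = bytes([0xB8, 0xF3, 0x90, 0x90, 0x90])
print(decode_from(code, 0))  # [(0, 'mov eax, imm32')]
print(decode_from(code, 1))  # [(1, 'pause'), (3, 'nop'), (4, 'nop')]
```

From offset 0 the `f3` byte is just part of the 32-bit immediate; from offset 1 it is a prefix. Each starting offset gives exactly one decoding, so the ambiguity lies only in where to start.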

Alex Guteniev
  • *decoding is always unambiguous* - or at least, any given CPU will pick one way of decoding. Intel's manual says it's "illegal" to have multiple REX prefixes on one instruction, but their Skylake CPUs for example will take the last one ([like with other repeated prefixes](https://stackoverflow.com/questions/43433030/how-did-pentium-iii-cpus-handle-multiple-instruction-prefixes-from-the-same-grou/44366568#44366568)), not #UD fault. There is AFAIK no Intel documentation that says this is what will happen. But yes, they're still REX prefixes, so unambiguous in that sense. – Peter Cordes Sep 06 '21 at 19:08
  • Finally found the Q&A where I'd tested repeated REX prefixes: [Segmentation fault when using DB (define byte) inside a function](https://stackoverflow.com/a/55642776) – Peter Cordes Sep 06 '21 at 19:28