
I have a code.c file, and the corresponding object file code.o contains a multi-byte sequence with the following hexadecimal representation:

55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3

How does the CPU group the bytes in order to execute them? For example, we know in advance that 55 corresponds to push, and that 89 and e5 correspond to mov, but some instructions consist of three or more bytes. So how does the CPU know that '55 89 e5' is two instructions rather than one?

Martin James
amjad
  • Many modern processors (especially the x86 family) don't execute the instruction set you see directly; they translate it into an even more basic and simpler instruction set that is then executed. The exact implementation of this translation is too much to go into, but it could use a table and very quick comparisons (perhaps checking only one or a few bits). – Some programmer dude Aug 22 '18 at 08:51
  • Why do you not ask how the CPU knows that `55` is `push rbp` or `push ebp` (depending on 32- or 64-bit mode)? The CPU must know how to decode and interpret instruction bytes, and determining the length of an instruction is part of this process. – RbMm Aug 22 '18 at 08:52
  • Typically, with 'escape sequences' of bits in the first byte or two. Those billions of transistors in the CPU are there for a reason, as is the complex manner in which they are connected together :) – Martin James Aug 22 '18 at 08:54
  • Note that this is a hardware question and surely has no connection with C, (removed the C tag). – Martin James Aug 22 '18 at 08:56
  • Using the same process that a software disassembler does, but in hardware. Given a starting point, (logically) decode one at a time to find the end of that instruction, then decode the next from there. – Peter Cordes Aug 22 '18 at 10:42
  • Does this answer your question? [How does the CPU know how many bytes it should read for the next instruction, considering instructions have different lenghts?](https://stackoverflow.com/questions/56385995/how-does-the-cpu-know-how-many-bytes-it-should-read-for-the-next-instruction-co) – phuclv Jan 31 '20 at 03:21
  • other duplicates: [With variable length instructions how does the computer know the length of the instruction being fetched?](https://stackoverflow.com/q/24269368/995714), [How does the CPU/assembler know the size of the next instruction?](https://stackoverflow.com/q/25101978/995714) – phuclv Jan 31 '20 at 03:22

2 Answers


Part of the CPU is an instruction decoder (see e.g. the Wikipedia article on Central Processing Unit). The task of the instruction decoder is to determine how many bytes, starting from the address pointed to by the Instruction Pointer, are part of the current instruction, and decode it into its constituent parts.

There are some architectures (mostly microcontrollers nowadays) where all instructions are the same size. On 64-bit Intel/AMD architecture (x86-64 a.k.a. AMD64), the instruction size varies between 1 and 15 bytes, and the instruction encoding is quite complex.

Nominal Animal

On x86 specifically, the instruction encoding is such that from each byte, the decoder can learn how many more bytes follow.
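To make that concrete, here is a minimal sketch of a length decoder in Python. It is not how real hardware works and it is nowhere near a full x86 decoder: it assumes 32-bit mode, no prefixes, no SIB byte, and it only knows the handful of opcodes that occur in the question. The names `instruction_length` and `split_instructions` are made up for this illustration.

```python
# Hypothetical length decoder covering ONLY the opcodes in the question
# (32-bit mode, no prefixes, no SIB byte). Everything else is rejected.
CODE = bytes.fromhex("55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3")

ONE_BYTE = {0x55, 0x5D, 0xC3}          # push ebp, pop ebp, ret
HAS_MODRM = {0x89, 0x8B, 0x03, 0x01}   # mov/add forms followed by a ModR/M byte


def modrm_extra(modrm):
    """Number of displacement bytes that follow the ModR/M byte."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return 0                        # register operand, nothing follows
    if rm == 4:
        raise NotImplementedError("SIB byte is not handled in this sketch")
    if mod == 1:
        return 1                        # disp8
    if mod == 2:
        return 4                        # disp32
    return 4 if rm == 5 else 0          # mod == 0: [disp32] or plain [reg]


def instruction_length(code, pos):
    """Length in bytes of the instruction starting at code[pos]."""
    opcode = code[pos]
    if opcode in ONE_BYTE:
        return 1
    if opcode in HAS_MODRM:
        return 2 + modrm_extra(code[pos + 1])
    raise NotImplementedError(f"opcode {opcode:#04x} is not in this sketch")


def split_instructions(code):
    """Walk the byte stream, cutting it into one chunk per instruction."""
    pos, chunks = 0, []
    while pos < len(code):
        n = instruction_length(code, pos)
        chunks.append(code[pos:pos + n])
        pos += n
    return chunks


for chunk in split_instructions(CODE):
    print(chunk.hex(" "))
```

The first two chunks printed are `55` and `89 e5`: the opcode `55` takes no further bytes, so `89` has to start a new instruction.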


For example, let me show you how the decoder could possibly decode this instruction stream.

55

the decoder sees 55 and knows that this is push ebp, a single byte instruction. So it decodes push ebp and proceeds to the next instruction.

push ebp
89

the decoder sees 89 which is mov r/m32,r32. This instruction is followed by a modr/m byte specifying the operands.

push ebp
89 e5

the modr/m byte is e5 indicating ebp as the r/m operand and esp as the r operand, so the instruction is mov ebp, esp.
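The modr/m byte is split into three bit fields: mod (top 2 bits), reg (middle 3 bits) and r/m (low 3 bits). A small Python sketch of that split, with a helper name (`split_modrm`) invented for this illustration:

```python
# 32-bit register names in encoding order (index = 3-bit register number).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = byte >> 6          # bits 7..6: addressing mode
    reg = (byte >> 3) & 7    # bits 5..3: register operand
    rm = byte & 7            # bits 2..0: register or memory operand
    return mod, reg, rm


mod, reg, rm = split_modrm(0xE5)   # 0xE5 = 11 100 101 in binary
print(f"mod={mod:02b} reg={REG32[reg]} r/m={REG32[rm]}")
```

With mod = 11 the r/m field names a register rather than a memory location, so no further bytes belong to this instruction.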

push ebp
mov ebp, esp
8b

this instruction is mov r32,r/m32 which is likewise followed by a modr/m byte.

push ebp
mov ebp, esp
8b 45

this modr/m byte has an r operand of eax and an r/m32 operand of [ebp + disp8], where disp8 is an 8-bit displacement that comes in the next byte.

push ebp
mov ebp, esp
8b 45 0c

the displacement is 0c so the instruction is mov eax, [ebp + 0x0c].
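One detail worth knowing here: an 8-bit displacement is sign-extended, so bytes from 80 to ff mean negative offsets (which is how locals below ebp are addressed). A short Python sketch, with helper names invented for this illustration:

```python
def disp8(byte):
    """Interpret one byte as a signed 8-bit displacement."""
    return byte - 0x100 if byte & 0x80 else byte


def format_ebp_operand(disp):
    """Render an [ebp + disp8] operand the way the listing in this answer does."""
    sign = "-" if disp < 0 else "+"
    return f"[ebp {sign} 0x{abs(disp):02x}]"


print(format_ebp_operand(disp8(0x0C)))   # the byte from the question
print(format_ebp_operand(disp8(0xF4)))   # a made-up negative example
```

Either way the displacement is exactly one byte, so the instruction length is known as soon as the modr/m byte has been seen.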

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
03

this instruction is add r32,r/m32, again followed by a modr/m byte.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
03 45

same as before, the r operand is eax while the r/m operand is [ebp + disp8]. The displacement is 08.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
01

this instruction is add r/m32,r32, followed by a modr/m byte.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
01 05

this modr/m byte indicates an r operand of eax and an r/m operand of [disp32]. The displacement follows in the next four bytes which are 00 00 00 00.
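The four displacement bytes are stored little-endian, least significant byte first. A minimal Python sketch of this step; the helper name `decode_disp32` and the second byte sequence are made up for illustration:

```python
def decode_disp32(four_bytes):
    """Interpret 4 bytes as a little-endian 32-bit displacement."""
    return int.from_bytes(four_bytes, "little")


insn = bytes.fromhex("01 05 00 00 00 00")   # the instruction from the question
modrm = insn[1]
mod, rm = modrm >> 6, modrm & 7
# mod = 00 with r/m = 101 is the special case "no base register, disp32 follows"
assert (mod, rm) == (0, 5)

print(f"add [0x{decode_disp32(insn[2:6]):08x}], eax")

# A made-up non-zero example to show the byte order:
print(hex(decode_disp32(bytes.fromhex("78 56 34 12"))))
```

Since this comes from an object file, the all-zero address is presumably a placeholder that the linker fills in later, but that is an assumption about how code.o was produced.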

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
add [0x00000000], eax
5d

instruction 5d is pop ebp, a single byte instruction.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
add [0x00000000], eax
pop ebp
c3

instruction c3 is ret, a single byte instruction. This instruction transfers control somewhere else, so the decoder stops decoding here.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
add [0x00000000], eax
pop ebp
ret
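The whole walk-through can be sketched as a toy disassembler in Python. Take it as an illustration of the sequential process only: it assumes 32-bit mode, knows just the seven opcodes that appear above, and does not handle prefixes or SIB bytes. The table and function names are invented for this sketch.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
SINGLE = {0x55: "push ebp", 0x5D: "pop ebp", 0xC3: "ret"}
# opcode -> (mnemonic, True if the r/m operand is the destination)
WITH_MODRM = {0x89: ("mov", True), 0x8B: ("mov", False),
              0x01: ("add", True), 0x03: ("add", False)}


def decode_modrm(code, pos):
    """Decode the ModR/M byte at code[pos]; return (reg, rm_text, next_pos)."""
    modrm = code[pos]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    pos += 1
    if mod == 3:                                    # register operand
        return REG32[reg], REG32[rm], pos
    if rm == 4:
        raise NotImplementedError("SIB byte is not handled in this sketch")
    if mod == 0 and rm == 5:                        # [disp32], no base register
        disp = int.from_bytes(code[pos:pos + 4], "little")
        return REG32[reg], f"[0x{disp:08x}]", pos + 4
    if mod == 0:                                    # [reg], no displacement
        return REG32[reg], f"[{REG32[rm]}]", pos
    size = 1 if mod == 1 else 4                     # disp8 or disp32, signed
    disp = int.from_bytes(code[pos:pos + size], "little", signed=True)
    sign = "-" if disp < 0 else "+"
    return REG32[reg], f"[{REG32[rm]} {sign} 0x{abs(disp):02x}]", pos + size


def disassemble(code):
    """Decode instructions one after another until the bytes run out."""
    pos, lines = 0, []
    while pos < len(code):
        opcode = code[pos]
        if opcode in SINGLE:
            lines.append(SINGLE[opcode])
            pos += 1
        elif opcode in WITH_MODRM:
            mnemonic, rm_is_dest = WITH_MODRM[opcode]
            reg, rm, pos = decode_modrm(code, pos + 1)
            dst, src = (rm, reg) if rm_is_dest else (reg, rm)
            lines.append(f"{mnemonic} {dst}, {src}")
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} is not in this sketch")
    return lines


CODE = bytes.fromhex("55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3")
print("\n".join(disassemble(CODE)))
```

Run on the bytes from the question, this prints the same seven instructions as the listing above. Unlike the decoder described in the walk-through, the sketch does not stop at ret; it simply runs until the bytes are exhausted.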

In real x86 processors, complicated parallel decoding techniques are employed. This is possible because the processor may cheat and pre-read instruction bytes that may or may not be part of any instruction.
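As a rough illustration of the idea (not a description of any particular CPU), one can compute a candidate length at every byte offset independently, as if an instruction started there, and then follow the chain from the known entry point to pick out the real instruction starts. The sketch below does this in Python for the opcodes in the question only; `length_at` and `find_starts` are names invented here.

```python
CODE = bytes.fromhex("55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3")


def length_at(code, pos):
    """Speculative length of an instruction starting at pos, or None if this
    sketch cannot decode it. Only the opcodes from the question are known."""
    opcode = code[pos]
    if opcode in (0x55, 0x5D, 0xC3):
        return 1
    if opcode in (0x89, 0x8B, 0x03, 0x01) and pos + 1 < len(code):
        modrm = code[pos + 1]
        mod, rm = modrm >> 6, modrm & 7
        if mod == 3:
            return 2
        if rm == 4:
            return None                 # SIB byte: not handled in this sketch
        if mod == 1:
            return 3                    # opcode + modr/m + disp8
        if mod == 2:
            return 6                    # opcode + modr/m + disp32
        return 6 if rm == 5 else 2      # mod == 0
    return None


def find_starts(code, entry=0):
    """Step 1: guess a length at EVERY offset (each guess is independent,
    so hardware could do them side by side). Step 2: follow the chain from
    the entry point and keep only the offsets that are really reached."""
    lengths = [length_at(code, i) for i in range(len(code))]
    starts, pos = [], entry
    while pos < len(code) and lengths[pos] is not None:
        starts.append(pos)
        pos += lengths[pos]
    return starts


print(find_starts(CODE))
```

Most of the speculative guesses are thrown away. In this sketch the discarded offsets mostly come back as None because the table is so small; on real x86 nearly every byte is a valid opcode (45 alone would be inc ebp in 32-bit mode), so the wasted guesses would look like perfectly good instructions.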

fuz