I want to build the control-flow graph (CFG) from the bytecode of a smart contract (assuming it was obtained by compiling a Solidity source file). The CFG should also distinguish the different methods of the smart contract.

Is there a way to do so?

Briomkez

2 Answers

How to separate functions in EVM bytecode?

Solidity creates a dispatcher block for function calls at the beginning of the bytecode. It compares the 4-byte function selector taken from the call data against each known function, similar to an if .. else if .. else if .. else chain.

Each function's entry in the dispatcher follows this repeating pattern:

DUP1
PUSH4 <4-byte function signature>
EQ
PUSH2 <jumpdestination for the function>
JUMPI

From this block you can reconstruct the functions and find their jump destinations. However, you will only have the 4-byte signatures (selectors), not the function names from the source code.

For instance, for this smart contract code:

contract X {
    uint x;
    uint y;

    function a(uint u) public {
        x = u;
    }

    function b(uint v) public {
        y = v;
    }

    function t(uint v) public {
        a(v);
        b(v);
    }

    function () {
        t(1);
    }
}

The dispatcher will look like this:

...
054 DUP1
055 PUSH4 afe29f71
060 EQ
061 PUSH2 0070
064 JUMPI

065 DUP1
066 PUSH4 cd580ff3
071 EQ
072 PUSH2 009d
075 JUMPI

076 DUP1
077 PUSH4 f0fdf834
082 EQ
083 PUSH2 00ca
086 JUMPI
...
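
The dispatcher pattern above can be scanned for mechanically. The following is a minimal Python sketch, not a complete tool: the function name find_dispatch_entries and the SAMPLE_DISPATCHER bytes are made up for illustration (the bytes simply re-encode the three entries of the listing above), and it assumes the exact DUP1 / PUSH4 / EQ / PUSH2 / JUMPI sequence. Other compiler versions or optimizer settings may emit a different dispatcher, in which case nothing will match.

```python
# Opcode values from the EVM instruction set.
DUP1, EQ, JUMPI = 0x80, 0x14, 0x57
PUSH1, PUSH2, PUSH4, PUSH32 = 0x60, 0x61, 0x63, 0x7F

# DUP1 (1) + PUSH4 sel (5) + EQ (1) + PUSH2 dest (3) + JUMPI (1) = 11 bytes
PATTERN_LEN = 11


def find_dispatch_entries(code: bytes) -> dict[str, int]:
    """Map each 4-byte selector (hex string) to its jump destination.

    Hypothetical helper: only recognises the exact sequence
    DUP1, PUSH4 <selector>, EQ, PUSH2 <dest>, JUMPI.
    """
    entries = {}
    pc = 0
    while pc < len(code):
        op = code[pc]
        w = code[pc:pc + PATTERN_LEN]
        if (len(w) == PATTERN_LEN and w[0] == DUP1 and w[1] == PUSH4
                and w[6] == EQ and w[7] == PUSH2 and w[10] == JUMPI):
            entries[w[2:6].hex()] = int.from_bytes(w[8:10], "big")
            pc += PATTERN_LEN
            continue
        # Skip PUSH immediates so that data bytes are never read as opcodes.
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)
    return entries


# The three dispatcher entries from the listing above, re-encoded as bytes.
SAMPLE_DISPATCHER = bytes.fromhex(
    "8063afe29f711461007057"
    "8063cd580ff31461009d57"
    "8063f0fdf834146100ca57"
)

for selector, dest in find_dispatch_entries(SAMPLE_DISPATCHER).items():
    print(selector, hex(dest))
```

Each destination found this way is the entry point of one public function, so it can serve as the root of that function's part of the CFG.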
ivicaa

If it is possible to write smart contracts directly in bytecode, then it is also possible to read and analyze it. Keep in mind that bytecode is similar to assembly for a LIFO stack machine, so it is not friendly to read. There are no function names and no variable names; names are absent altogether. Here is a similar question about assembly: https://reverseengineering.stackexchange.com/questions/10604/how-to-generate-cfg-from-assembly-instructions.

In order to do this I would:

  • learn the opcodes: http://gavwood.com/paper.pdf

  • check Remix; it shows a useful bytecode overview after compilation

  • check the project https://github.com/comaeio/porosity to understand how to parse bytecode

    for instance:

    • determine the location (offset) of each opcode
    • detect possible tags (jump targets) by JUMPDEST
    • detect jumps by PUSH2 0x.... JUMP (these can be treated as function calls)
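
The three parsing steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Porosity's actual implementation: the names disassemble and build_cfg are hypothetical, and only jumps whose target is pushed by the immediately preceding PUSH are resolved. Jumps with a computed target (for example, returns from internal Solidity functions, where the return address comes from the stack) are left without an edge.

```python
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
STOP, RETURN, REVERT, INVALID, SELFDESTRUCT = 0x00, 0xF3, 0xFD, 0xFE, 0xFF
PUSH1, PUSH32 = 0x60, 0x7F
# Instructions after which execution never falls through to the next one.
TERMINATORS = {STOP, JUMP, RETURN, REVERT, INVALID, SELFDESTRUCT}


def disassemble(code: bytes):
    """Yield (offset, opcode, immediate_bytes) for each instruction."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1:pc + 1 + size]
        pc += 1 + size


def build_cfg(code: bytes) -> dict[int, set[int]]:
    """Return {block_start_offset: set of successor block offsets}."""
    instrs = list(disassemble(code))
    jumpdests = {pc for pc, op, _ in instrs if op == JUMPDEST}
    # Leaders: offset 0, every JUMPDEST, and the instruction that
    # follows a JUMPI or a block terminator.
    leaders = {0} | jumpdests
    for i, (_, op, _) in enumerate(instrs[:-1]):
        if op in TERMINATORS or op == JUMPI:
            leaders.add(instrs[i + 1][0])

    cfg = {start: set() for start in leaders}
    current = 0
    for i, (pc, op, _) in enumerate(instrs):
        if pc in leaders:
            current = pc
        nxt = instrs[i + 1][0] if i + 1 < len(instrs) else None
        if op in (JUMP, JUMPI):
            # Static target: a PUSH directly before the jump, same block.
            if i > 0:
                p_pc, p_op, p_imm = instrs[i - 1]
                target = int.from_bytes(p_imm, "big")
                if PUSH1 <= p_op <= PUSH32 and p_pc >= current and target in jumpdests:
                    cfg[current].add(target)
            if op == JUMPI and nxt is not None:
                cfg[current].add(nxt)  # condition false: fall through
        elif op not in TERMINATORS and nxt in leaders:
            cfg[current].add(nxt)  # plain fall-through into the next block
    return cfg


# Hand-assembled toy program (hypothetical, for illustration only):
# 0: PUSH1 01  2: PUSH1 08  4: JUMPI  5: PUSH1 00  7: STOP
# 8: JUMPDEST  9: PUSH1 0c  11: JUMP  12: JUMPDEST  13: STOP
TOY = bytes.fromhex("6001600857600000" "5b600c56" "5b00")
print(build_cfg(TOY))
```

Splitting at JUMPDEST and after every jump or terminator gives the basic blocks; combining this with the dispatcher's jump destinations would then let you group blocks by the function that reaches them.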

Beware that compiling the same code with different compiler versions can produce different bytecode, which means the control flow can differ as well.

Porosity CFG

porosity.exe --code 0x60... --cfg

Aquila
  • Yes, I know that, but I am searching for something similar to Soot for the EVM. Soot is able to rebuild the fields and the methods from pure JVM bytecode, so it should be possible. I will take a look at Porosity (although it is no longer supported :/). – Briomkez Oct 05 '18 at 13:39
  • @Briomkez If you need decompilation, this question may be useful for you: https://ethereum.stackexchange.com/questions/188/how-can-you-decompile-a-smart-contract – Aquila Oct 05 '18 at 13:43
  • Thanks. Yes, I am not trying to rebuild the exact source code, only to get a representation similar to Soot's Jimple representation, with fields and methods separated. – Briomkez Oct 05 '18 at 13:47
  • @Briomkez pls check my update, hope it helps – Aquila Oct 06 '18 at 16:00