
I was researching the EVM and Solidity and came across the fact that the calldata (input data) is created using RLP encoding. I know that process and don't want to elaborate on it. My question is: what is the logic or maths behind choosing the first 4 bytes of the data to identify the method? Why not 5? Why not some other number? For example, if the calldata is: 0xee919d50000000000000000000000000000000000000000000000000000000000000001 then why do we take only the first 4 bytes, i.e. ee919d50, as the method ID, and not more or fewer bytes?

I also read the first answer to the question "How does the EVM find the entry of a called function?", which said that if you implement your own dispatch logic, you could use the first 8 bytes of the data instead of the first four. I am interested in the actual reason for choosing the number 4.

I would appreciate it if someone could explain this or point me to a resource with a detailed explanation.

Ashish Mishra

1 Answer


Like any engineering choice, it is a trade-off.

With 4 bytes you can, in theory, address 2^32 = 4,294,967,296 different methods in any single contract.

It is a reasonable choice because, even under the (absurdly pessimistic) assumption that collisions made 99,999 out of every 100,000 selectors unusable, you would still be left with roughly 42,949 unique entries for methods in a single contract!
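To put a number on how unlikely an accidental collision is, here is a minimal sketch of the standard birthday-problem estimate. It assumes that selectors behave like uniformly random 32-bit values, which is an idealisation and not something stated above; the function name `collision_probability` is made up for this illustration.

```python
import math

SELECTOR_BITS = 32
SELECTOR_SPACE = 2 ** SELECTOR_BITS  # 4,294,967,296 possible 4-byte selectors


def collision_probability(num_functions: int, space: int = SELECTOR_SPACE) -> float:
    """Chance that at least two of `num_functions` selectors coincide.

    Assumption: each selector is an independent, uniformly random value
    in a space of `space` possibilities (birthday-problem model).
    """
    # log of the probability that all selectors are distinct:
    # sum of log(1 - i/space) for i = 0 .. num_functions - 1
    log_p_unique = sum(math.log1p(-i / space) for i in range(num_functions))
    # 1 - exp(log_p_unique), computed in a numerically stable way
    return -math.expm1(log_p_unique)


if __name__ == "__main__":
    for n in (20, 100, 1000):
        print(f"{n:>5} functions -> collision probability {collision_probability(n):.3e}")
```

Under that assumption, a contract with 20 functions has a collision chance on the order of 1e-8, and even a hypothetical 1,000-function contract stays around 1e-4. In any case the Solidity compiler rejects a contract in which two functions share a selector, so a collision shows up at compile time and not at run time.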

That seems enough for any realistic contract, especially given the contract code size limit of about 24 KB (who has ever seen a smart contract with 500 or 1,000 methods? The majority have five to twenty methods and that's all).

On the other hand, capping the identifier at 32 bits (i.e. 4 bytes) is a reasonable choice because it lets the lookup table be addressed efficiently on most of the CPUs presumably used to run EVM nodes today.

Let's say that 4 bytes (i.e. 32 bits) gives about the largest number of entries that the majority of current CPUs can address easily.
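As an illustration of the 32-bit point, here is a small sketch of splitting calldata into a 4-byte selector and its arguments, then using the selector as an ordinary integer key. The handler name `handle_example`, the dispatch table and the argument value are hypothetical; only the selector bytes ee919d50 come from the question, and this is not how any particular EVM client implements dispatch.

```python
SELECTOR_SIZE = 4  # bytes


def split_calldata(calldata: bytes) -> tuple[int, bytes]:
    """Return (selector as an unsigned 32-bit integer, remaining argument bytes)."""
    if len(calldata) < SELECTOR_SIZE:
        raise ValueError("calldata is shorter than a 4-byte selector")
    selector = int.from_bytes(calldata[:SELECTOR_SIZE], "big")
    return selector, calldata[SELECTOR_SIZE:]


def handle_example(args: bytes) -> int:
    """Hypothetical handler: decode one 32-byte big-endian unsigned integer."""
    return int.from_bytes(args[:32], "big")


# Hypothetical dispatch table keyed by the selector value.
DISPATCH = {0xEE919D50: handle_example}

# Example calldata: the selector from the question followed by the
# value 1 padded to 32 bytes.
example_calldata = bytes.fromhex("ee919d50") + (1).to_bytes(32, "big")

selector, args = split_calldata(example_calldata)
result = DISPATCH[selector](args)
print(hex(selector), result)
```

Because the selector always fits in 32 bits, comparing it or using it as a table key is a single machine-word operation on common hardware, which is the efficiency argument made above.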

Rick Park