
I would like to translate simple x86_64 machine code into LLVM IR so that it can be analyzed later. For my particular use case, I need to work directly with raw instructions and opcodes; I don't have access to the binary itself.

As I currently understand it, I should be able to lift x86 instructions to LLVM IR with tools such as rellume and remill. With their help I can produce LLVM IR, but I am not sure whether the results I am getting are correct.

First I need to create machine code for a very simple application (this is just for testing purposes):

  • Compile the source code [1]

    • gcc simple.c -o simple.o
  • Disassemble using objdump [2]

    • objdump -d simple.o
    • At this point, I get separate functions add and main

Then I pass the function that I want to translate into LLVM IR to remill as a hex string of its bytes:

  • Translate the add function into LLVM IR using remill
    • bytes = the machine code of the add function, as hex
    • the result should be the LLVM IR of the add function

docker run --rm \
 -it remill \
 --arch amd64 \
 --ir_out /dev/stdout \
 --bytes f30f1efa554889e54883ec10be03000000bf01000000e8cdffffff8945fc8b45fcc9c3
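
For reference, here is a minimal C sketch of how the --bytes string for add can be derived from the byte columns of the dump [2]. The names ADD_BYTES, ADD_BYTES_LEN and bytes_to_hex are hypothetical and exist only for this illustration. One thing I noticed while writing it: the string in the command above starts with the same prologue as add, but then continues with 4883ec10 ... e8cdffffff ... c9c3, which matches the bytes of main in the dump, not add.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of `add`, copied by hand from the byte columns of the objdump listing [2]. */
const unsigned char ADD_BYTES[] = {
    0xf3, 0x0f, 0x1e, 0xfa, /* endbr64                */
    0x55,                   /* push %rbp              */
    0x48, 0x89, 0xe5,       /* mov  %rsp,%rbp         */
    0x89, 0x7d, 0xfc,       /* mov  %edi,-0x4(%rbp)   */
    0x89, 0x75, 0xf8,       /* mov  %esi,-0x8(%rbp)   */
    0x8b, 0x55, 0xfc,       /* mov  -0x4(%rbp),%edx   */
    0x8b, 0x45, 0xf8,       /* mov  -0x8(%rbp),%eax   */
    0x01, 0xd0,             /* add  %edx,%eax         */
    0x5d,                   /* pop  %rbp              */
    0xc3                    /* ret                    */
};
const size_t ADD_BYTES_LEN = sizeof ADD_BYTES;

/*
 * Hypothetical helper: writes 2*len lowercase hex digits plus a terminating
 * NUL into `out`. Returns 0 on success, -1 if `out` is too small.
 */
int bytes_to_hex(const unsigned char *bytes, size_t len,
                 char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";

    assert(bytes != NULL && out != NULL);
    if (out_size < 2 * len + 1)
        return -1;

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0f]; /* low nibble  */
    }
    out[2 * len] = '\0';
    return 0;
}
```

Calling bytes_to_hex(ADD_BYTES, ADD_BYTES_LEN, buf, sizeof buf) with a buffer of at least 49 chars should give the 48-digit string f30f1efa554889e5897dfc8975f88b55fc8b45f801d05dc3, which is what I would expect to pass for add.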

My questions:

  • Is my current workflow for translating x86 instructions into LLVM IR correct? Am I missing something? (I am aware of tools such as McSema; however, for my use case I need to be able to translate raw opcodes.)
  • How can I verify the produced LLVM IR?
    • After producing the LLVM IR of an even simpler example [3], I tried to run it with lli, without success.

  1. Source code

int add(int a, int b){
    return a + b;
}

int main() { int c = add(1, 3); return c; }

  2. Dump of objdump
...

0000000000001129 <add>:
    1129:  f3 0f 1e fa        endbr64
    112d:  55                 push   %rbp
    112e:  48 89 e5           mov    %rsp,%rbp
    1131:  89 7d fc           mov    %edi,-0x4(%rbp)
    1134:  89 75 f8           mov    %esi,-0x8(%rbp)
    1137:  8b 55 fc           mov    -0x4(%rbp),%edx
    113a:  8b 45 f8           mov    -0x8(%rbp),%eax
    113d:  01 d0              add    %edx,%eax
    113f:  5d                 pop    %rbp
    1140:  c3                 ret

0000000000001141 <main>:
    1141:  f3 0f 1e fa        endbr64
    1145:  55                 push   %rbp
    1146:  48 89 e5           mov    %rsp,%rbp
    1149:  48 83 ec 10        sub    $0x10,%rsp
    114d:  be 03 00 00 00     mov    $0x3,%esi
    1152:  bf 01 00 00 00     mov    $0x1,%edi
    1157:  e8 cd ff ff ff     call   1129 <add>
    115c:  89 45 fc           mov    %eax,-0x4(%rbp)
    115f:  8b 45 fc           mov    -0x4(%rbp),%eax
    1162:  c9                 leave
    1163:  c3                 ret

...
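
One detail of the dump that may matter when lifting bare bytes: the call in main (e8 cd ff ff ff at 0x1157) is encoded relative to the address of the following instruction, so its target depends on the address at which the bytes are assumed to live. The C sketch below works through that arithmetic; call_rel32_target is a hypothetical helper name, and it only handles the 5-byte e8 rel32 form seen in the dump.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: resolve the target of an x86-64 `call rel32`
 * (opcode e8). The target is the address of the next instruction
 * (insn_addr + 5) plus a signed 32-bit little-endian displacement.
 */
uint64_t call_rel32_target(uint64_t insn_addr, const unsigned char insn[5])
{
    assert(insn[0] == 0xe8); /* only the e8 rel32 encoding is handled */

    /* Assemble the little-endian displacement from bytes 1..4. */
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;

    /* Sign-extend to 64 bits without relying on implementation-defined casts. */
    int64_t rel = (int64_t)raw - ((raw & 0x80000000u) ? 0x100000000LL : 0);

    /* Unsigned wrap-around makes the addition correct for negative rel. */
    return insn_addr + 5 + (uint64_t)rel;
}
```

With the instruction at 0x1157 the displacement 0xffffffcd is -0x33, so the target is 0x115c - 0x33 = 0x1129, which is add, as objdump shows. If the same five bytes are assumed to sit at a different address (for example, if the lifter starts the byte string at 0), the call resolves to a different location that contains no code.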

  3. Even simpler example

int main()
{
  int val = 2;
  return val;
}
jgawr