7

Why can programs written in C# be decompiled essentially to their original form, variable names included (for example with dnSpy), while C++ decompilers (such as Ghidra) are unable to recover the variable names?

Polydynamical

1 Answer

11

Debug symbol information is often stripped from C++ binaries. It stores all user-created names, symbols, and types, along with bounds, function boundaries, and other function-related metadata. It is generally stored in the standardized DWARF format, which is widely used by modern compilers. If you want to keep this information, compile your binary with a debug flag such as -g in gcc or clang, e.g. gcc -g myprog.c. Ghidra will then show all the user-defined symbols.
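To make this concrete, here is a small sketch of the kinds of names that end up in the debug information. The file name names.c and every identifier in it are made up for illustration. Assuming a typical gcc or clang toolchain, building it with gcc -g -c names.c should produce an object file whose DWARF data names struct account, its fields, the parameters, and the local running_total. Without -g (or after strip --strip-debug), only the externally visible function name is expected to remain in the symbol table, and a fully stripped executable usually loses that as well.

```c
/* names.c: a hypothetical file used only for illustration. */
#include <stddef.h>

/* User-chosen names: the type name and both field names. */
struct account {
    int  id;
    long balance_cents;
};

/* User-chosen names: the function name, the parameter names,
   and the local variable running_total. In the generated machine
   code the local is just a register or a stack slot. */
long total_balance_cents(const struct account *accounts, size_t count)
{
    long running_total = 0;
    for (size_t i = 0; i < count; i++) {
        running_total += accounts[i].balance_cents;
    }
    return running_total;
}
```

Tools such as nm and readelf --debug-dump=info can be used to compare what each build still contains.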

In C#/.NET, on the other hand, the name metadata cannot be removed, because reflection must be able to retrieve the symbols for types at runtime. To work around this, C# symbols are generally obfuscated.
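The difference can be sketched in C. This is only an analogy under stated assumptions, not how the .NET runtime implements reflection: the field_info table and both functions are hypothetical. Direct field access compiles down to an offset, so the name can vanish from the binary. Looking a field up by name at runtime only works if the name strings ship inside the binary, which is roughly the position .NET metadata is in.

```c
#include <stddef.h>
#include <string.h>

struct point {
    int x;
    int y;
};

/* Direct access: the compiler turns p->y into a load from
   p + offsetof(struct point, y). The string "y" is not needed
   in the machine code. */
int read_y_direct(const struct point *p)
{
    return p->y;
}

/* Hypothetical metadata row: maps a field name to its offset.
   These strings must be present in the binary at runtime. */
struct field_info {
    const char *name;
    size_t      offset;
};

static const struct field_info point_fields[] = {
    { "x", offsetof(struct point, x) },
    { "y", offsetof(struct point, y) },
};

/* Name-based access: returns 1 and stores the value in *out if a
   field with that name exists, otherwise returns 0. */
int read_int_field_by_name(const struct point *p, const char *name, int *out)
{
    size_t n = sizeof point_fields / sizeof point_fields[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(point_fields[i].name, name) == 0) {
            memcpy(out, (const char *)p + point_fields[i].offset, sizeof *out);
            return 1;
        }
    }
    return 0;
}
```

Renaming "x" and "y" in the table to meaningless strings would still work as long as every caller uses the new strings, which is the idea behind obfuscating names rather than deleting them.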

References: https://www.appsealing.com/code-obfuscation-comprehensive-guide/ http://www.semdesigns.com/Products/Obfuscators/CSharpObfuscationExample.html https://help.gapotchenko.com/eazfuscator.net/53/advanced-features/symbol-names-encryption

R4444
  • 2
    To be more straight: .net assemblies contain roughly the same information that a normal object file has when it was compiled to include all debug information. (Which, in turn, means that regular C/C++ linkers could be much smarter -- about as smart as the .net runtime -- if presented with binaries that contain debug information. But they aren't.) – Peter - Reinstate Monica Apr 06 '21 at 11:05
  • 3
    I feel this answer is missing the point. In C++, variables (fields, globals, locals, statics and so on) are identified by their address. In C#, like in Java, fields are identified by their fully qualified name. In fact, you can create an assembly without having its dependencies at hand, but you can only compile (but not link) a C++ program without its dependencies. That's why names cannot be removed (but can be renamed) and why local vars in a method are not reversed to their original names. – Margaret Bloom Apr 06 '21 at 13:41
  • 4
    On Windows, "stripping off" is usually not necessary as the relevant C++ debug information is often stored in an extra file (Program DataBase, .PDB). This file is not shipped to the customers. – MSalters Apr 06 '21 at 14:07
  • 1
    A quick comment re: "C# symbols are generally obfuscated.". I've been working with C# since way before it shipped (and I was working for Microsoft, doing high end dev support for the first 10 years of .NET). Other than demos, I've never seen anyone obfuscate their code (I'm not saying no one does, but I dispute generally). – Flydog57 Apr 07 '21 at 00:37
  • I'm not sure what @MargaretBloom means by "In C#, ..., fields are identified by their fully qualified name". The IL code uses addresses (and slot numbers) internally. The symbolic information is part of the metadata that is stored within the assembly, separate from the code. The .NET compilers also produce PDB files that are used by debuggers like Visual Studio (augmenting the symbol info in the metadata). For what it's worth, keeping PDBs around can be a best practice (in C# and particularly C++). It allows for a more usable post-mortem debug experience. – Flydog57 Apr 07 '21 at 00:43
  • @Flydog57 The IL code refers to metadata tokens/RIDs, which in turn identify metadata rows (which contain the metadata information). In C#, code is organized into assemblies, classes, and methods. These concepts are part of how the IL was designed and cannot be stripped off (only renamed if not used by external assemblies). The x86 assembly has no concept of class or variable in general. I've disassembled thousands of .NET assemblies and native PEs; if you do it, you'll get what I mean. – Margaret Bloom Apr 07 '21 at 08:19