I'm experimenting with ELF executables and the GNU toolchain on Linux x86_64.

I've linked and stripped (by hand) a "Hello World" test.s:

        .global _start
        .text
_start:
        mov     $1, %rax
        ...

into a 267-byte ELF64 executable...

0000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0000010: 0200 3e00 0100 0000 d400 4000 0000 0000  ..>.......@.....
0000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
0000030: 0000 0000 4000 3800 0100 4000 0000 0000  ....@.8...@.....
0000040: 0100 0000 0500 0000 0000 0000 0000 0000  ................
0000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
0000060: 0b01 0000 0000 0000 0b01 0000 0000 0000  ................
0000070: 0000 2000 0000 0000 0000 0000 0000 0000  .. .............
0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000b0: 0400 0000 1400 0000 0300 0000 474e 5500  ............GNU.
00000c0: c3b0 cbbd 0abf a73c 26ef e960 fc64 4026  .......<&..`.d@&
00000d0: e242 8bc7 48c7 c001 0000 0048 c7c7 0100  .B..H......H....
00000e0: 0000 48c7 c6fe 0040 0048 c7c2 0d00 0000  ..H....@.H......
00000f0: 0f05 48c7 c03c 0000 0048 31ff 0f05 4865  ..H..<...H1...He
0000100: 6c6c 6f2c 2057 6f72 6c64 0a              llo, World.

It has one program header (LOAD) and no sections:

There are 1 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000010b 0x000000000000010b  R E    200000

This seems to load the entire file (file offsets 0 through 0x10b, ELF header and all) at address 0x400000.

The entry point is:

 Entry point address:               0x4000d4

This corresponds to offset 0xd4 in the file, and as we can see, that offset is the start of the machine code (mov $1, %rax).
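The relationship between the entry point and the file offset can be checked by decoding the header bytes directly. The following is a minimal Python sketch, not output from the toolchain: it unpacks the first 0x78 bytes copied from the hexdump above, assuming the standard little-endian ELF64 header layout. The variable names simply mirror the usual `Elf64_Ehdr` and `Elf64_Phdr` field names.

```python
import struct

# First 0x78 bytes of the file (ELF header + one program header),
# transcribed from the hexdump above.
HEX = (
    "7f454c46020101000000000000000000"
    "02003e0001000000d400400000000000"
    "40000000000000000000000000000000"
    "00000000400038000100400000000000"
    "01000000050000000000000000000000"
    "00004000000000000000400000000000"
    "0b010000000000000b01000000000000"
    "0000200000000000"
)
data = bytes.fromhex(HEX)

# ELF64 header is 64 bytes, each program header is 56 bytes.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")

(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
 e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
 e_shstrndx) = EHDR.unpack_from(data, 0)

(p_type, p_flags, p_offset, p_vaddr, p_paddr,
 p_filesz, p_memsz, p_align) = PHDR.unpack_from(data, e_phoff)

# The file offset of the entry point is its distance from the start
# of the LOAD segment, plus the segment's own offset in the file.
entry_file_offset = e_entry - p_vaddr + p_offset

print(f"entry point   = {e_entry:#x}")
print(f"load address  = {p_vaddr:#x}")
print(f"file offset   = {entry_file_offset:#x}")
print(f"segment size  = {p_filesz} bytes, align = {p_align:#x}")
```

Because the LOAD segment starts at file offset 0, the subtraction `e_entry - p_vaddr` alone already gives the offset into the file.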

My question is why (and how) did the GNU linker choose address 0x400000 to map the file to?

Andrew Tomazos

2 Answers

The start address is usually set by a linker script.

For example, on GNU/Linux, looking at /usr/lib/ldscripts/elf_x86_64.x we see:

...
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
    . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;

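The value 0x400000 is the default that this script passes to the SEGMENT_START() function on this platform; it is used unless the segment's base address is overridden on the linker command line (for example with -Ttext-segment).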
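To see which defaults a given linker script supplies, one can scan its text for `SEGMENT_START` calls. The following Python sketch is only an illustration: the script text is the two-line excerpt quoted above (in practice it might come from the file on disk or from the linker's verbose output), and the helper name `segment_defaults` is hypothetical.

```python
import re

# Excerpt of the default linker script, as quoted above.
SCRIPT = '''
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
'''

# Matches SEGMENT_START("name", default) with a hex or decimal default.
PATTERN = re.compile(
    r'SEGMENT_START\s*\(\s*"([^"]+)"\s*,\s*(0[xX][0-9a-fA-F]+|\d+)\s*\)'
)

def segment_defaults(script_text):
    """Return {segment name: default base address} found in the text."""
    return {name: int(value, 0) for name, value in PATTERN.findall(script_text)}

defaults = segment_defaults(SCRIPT)
for name, addr in defaults.items():
    print(f"{name}: {addr:#x}")
```

This only reports the defaults written in the script; it does not account for any override given on the command line.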

You can find out more about linker scripts by browsing the linker manual:

% info ld Scripts
jkoshy
  • Yeah, but why 0x400000 in particular, and not some other value? – BarbaraKwarc Jul 04 '18 at 13:40
  • @BarbaraKwarc: It has to be above `mmap_min_addr = 65536` (https://wiki.debian.org/mmap_min_addr). And 2M alignment is good: starting at the beginning of a page directory in the next level up of the page tables means the same number of page-table entries will be split across fewer page-directory entries, saving kernel page-table memory and helping the page-walk hardware cache better. IDK why 4MiB instead of 2MiB, or if there were other factors in this choice. – Peter Cordes Nov 26 '20 at 10:09
  • Better yet, use `ld --verbose`. – Jester Nov 27 '20 at 03:08
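The page-table argument in the comments above can be made concrete by splitting an address into its table indices. This is a rough Python sketch, assuming the usual x86-64 4-level paging layout with 4 KiB pages (9 index bits per level, 12 offset bits); the function name `split_va` is hypothetical.

```python
def split_va(va):
    """Split a virtual address into 4-level page-table indices.

    Assumes x86-64 4-level paging with 4 KiB pages.
    """
    return {
        "pml4": (va >> 39) & 0x1FF,    # top-level table index
        "pdpt": (va >> 30) & 0x1FF,    # each entry covers 1 GiB
        "pd": (va >> 21) & 0x1FF,      # each entry covers 2 MiB
        "pt": (va >> 12) & 0x1FF,      # each entry covers 4 KiB
        "offset": va & 0xFFF,          # byte offset within the page
    }

base = split_va(0x400000)
entry = split_va(0x4000D4)
print(base)
print(entry)
```

For 0x400000 the page-table index and the offset are both zero, so the mapping begins at the first entry of a fresh page table (the third page-directory entry, index 2), which is the 2 MiB alignment property the comment describes.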

Page zero of a task's virtual address space is kept unmapped so that null-pointer dereferences can be caught through a page-fault exception, leading to SIGSEGV. 4 MB matches the "big page" granularity (as opposed to the "normal page" granularity of 4 KB), so on configurations with 4 MB page granularity, the address range 0x000000 to 0x3FFFFF is unmapped, making 0x400000 the first valid address in the task's virtual address space.
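The sizes involved are easy to verify with a little arithmetic. The Python sketch below takes the 4 KB and 4 MB page sizes from this answer; the 2 MiB and 1 GiB sizes are added as an assumption about x86-64 large pages, and the helper `is_aligned` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE = 0x400000          # default load address discussed here
SMALL_PAGE = 4 * KIB     # "normal page" granularity

def is_aligned(addr, size):
    """True if addr is a multiple of size."""
    return addr % size == 0

print(f"base address     = {BASE // MIB} MiB")
print(f"last byte below  = {BASE - 1:#x}")
print(f"4 KiB pages below the base = {BASE // SMALL_PAGE}")
for size, label in ((4 * KIB, "4 KiB"), (2 * MIB, "2 MiB"),
                    (4 * MIB, "4 MiB"), (GIB, "1 GiB")):
    print(f"aligned to {label}: {is_aligned(BASE, size)}")
```

The address is a multiple of 4 KiB, 2 MiB, and 4 MiB, but not of 1 GiB, so the alignment alone does not single out which large-page size motivated the choice.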

  • The default for `/proc/sys/vm/mmap_min_addr` is 64 KiB on x86-64 Linux, so you can map stuff at lower addresses if you want. (See https://wiki.debian.org/mmap_min_addr). Some of this idea makes some sense, though: 0x400000 is hugepage-aligned, which can be good for transparent hugepages. (But note that x86-64 hugepages are 2MiB and 1GiB; only the legacy 32-bit page tables have 4M hugepages. And `ld` has a different default base address for i386: `0x08048000`, somewhat above 128 MiB.) – Peter Cordes Nov 26 '20 at 10:01
  • Choosing an address that's 2M aligned and near the start of a 1G region means a better chance of denser page tables, i.e. same number of PTEs split between fewer page directories. The memory in the low 4MiB is unlikely to get randomly mapped, though, so usually you do have that full 4MiB buffer against null-deref. – Peter Cordes Nov 26 '20 at 10:03
  • AFAIK, the i386 code base 0x8000000 was used so that the stack section could be placed below this base address. This leads to implicit guard mechanisms: TOS overflow into the unmapped zeroth page throws an exception, as does TOS underflow into the code section, which is read-only by default. And 128 MB was chosen to provide enough room for the stack section. To be honest, I doubt the plausibility of that approach, as the same effect could be achieved anywhere in the virtual address space by surrounding the stack region with unmapped pages. – Notorius Maximus Nov 26 '20 at 15:26
  • That's not where current Linux puts the user-space stack. Are you suggesting the original design was totally different, and didn't respect `ulimit -s` for the stack growth limit? The current mechanism actually grows the *mapping* (for the initial / main-thread stack), not just soft page-faulting in new physical pages for an existing logical mapping. (You can see that in `/proc/<PID>/maps`, and that it won't grow if ESP / RSP isn't below the address accessed. See [How is Stack memory allocated when using 'push' or 'sub' x86 instructions?](//stackoverflow.com/q/46790666) for some details) – Peter Cordes Nov 26 '20 at 16:15
  • (Current Linux puts the user-space stack at or near the top of user-space virtual address space. Under a 64-bit kernel, 32-bit processes get a stack address like `0xffff....` or something like that, else `0x7ffff000` or so.) – Peter Cordes Nov 26 '20 at 16:19
  • I know it works the way you mentioned. I just wrote what I had learned from someone else. Who knows whether there was originally some intent to do things that way, or whether there was some different reason to reserve space at the beginning of the virtual address space. – Notorius Maximus Nov 26 '20 at 18:02