The GNU website has a simple example that is meant to demonstrate the problems that arise with non-atomic access. The example contains a small mistake: it is missing #include <unistd.h>, which I have added here:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

struct two_words { int a, b; } memory;

static struct two_words zeros = { 0, 0 }, ones = { 1, 1 };

void handler(int signum)
{
   printf ("%d,%d\n", memory.a, memory.b);
   alarm (1);
}

int main (void)
{
   signal (SIGALRM, handler);
   memory = zeros;
   alarm (1);
   while (1)
     {
       memory = zeros;
       memory = ones;
     }
}

The idea is that the assignment memory = zeros; or memory = ones; takes more than one instruction, so the signal handler can run between the two stores and will at some point print "0,1" or "1,0".
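To make the expected failure concrete, here is a minimal sketch that simulates it deterministically instead of waiting for a real signal. The helper names (assign_in_two_steps, record_snapshot, observe_torn_write) are made up for illustration and are not part of the GNU example; the callback simply stands in for the signal handler firing between the two stores.

```c
#include <assert.h>
#include <stddef.h>

struct two_words { int a, b; };

/* Hypothetical callback type: stands in for the signal handler. */
typedef void (*observer_fn)(const struct two_words *current, void *ctx);

/* Copy src into dst one field at a time and call observe() between the
   two stores. This imitates a signal arriving in the middle of an
   assignment that needs more than one instruction. */
void assign_in_two_steps(struct two_words *dst, const struct two_words *src,
                         observer_fn observe, void *ctx)
{
    assert(dst != NULL && src != NULL && observe != NULL);
    dst->a = src->a;      /* first half written */
    observe(dst, ctx);    /* the "handler" looks at the struct here */
    dst->b = src->b;      /* second half written */
}

/* Observer that records the state it was shown into *ctx. */
void record_snapshot(const struct two_words *current, void *ctx)
{
    *(struct two_words *)ctx = *current;
}

/* memory starts as zeros and is being overwritten with ones;
   returns what the observer saw in the middle of that assignment. */
struct two_words observe_torn_write(void)
{
    struct two_words memory = { 0, 0 };
    const struct two_words ones = { 1, 1 };
    struct two_words seen = { -1, -1 };

    assign_in_two_steps(&memory, &ones, record_snapshot, &seen);
    return seen;
}
```

Calling observe_torn_write() returns a struct with a == 1 and b == 0, which is the mixed "1,0" state the GNU example hopes to catch in its handler.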

Interestingly, on x86-64 the assembly that gcc produces looks as follows. It appears that each assignment is done by a single movq store (after a single movq load into %rax):

    .file   "interrupt_handler.c"
    .text
    .comm   memory,8,8
    .local  zeros
    .comm   zeros,8,8
    .data
    .align 8
    .type   ones, @object
    .size   ones, 8
ones:
    .long   1
    .long   1
    .section    .rodata
.LC0:
    .string "%d,%d\n"
    .text
    .globl  handler
    .type   handler, @function
handler:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movl    4+memory(%rip), %edx
    movl    memory(%rip), %eax
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    movl    $1, %edi
    call    alarm@PLT
    nop
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   handler, .-handler
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    handler(%rip), %rsi
    movl    $14, %edi
    call    signal@PLT
    movq    zeros(%rip), %rax
    movq    %rax, memory(%rip)
    movl    $1, %edi
    call    alarm@PLT
.L3:
    movq    zeros(%rip), %rax
    movq    %rax, memory(%rip)
    movq    ones(%rip), %rax
    movq    %rax, memory(%rip)
    jmp .L3
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 7.3.0-16ubuntu3) 7.3.0"
    .section    .note.GNU-stack,"",@progbits

Can someone explain how two separate int assignments can be done by a single instruction? I would expect the two ints to be written to two different pieces of memory, but here they seem to be written in one go, as if to the same place.
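One way to look at this, as a sketch only: the two ints sit next to each other in memory, so together they form one 8-byte object that can be moved as a unit. The code below assumes a 4-byte int with no padding in the struct (true on x86-64 Linux, checked by the static assertion); the helper names as_one_quadword and from_one_quadword are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct two_words { int a, b; };

/* Assumption: the struct is exactly 8 bytes, i.e. two 4-byte ints
   with no padding. Compilation fails on platforms where this is false. */
_Static_assert(sizeof(struct two_words) == sizeof(uint64_t),
               "struct two_words must be exactly 8 bytes for this sketch");

/* Reinterpret both ints together as one 64-bit value. */
uint64_t as_one_quadword(struct two_words w)
{
    uint64_t q;
    memcpy(&q, &w, sizeof q);   /* one 8-byte copy covers a and b */
    return q;
}

/* The inverse: split one 64-bit value back into the two ints. */
struct two_words from_one_quadword(uint64_t q)
{
    struct two_words w;
    memcpy(&w, &q, sizeof w);
    return w;
}
```

With these assumptions, { 1, 1 } corresponds to the single value 0x0000000100000001, so copying the struct is the same as copying one 64-bit quantity, which is what the movq pair in the listing does.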

The result changes when I use double instead of int. The while loop then becomes the following in assembly:

.L3:
    movq    zeros(%rip), %rax
    movq    8+zeros(%rip), %rdx
    movq    %rax, memory(%rip)
    movq    %rdx, 8+memory(%rip)
    movq    ones(%rip), %rax
    movq    8+ones(%rip), %rdx
    movq    %rax, memory(%rip)
    movq    %rdx, 8+memory(%rip)
    jmp .L3
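A rough way to see why the double version needs two stores per assignment: the struct grows from 8 to 16 bytes and no longer fits in one 64-bit general-purpose register. The sketch below only counts 8-byte moves under that assumption; quadword_stores is a hypothetical helper, and a compiler that uses SSE registers could copy 16 bytes differently.

```c
#include <assert.h>
#include <stddef.h>

struct two_ints    { int a, b; };
struct two_doubles { double a, b; };

/* Assumption: copies go through 8-byte general-purpose registers only,
   as in the unoptimized listings above. */
enum { QUADWORD_BYTES = 8 };

/* Hypothetical helper: number of 8-byte stores needed to copy
   an object of `size` bytes, rounding up. */
size_t quadword_stores(size_t size)
{
    assert(size > 0);
    return (size + QUADWORD_BYTES - 1) / QUADWORD_BYTES;
}
```

On x86-64 Linux, quadword_stores(sizeof(struct two_ints)) is 1 and quadword_stores(sizeof(struct two_doubles)) is 2, matching the one movq store in the int loop and the two movq stores in the double loop.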
MasterMind
  • Just because it happened to work on one machine does not mean it will always work... Here you clearly had "luck" that gcc saw that your structure fits in a register and can be set in one 64-bit operation. This does not mean it would be the case on every arch or for every struct. – OznOg Oct 17 '18 at 16:35
  • This example was written a long time ago (in the history of computers), when machines with 64-bit load and store were much less common than they are today. Fatih, would you mind filing a bug report on the glibc documentation (at https://sourceware.org/bugzilla/ ) so we remember to correct it? – zwol Oct 17 '18 at 16:43
  • By the way, `movq` is only required to be atomic when it does not cross a cache line boundary, so just because it is used does not prove that the assignment is atomic now – harold Oct 17 '18 at 17:16
  • Notice that in `handler`, the two halves are loaded separately. If another thread was writing the struct between reads, you could get tearing, but not from a signal handler in the same thread. Also note that this code depends on being compiled without optimization, making every variable effectively `volatile`. If you compile it with optimization, you get an empty infinite loop. https://godbolt.org/z/d6V42Z. Anyway, compile with `-m32 -mno-sse` to stop the compiler from using 64-bit stores for the assignment. – Peter Cordes Oct 17 '18 at 17:40
  • Thanks for all the comments. My question was more about how two ints can be loaded and stored within one instruction on the x86-64 architecture. It is indeed true that if you compile this for other (older) machines the bug/error will still occur. zwol also suggested filing a bug report. I would be happy to do this, but I do not understand what it is for. Is the glibc documentation related to this example from GNU somehow? – MasterMind Oct 17 '18 at 21:49
  • The two ints are not just randomly in unrelated different memory locations, they are adjacent, which makes it possible to treat both of them together as a single entity in some sense. Similarly you could `memcpy` both of them together with a size of 8 which you may be more familiar with – harold Oct 18 '18 at 10:03
  • @Fatih You didn't say where on the "website of GNU" (by which I assume you mean https://www.gnu.org) you found this code, but I recognize it as the example code from . That page is part of the manual for the GNU C Library, "glibc" for short, and problems with that manual should be filed in the bug tracker I linked to earlier. (I am an occasional contributor to glibc.) – zwol Oct 18 '18 at 14:13
