I have some C code like this:

#include <stdint.h>

typedef struct {
    int16_t x1;
    int16_t x2;
} XSTATE;

typedef struct {
    int16_t s1;
    int32_t s2;
    XSTATE x;
} STATE;

typedef int16_t (*FUNCTION_HANDLE_S16)(void *, int16_t arg);

// really lives in foo.h
inline int16_t foo(void *pxstate_raw, int16_t arg)
{
    XSTATE *pxstate = pxstate_raw;
    pxstate->x1 = arg;
    return arg;
}

// really lives in bar.h 
inline int16_t bar(STATE *pstate, FUNCTION_HANDLE_S16 f)
{
    return f(&pstate->x, 1991);
}

int16_t test(STATE *pstate)
{
    return bar(pstate, foo);
}

If I check the output in Compiler Explorer using gcc 12.1, I see this using -O1:

test:
        mov     WORD PTR [rdi+8], 1991
        mov     eax, 1991
        ret

Perfect! pstate->x.x1 is being assigned the value 1991 and that value is returned to the caller.

But in -O0:

test:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        mov     esi, OFFSET FLAT:foo
        mov     rdi, rax
        call    bar
        leave
        ret

What gives? What happened to the 1991 constant, and the compiled output of bar? Only test shows up in the compiler output.

Versions of gcc since 5.1 don't produce output for bar but gcc 4.9.4 does produce output for bar.

Here's clang 14.0.0 with -O1:

test:                                   # @test
        mov     word ptr [rdi + 8], 1991
        mov     ax, 1991
        ret

and -O0:

        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     qword ptr [rbp - 8], rdi
        mov     rdi, qword ptr [rbp - 8]
        movabs  rsi, offset foo
        call    bar
        cwde
        add     rsp, 16
        pop     rbp
        ret
Peter Cordes
Jason S
  • Because `test` calls `bar` and `bar` calls some function with 1991 as an argument. – Jason S May 20 '22 at 20:58
  • Right, I get that. But the output of the compiler does not produce any implementation of `bar`, just of `test`. Earlier versions of gcc emit both. – Jason S May 20 '22 at 21:01
  • That's how C works; you need to put `extern inline int16_t bar(STATE *pstate, FUNCTION_HANDLE_S16 f);` in exactly one `.c` file. `gcc` (as opposed to `g++`) used to have different defaults, and put inline functions in a "common" section for duplicates to be removed at link time. – Peter Cordes May 20 '22 at 21:05
  • but... but... why does the compiler throw away my definition for `bar` ? – Jason S May 20 '22 at 21:06
  • Why are you interested in -O0 output if you are optimizing? – Peter - Reinstate Monica May 20 '22 at 21:07
  • Because you told it (via `inline ... bar()`) that it doesn't need to emit a definition in this compilation unit; some other one will have an `extern inline ... bar()` to make sure there is one to link against. But then you didn't do that. Near duplicate of [Is "inline" without "static" or "extern" ever useful in C99?](https://stackoverflow.com/q/6312597) and [What happens with an extern inline function?](https://stackoverflow.com/q/17504316). The idea is to save the compiler work emitting duplicate defs, and to allow dumber linkers, I think (which don't need to discard duplicate defs.) – Peter Cordes May 20 '22 at 21:08
  • "Why are you interested in -O0 output if you are optimizing?" because it is an odd behavior that I don't expect. – Jason S May 20 '22 at 21:08
  • @PeterCordes ah that's it, and yes, it is a duplicate. I didn't realize that `inline` had this implication, and that you need `inline static` to fix it. – Jason S May 20 '22 at 21:10
  • BTW, it doesn't in C++, only in C. And historically compilers picked the "friendly" implementation strategy that matched C++. The other way to fix it is `__attribute__((always_inline))`, or (potentially worse) `static inline` so you can get multiple separate non-inlined copies of the function in your program, although usually only in a debug build unless your compiler disagrees with you about the wisdom of actually inlining a certain function. – Peter Cordes May 20 '22 at 21:10
  • why is `__attribute__((always_inline))` better than `static inline`? – Jason S May 20 '22 at 21:14
  • Also heavily related: [Different compilation results not using extern in C vs in C++](https://stackoverflow.com/q/49511510) re: GCC `-fno-common` for global vars. – Peter Cordes May 20 '22 at 21:15
  • `always_inline` won't ever leave a non-inlined version of the function anywhere. On second thought, in a debug build it might actually be better (less code bloat) to use `static inline`, instead of bloating every call site in a function used multiple times in one file, unless it's truly tiny. Inlining without optimization does *not* optimize away the arg variables so you end up with quite a few instructions at a call site, plus the function body. – Peter Cordes May 20 '22 at 21:23
  • In an optimized build, which is all that really matters, if a compiler decides not to inline, you'd rather have all calls go to one stand-alone definition, which is what you get from using `extern inline` to instantiate one. (Hmm, now I wonder if that's maybe the worse way for the normal case where all the calls do inline, if you still end up with a non-inline instance of the function in your program.) – Peter Cordes May 20 '22 at 21:23
  • Anyway, `always_inline` overrides the compiler's heuristic and forces it; if the function then optimizes down to not much code because of a couple args being constants or other reasons, then the heuristic was wrong and you were right to override it. – Peter Cordes May 20 '22 at 21:23
  • I'm used to using `inline static` but only in .h files to avoid multiple instances from colliding. I had never realized `inline` without `static` could be problematic in a .c file. – Jason S May 20 '22 at 22:06
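
For reference, here is a minimal sketch of the two fixes discussed in the comments, applied to the code from the question. It assumes C99 or later inline semantics (the default in gcc since 5.x). Using `static inline` for `foo` and `extern inline` for `bar` is only a choice made here to show both options side by side; either fix could be applied to both functions.

```c
#include <stdint.h>

typedef struct {
    int16_t x1;
    int16_t x2;
} XSTATE;

typedef struct {
    int16_t s1;
    int32_t s2;
    XSTATE x;
} STATE;

typedef int16_t (*FUNCTION_HANDLE_S16)(void *, int16_t arg);

/* Fix 1: static inline. The function has internal linkage, so this
 * translation unit gets its own out-of-line copy whenever one is needed
 * (here its address is taken, so a real definition must exist). */
static inline int16_t foo(void *pxstate_raw, int16_t arg)
{
    XSTATE *pxstate = pxstate_raw;
    pxstate->x1 = arg;
    return arg;
}

/* Fix 2: keep the plain inline definition (as it would appear in bar.h)... */
inline int16_t bar(STATE *pstate, FUNCTION_HANDLE_S16 f)
{
    return f(&pstate->x, 1991);
}

/* ...and add this declaration in exactly one .c file. It turns the
 * definition above into an external definition for this translation
 * unit, so `call bar` has something to link against. */
extern inline int16_t bar(STATE *pstate, FUNCTION_HANDLE_S16 f);

int16_t test(STATE *pstate)
{
    return bar(pstate, foo);
}
```

With the `extern inline` declaration present, the compiler should emit a stand-alone `bar` even at -O0. With `static inline`, each `.c` file that needs a non-inlined `foo` gets a private copy instead.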
