6

Question

I came across a leetcode question gas-station

However I found my code is slightly faster if I use if/else rather than if/continue.
Edit: after twiddling with the test case. I found the problem is continue
It seems that clang is taking continue seriously..? My guess is hat clang somehow tries to put the part before continue and loop checking together.

In short: I want to know:

  • How do I "persuade" clang to understand my implementation about continue like if/else? Because I don't want to change my style of coding.

Edited sidenote: branch-prediction tag is removed.

Relative question

Should I use return/continue statement instead of if-else? which seems that they should have nearly no difference.

Investigation

Edit: the demo shows that the problem is continue

I used a 10k data set to test the code and found that the problem is continue itself.
I put continue in the if's end of body and then the runtime is increased by 30% ~ 50% which is still surprising to me.
Live Demo
Difference in number: 11070 to 15060(w/ continue)

Edited on 2022/5/30: Sorry I found that my previous naive profiling mistakenly got optimized out of part of calculation... which causes the difference in order of 100... Bad demo

Profiling code

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>

using namespace std; // Yeah I know it's bad.
// input , output
class Solution {
public:
    int canCompleteCircuit(vector<int>& gas, vector<int>& cost) {
        if(gas.empty()){
            return -1;
        }
        int len = gas.size();
        // start at the last station, move start backward if we cannot finished the loop
        int start = len - 1;
        int tank = gas[start] - cost[start];
        int cur = 0; // from first station
        while(start != cur){ // other condition?
            if(tank < 0){
                // if not, move start backward, check the value
                // continue...start > 0;
                start--;
                tank += gas[start] - cost[start];
                // continue;
            }
            else{
                tank += gas[cur] - cost[cur];
                cur++;   
            }
        }
        //cout << tank ;
        if(tank < 0){
            return -1;    
        }
        return start;
    }
};
volatile int tryNoToBeOptimizedOut = 0;    
int main() {

    Solution s;
    vector<int> gas{...}; // 10k data here
    vector<int> cost{...}; 
    int count = 100; 
    vector<int> results;
    for(int i = 0; i <count; i++){
        // size_t rIndex = rand()%gas.size();
        // gas[rIndex] += i;
        gas[i] += i;
        auto start = std::chrono::system_clock::now();
        int ret = s.canCompleteCircuit(gas,cost);
        auto end = std::chrono::system_clock::now();
        auto elapsed = end - start;
        std::cout << elapsed.count() << '\n';
        tryNoToBeOptimizedOut = ret;
        results.push_back(ret);
    }   
    tryNoToBeOptimizedOut = results[rand()%results.size()];
   
    return 0;
}

old tests before edit

In the first place I checked the implementation in gcc and found that both versions are nearly identical in assembly.
Live Demo in gcc

I know that Leetcode uses clang 11 with O2. So I checked the difference in clang 11.
Live Demo in clang 11
Live Demo in clang 14 doesn't get better

The results are a bit different between them. It seems that cur++; is combined into while predicates in if/continue version.
I'm not sure why....Is it related to optimization policies?

My code:

    int canCompleteCircuit(vector<int>& gas, vector<int>& cost) {
        if(gas.empty()){
            return -1;
        }
        int len = gas.size();
        // start at the last station, move start backward if we cannot finished the loop
        int start = len - 1;
        int tank = gas[start] - cost[start];
        int cur = 0; // from first station
        while(start != cur){ // other condition?
            if(tank < 0){
               // if not, move start backward, check the value
                // continue...start > 0;
                start--;
                tank += gas[start] - cost[start];
                continue;
            }
            // else{
                tank += gas[cur] - cost[cur];
                cur++;   
                // continue; <-- If I put another continue there then the runtime is similar to if/else version. But is there a better way than adding conitnue manually?
            // }
        }
        if(tank < 0){
            return -1;    
        }
        return start;
    }

Assembly in if/else version

.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
        movsxd  rcx, ecx                #   `else` part
        mov     edx, dword ptr [r14 + 4*rcx]
        sub     edx, dword ptr [rbx + 4*rcx]
        add     ecx, 1
.LBB0_6:                                #   in Loop: Header=BB0_3 Depth=1
        add     eax, edx
        mov     edx, eax
        shr     edx, 31
        cmp     esi, ecx                # `while(start != cur){`
        je      .LBB0_7
.LBB0_3:                                # =>This Inner Loop Header: Depth=1
        test    dl, 1                   # `if(tank < 0){`
        je      .LBB0_5                 # go to else
        movsxd  rdi, esi
        add     esi, -1
        mov     edx, dword ptr [r14 + 4*rdi - 4]
        sub     edx, dword ptr [rbx + 4*rdi - 4]
        jmp     .LBB0_6

Assembly in if/continue version

.LBB0_3:                                # =>This Loop Header: Depth=1
        mov     r9, rdx
        mov     edx, r10d
        sub     edx, r9d
        add     rdx, 4
        movsxd  rdi, r9d
        lea     rsi, [r12 + 4*rdi]
        lea     rbx, [r14 + 4*rdi]
        xor     edi, edi
.LBB0_4:                                #   Parent Loop BB0_3 Depth=1
        test    cl, 1                   # `if` part
        jne     .LBB0_5
        add     eax, dword ptr [rbx + 4*rdi]  # after `if` statement
        sub     eax, dword ptr [rsi + 4*rdi]
        mov     ecx, eax
        shr     ecx, 31
        add     rdi, 1
        cmp     edx, edi
        jne     .LBB0_4
        jmp     .LBB0_7
.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
        add     eax, dword ptr [r14 + 4*r8 - 4]    # Body of `if`
        sub     eax, dword ptr [r12 + 4*r8 - 4]
        add     r8, -1
        mov     esi, r10d
        sub     esi, r9d
        add     esi, 3
        mov     ecx, eax
        shr     ecx, 31
        movsxd  rdx, r9d
        add     rdx, rdi
        add     r10d, -1
        cmp     esi, edi
        jne     .LBB0_3
Louis Go
  • 2,031
  • 2
  • 15
  • 26
  • 1
    Have you run this in a profiler and checked the branch prediction stats? – Goswin von Brederlow May 29 '22 at 03:00
  • @GoswinvonBrederlow Sorry, I didn't do that. Also I didn't get all the test cases on leetcode. Do you mean that my assembly result depends on the input? – Louis Go May 29 '22 at 03:02
  • clang is an ahead-of-time C++ compiler. The asm depends only on the source code, *unless* you enable profile-guided optimization. In that case then yes, the compiler will know which branches are likely/unlikely and optimize accordingly, and that can depend on the input. Otherwise, the compiler wasn't involved in any step after the first run of the program. – Peter Cordes May 29 '22 at 03:24
  • 2
    It does sometimes happen that different ways of writing equivalent logic end up leading the compiler to make different asm. Either because of missed optimizations, or because of different heuristics for it guessing which path of execution would be more likely. (Perhaps leading to some other missed opt.) If something is performance critical, and you're sure you know which way a branch will usually go, `__builtin_expect` can sometimes help: [How do the likely/unlikely macros in the Linux kernel work and what is their benefit?](https://stackoverflow.com/a/31133787) – Peter Cordes May 29 '22 at 03:25
  • 3
    This doesn't look like anything inherently linked to if/else vs if/continue to me, but just that re-rolling the compiler dice worked out poorly for the second case and Clang randomly got more confused and decides to recompute `start` from scratch (most of the stuff in `LBB0_5` of the second snippet does that) every time it decrements it. – harold May 29 '22 at 03:33
  • @harold Thanks to your hint. I found that If I put another `continue` in the end of while loop. Then the runtime would be similar to `if/else` one. But I still think it's related to `if/continue` since this kind of scenario occurs most of time when I use `if/continue`. I have found this phenomenon on multiple leetcode question instead of just this one. – Louis Go May 29 '22 at 03:40
  • You tagged it branch-prediction. So you better measure the prediction. – Goswin von Brederlow May 29 '22 at 04:10
  • 1
    Never ever trust the timing output from leetcode. (in fact it is a very bad resource for learning how to write good and/or optimized c++ code). The system is shared with a lot of users so the deviation from average is pretty big. To optimize C++ use a local compiler and enable profiling. For if statements make them predicatable that will get you the best speed optimization. – Pepijn Kramer May 29 '22 at 06:30
  • 3
    This doesn't answer the question, but you can make this branchless by using `auto q = tank < 0 ? start-- : cur++; tank += gas[q] - cost[q];` – Aki Suihkonen May 29 '22 at 08:06
  • 2
    @AkiSuihkonen: Interesting idea, yeah that does compile branchlessly using `sar reg,31` to materialize a 0 or `-1`. https://godbolt.org/z/frW5Mbha6 (right-hand pane). Pretty sure it's doing something silly like `(~tank)>>31` instead of materializing the opposite value with a `not` on the shift result, though. Anyway, probably a lot faster if frequent backtracking is required so it doesn't predict well, probably slower for inputs where it can just blaze through the `tank >= 0` cases without switching to the other loop or branch. – Peter Cordes May 29 '22 at 08:56
  • @AkiSuihkonen: I was hoping clang for AArch64 could use `csinc`, but no, `add` with `lsr` and `add` with `asr` to add 1 or add -1 respectively, so I guess that's at least as good. https://godbolt.org/z/dq9MrneYz – Peter Cordes May 29 '22 at 09:05
  • 2
    One can squeeze out a few instructions also with `auto q = tank < 0 ? 0 : start; --start; gas += tank >= 0; cost += tank >=0;` as in https://godbolt.org/z/8vdbvWEYe. Also one could subtract the vectors in advance ( or interleave them). – Aki Suihkonen May 29 '22 at 13:18

0 Answers0