I am struggling with the Intel intrinsics, and have come up with the following assumption.

Suppose we have four instructions to execute, and each of them has a CPI of 0.5. If there is NOT any dependency among them, then the lowest number of cycles needed to execute them is 2, since two of them can execute in each cycle.

Am I correct?

Peter Cordes
LI.LE
    Only if the instructions all compete for the *same* two ports (assuming each is a single uop that needs one execution port and has two ports to choose from, as the reason for being 2/c). Otherwise they could potentially all run in the same cycle, if they use two disjoint pairs of execution ports, e.g. `shlx` or another scalar shift (p0 or p6, i.e. p06) vs. `bzhi` etc. (p1 or p5, i.e. p15). Intel's tables of throughput and latency are insufficient to answer the question of total throughput of a whole block of code, which is why I never look at them, instead consulting https://uops.info/ and https://agner.org/optimize/. See the linked duplicate for more. – Peter Cordes Jan 21 '22 at 05:50
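
To make the port argument in the comment concrete, here is a minimal sketch of a toy model, not a simulator. It assumes every instruction is one independent uop, every port accepts at most one uop per cycle, and it ignores front-end width, latency and scheduler heuristics. The function name `min_cycles` and the port sets are illustrative assumptions; the p06 / p15 assignments are the ones quoted in the comment, so check https://uops.info/ for your actual CPU.

```python
from itertools import combinations

def min_cycles(uop_ports, ports=range(8)):
    """Lower bound on cycles for independent single-uop instructions.

    uop_ports: one non-empty set of allowed port numbers per uop.
    For every group of k ports, the uops that can ONLY run on that
    group need at least ceil(count / k) cycles; the answer is the
    worst case over all groups.
    """
    ports = list(ports)
    best = 0
    for k in range(1, len(ports) + 1):
        for group in combinations(ports, k):
            allowed = set(group)
            confined = sum(1 for p in uop_ports if set(p) <= allowed)
            best = max(best, -(-confined // k))  # ceiling division
    return best

# Four uops that all compete for p0/p6 (e.g. four scalar shifts).
same_ports = [{0, 6}] * 4
# Two p06 uops plus two p15 uops: two disjoint pairs of ports.
split_ports = [{0, 6}, {0, 6}, {1, 5}, {1, 5}]

print(min_cycles(same_ports))   # 2
print(min_cycles(split_ports))  # 1
```

Under these assumptions, the "2 cycles" answer in the question only holds for the first case. In the second case each instruction still has a throughput of 2 per cycle on its own, but all four can issue in a single cycle because they do not share ports.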
