
What is the best way to multiply two __m256i registers element-wise, treating each as eight 32-bit integers?

_mm256_mul_epu32 is not what I'm looking for because it produces 64-bit outputs. I want a 32-bit result for every 32-bit input element.

Moreover, I'm sure that the product of any two of these 32-bit values will not overflow 32 bits.

Thanks!

Peter Cordes
user1829358
  • Possible duplicate of [fastest way to multiply two vectors in c++](http://stackoverflow.com/questions/17264399/fastest-way-to-multiply-two-vectors-in-c) – Peter Cordes Jun 09 '16 at 06:18

1 Answer


You want the _mm256_mullo_epi32() intrinsic. From Intel's excellent online intrinsics guide:

Synopsis

__m256i _mm256_mullo_epi32 (__m256i a, __m256i b)
#include "immintrin.h"
Instruction: vpmulld ymm, ymm, ymm
CPUID Flags: AVX2

Description

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
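As a minimal sketch of how this might be used (the function names `mul32x8`, `mul32x8_avx2`, and `mul32x8_scalar` are made up for illustration, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here so the file builds without `-mavx2`):

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX2 path: multiply eight 32-bit lanes, keeping the low 32 bits of each
   product. The target attribute (GCC/Clang extension, an assumption of this
   sketch) enables AVX2 for this one function; it is unnecessary when the
   whole file is compiled with -mavx2. */
__attribute__((target("avx2")))
static void mul32x8_avx2(const int32_t *a, const int32_t *b, int32_t *out)
{
    /* Unaligned loads/stores, so the arrays need no special alignment. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i prod = _mm256_mullo_epi32(va, vb);   /* vpmulld */
    _mm256_storeu_si256((__m256i *)out, prod);
}

/* Scalar fallback with the same low-32-bit semantics. */
static void mul32x8_scalar(const int32_t *a, const int32_t *b, int32_t *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}

/* Hypothetical wrapper: use AVX2 when the CPU reports it, else fall back. */
void mul32x8(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(a && b && out);
    if (__builtin_cpu_supports("avx2"))
        mul32x8_avx2(a, b, out);
    else
        mul32x8_scalar(a, b, out);
}
```

Because only the low 32 bits of each product are kept, the same instruction serves signed and unsigned inputs alike; when the products are known not to overflow, as in the question, the result is the exact product.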

Jason R