void    v128_vf_add(size_t v_size, float *v, float f)
{
    __m128  f_vector;
    float   *ptr;
    size_t  i;

    f_vector = _mm_set1_ps(f);

    i = 0;
    while (i < v_size / v128_femax)
    {
        ptr = v + (i * v128_femax);
        _mm_store_ps(ptr, _mm_add_ps(_mm_load_ps(ptr), f_vector));
        i++;
    }

    if (i * v128_femax < v_size)
    {
        ptr = v + (i * v128_femax);
        _mm_store_ps(ptr, _mm_add_ps(_mm_load_ps(ptr), f_vector));
    }
}

I am trying to add a floating-point number float f to every element of an array of floats float *v, whose length is size_t v_size.

When the array lines up perfectly with the register width (in my case a 128-bit SSE register, which holds 4 floats), everything works fine. The problem is that when the array is off by, say, an element or two, I get a stack smashing error.

The code is simple: I loop over the array, load 4 elements into the register, do the addition, and store the result back to the array float *v. The purpose of the if statement is to handle the leftover elements without smashing the stack, that is, without loading from or storing to memory outside the array, but I can't figure out how to do that.

I don't know if there is a way to load, operate on, and store an array that does not fill the register (say the register is 4 floats wide but the array has only 3 elements).
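For reference, here is a minimal sketch of one common way to handle the leftover elements: process full blocks of 4 floats with unaligned loads and stores, then finish the remaining 0 to 3 floats with a plain scalar loop. This is only an illustration, not necessarily the best approach. The function name v128_vf_add_sketch and the macro V128_FEMAX are made up for this example, and V128_FEMAX is assumed to equal the v128_femax used above (4 floats per 128-bit register).

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps */

/* Hypothetical constant: floats per 128-bit register (assumed to match v128_femax). */
#define V128_FEMAX 4

/* Hypothetical name: adds f to each of the v_size floats in v. */
void v128_vf_add_sketch(size_t v_size, float *v, float f)
{
    __m128 f_vector = _mm_set1_ps(f);
    size_t i = 0;

    /* Full blocks of 4 floats. The "u" variants accept any address,
       so v does not need to be 16-byte aligned. */
    for (; i + V128_FEMAX <= v_size; i += V128_FEMAX)
        _mm_storeu_ps(v + i, _mm_add_ps(_mm_loadu_ps(v + i), f_vector));

    /* Remaining 0 to 3 floats, one at a time, so no memory
       past v[v_size - 1] is read or written. */
    for (; i < v_size; i++)
        v[i] += f;
}
```

With this shape, a 3-element array never enters the vector loop and is handled entirely by the scalar loop. If the buffer is known to be 16-byte aligned, the aligned _mm_load_ps and _mm_store_ps could be used in the main loop instead.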

aihya
    stack smashing?? Not segfault or something from misaligned access? It sounds like you're actually talking about a size that isn't a multiple of 4, rather than the address of the first element not being a multiple of 16. Related: stuff mentioned in [Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? Or not using that insn at all](https://stackoverflow.com/q/34306933) is relevant for odd-sized first or last, instead of for reaching an alignment boundary. Although mostly only vmaskmovps itself; other stuff like overlapping stores need larger arr – Peter Cordes Jan 28 '22 at 02:29
