XMMatrixPerspectiveFovLH is broken?

Question

I've been trying to build a simple DirectX renderer from the ground up (was an OpenGL guy before but I figure it's good to know different APIs). I've been following the guide at rastertek.com to learn how to get the basics up and running.

I got as far as this lesson, following the guide verbatim except for variable names, and ran into a really weird bug. The renderer class contains 3 XMMATRIX instances (world, perspective and orthographic), but trying to initialize them would crash my application. It ran fine once I commented out these lines:

m_projMat = XMMatrixPerspectiveFovLH(fov, aspect, nearplane, farplane);
m_orthoMat = XMMatrixOrthographicLH(width, height, nearplane, farplane);
m_worldMat = XMMatrixIdentity();

At first I thought it was the dreaded SIMD alignment issue, but I'm compiling 64-bit so that should be taken care of. After a bit more tinkering I found out the ortho and world matrices were fine; it was just the projection matrix that was causing problems. Assigning to this matrix with the other XMMatrix... functions also works OK. So:

m_projMat = XMMatrixPerspectiveFovLH(fov, aspect, nearplane, farplane);

fails, but:

m_projMat = XMMatrixPerspectiveLH(fov*aspect*nearplane, fov*nearplane, nearplane, farplane);

works! The right-handed version also fails. Also, the failure only happens in release mode, not in debug. I'm using Visual Studio 2015 community and the SDKs that installed with it.

Has anyone else encountered this bug, or have any idea what's going on here?

You could try using the latest version of DirectXMath available at Github and see if the problem persists. — Matthias, Sep 29 '17 at 16:37
Is there actually some info at compile or runtime about the crash? — Matthias, Sep 29 '17 at 16:39
Yeah thanks, I've got it working now. I upgraded to the latest version of directx math and copied the source directly from Rastertek (second series). Still confused about what the bug actually was... Not alignment, AFAICT, since compiling 64-bit forces 16-byte heap alignment. And the other matrix methods worked fine and produced valid results. Possibly a subtle bug in my own copy of the code. Going forward, I think I'll work with Microsoft's own DirectX tutorials since these seem to work well out of the box with a minimum of hassle. — russ, Oct 31 '17 at 04:06

Matthias · Accepted Answer · 2017-10-01T18:33:39.773

Problem

XMMATRIX and XMVECTOR (which use __m128 under the hood) require 16 byte alignment. The C++ compiler can automatically ensure this for XMMATRIX and XMVECTOR data allocated on the stack, but can't ensure this for XMMATRIX and XMVECTOR data allocated on the heap.

In your specific situation (referring to the RasterTek tutorial), three member variables of your D3DClass class are declared to be of type XMMATRIX and you construct one instance of this D3DClass class on the heap by using the default operator new (due to the absence of a custom one). Using the default operator new does not guarantee this 16 byte alignment. Even if your code had run fine using the default operator new, you could still face this issue after adding some extra member variables in the future.
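Whether a particular allocation happens to satisfy the requirement can be checked at run time. The following is a minimal sketch in standard C++; `IsAligned` is a hypothetical helper introduced here for illustration and is not part of DirectXMath.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a DirectXMath function).
// Returns true if ptr is a multiple of alignment.
// Assumes alignment is a non-zero power of two (e.g. 16).
inline bool IsAligned(const void* ptr, std::size_t alignment) noexcept {
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(ptr);
    return (address & (alignment - 1)) == 0;
}
```

Asserting `IsAligned(this, 16)` in the constructor of a class that holds XMMATRIX members could turn a misaligned instance into an immediate, explicit failure instead of a crash somewhere inside a math function.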

Solution 1

The easiest fix in this specific case is to allocate your D3DClass class instances on the stack instead of on the heap. This, however, does not prevent you or future users of your code from allocating an instance on the heap instead.

Solution 2

The programmers of the DirectXMath library advise the general programmer audience to use XMMATRIX and XMVECTOR for calculations only, and not to use them as containers for storing persistent data (i.e. class member variables). For the latter, they advise using XMFLOAT4, XMFLOAT4X4, etc. These structs do not impose the 16 byte alignment restriction, and their instances can simply be loaded into SIMD registers by using XMLoadFloat4 for XMVECTOR, XMLoadFloat4x4 for XMMATRIX, etc.

However, often it is easier and more compact to avoid using XMVECTOR or XMMATRIX directly in a class or structure. Instead, make use of the XMFLOAT3, XMFLOAT4, XMFLOAT4X3, XMFLOAT4X4, and so on, as members of your structure. Further, you can use the Vector Loading and Vector Storage functions to move the data efficiently into XMVECTOR or XMMATRIX local variables, perform computations, and store the results. There are also streaming functions (XMVector3TransformStream, XMVector4TransformStream, and so on) that efficiently operate directly on arrays of these data types.
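The load, compute, store pattern described above can be sketched without any SIMD dependency. The block below is only an illustration under stated assumptions: `Float4x4`, `Mat4`, `Load`, `Store`, `Identity`, `Multiply` and `Renderer` are hypothetical stand-ins written for this example. In DirectXMath the corresponding roles are played by XMFLOAT4X4 (storage), XMMATRIX (computation), XMLoadFloat4x4 and XMStoreFloat4x4.

```cpp
#include <cstddef>

// Stand-in for a storage type such as XMFLOAT4X4: plain floats, 4-byte aligned.
struct Float4x4 {
    float m[4][4];
};

// Stand-in for a compute type such as XMMATRIX: requires 16 byte alignment.
struct alignas(16) Mat4 {
    float m[4][4];
};

// Copies a storage matrix into an aligned compute matrix.
inline Mat4 Load(const Float4x4& src) noexcept {
    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out.m[r][c] = src.m[r][c];
    return out;
}

// Copies an aligned compute matrix back into storage.
inline void Store(Float4x4& dst, const Mat4& src) noexcept {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst.m[r][c] = src.m[r][c];
}

inline Mat4 Identity() noexcept {
    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out.m[r][c] = (r == c) ? 1.0f : 0.0f;
    return out;
}

// Plain row-by-column product: out = a * b.
inline Mat4 Multiply(const Mat4& a, const Mat4& b) noexcept {
    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[r][k] * b.m[k][c];
            out.m[r][c] = sum;
        }
    return out;
}

// The class stores only the unaligned type, so it has ordinary float
// alignment and can be created with the default operator new.
class Renderer {
public:
    Renderer() noexcept { Store(m_world, Identity()); }

    // Load into a local compute matrix, multiply, store the result.
    void ApplyWorld(const Mat4& transform) noexcept {
        Store(m_world, Multiply(Load(m_world), transform));
    }

    const Float4x4& World() const noexcept { return m_world; }

private:
    Float4x4 m_world;
};

static_assert(alignof(Mat4) == 16, "compute type is 16 byte aligned");
static_assert(alignof(Renderer) == alignof(float),
              "storage-only class has no SIMD alignment requirement");
```

The two `static_assert`s capture the point of this solution: the alignment requirement stays confined to short-lived local variables, while the long-lived class has none.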

Solution 3

Ensure the 16 byte alignment by overloading operator new/operator delete and operator new[]/operator delete[]. We declare a class template that allows heap allocation with arbitrary alignment requirements:

#include <malloc.h>  // _aligned_malloc, _aligned_free (MSVC)
#include <new>       // std::bad_alloc

inline void *AllocAligned(size_t size, size_t alignment = 16) noexcept {
    return _aligned_malloc(size, alignment);
}

inline void FreeAligned(void *ptr) noexcept {
    if (!ptr) {
        return;
    }

    _aligned_free(ptr);
}

template< typename DataT >
struct AlignedData {

public:

    static void *operator new(size_t size) {
        const size_t alignment = __alignof(DataT);

        // __declspec(align) on DataT is required
        static_assert(alignment > 8, 
            "AlignedData is only useful for types with > 8 byte alignment.");

        void * const ptr = AllocAligned(size, alignment);
        if (!ptr) {
            throw std::bad_alloc();
        }

        return ptr;
    }

    static void operator delete(void *ptr) noexcept {
        FreeAligned(ptr);
    }

    static void *operator new[](size_t size) {
        return operator new(size);
    }

    static void operator delete[](void *ptr) noexcept {
        operator delete(ptr);
    }
};

This code snippet is based on the DirectXTK and is also included in my own codebase (including documentation).

This data structure can now be used as follows:

__declspec(align(16)) struct Transform final : public AlignedData< Transform > {
    XMMATRIX m_transform;
};

A Transform structure can now be allocated on the heap with 16 byte alignment, since our custom operator new will now be invoked:

Transform *transform = new Transform();

If you use instances of the Transform struct as member variables in other structs or classes, the alignment restrictions of course remain. So instead of having alignment restrictions only for XMVECTOR and XMMATRIX, you will now have alignment restrictions for XMVECTOR, XMMATRIX and Transform.
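For toolchains without `_aligned_malloc`, the same idea can be sketched with the aligned allocation functions from standard C++17. This is an assumption-laden variant, not the code from DirectXTK: it requires a C++17 compiler (newer than the toolchain mentioned in the question), and the names `PortableAlignedData` and `Block64` are hypothetical and used only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical portable variant of the AlignedData idea: forwards to the
// C++17 aligned allocation functions instead of _aligned_malloc/_aligned_free.
template <typename DataT>
struct PortableAlignedData {
    static void* operator new(std::size_t size) {
        // Throws std::bad_alloc on failure, like the default operator new.
        return ::operator new(size, std::align_val_t{alignof(DataT)});
    }

    static void operator delete(void* ptr) noexcept {
        ::operator delete(ptr, std::align_val_t{alignof(DataT)});
    }

    static void* operator new[](std::size_t size) {
        return ::operator new[](size, std::align_val_t{alignof(DataT)});
    }

    static void operator delete[](void* ptr) noexcept {
        ::operator delete[](ptr, std::align_val_t{alignof(DataT)});
    }
};

// Stand-in for a SIMD-heavy type; 64 is chosen so the requirement is
// clearly stricter than what a default allocation usually provides.
struct alignas(64) Block64 : PortableAlignedData<Block64> {
    float values[16];
};

// Returns true if the heap instance satisfies the type's own alignment.
inline bool IsProperlyAligned(const Block64* block) noexcept {
    return reinterpret_cast<std::uintptr_t>(block) % alignof(Block64) == 0;
}
```

The alignment is taken from `alignof(DataT)` inside the member functions, where the derived type is already complete, so the base class does not need a separate alignment parameter.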

Smart Pointer Pitfalls

Be careful if you use std::shared_ptr and especially std::make_shared. The latter allocates the data block together with the control block in a single allocation and does not use user-defined allocation functions such as our custom operator new.

This means that this method will not work:

template< typename T, typename... ConstructorArgsT >
inline std::shared_ptr< T > MakeShared(ConstructorArgsT&&... args) {
    return std::make_shared< T >(std::forward< ConstructorArgsT >(args)...);
}

Instead, you should invoke operator new explicitly (which is also what std::make_unique does):

template< typename T, typename... ConstructorArgsT >
inline std::shared_ptr< T > MakeAllocatedShared(ConstructorArgsT&&... args) {
    return std::shared_ptr< T >(new T(std::forward< ConstructorArgsT >(args)...));
}

std::unique_ptr and especially std::make_unique will work as intended since the latter explicitly calls operator new (due to the absence of a control block).
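The difference between the three construction paths can be observed directly. The following is a minimal sketch in standard C++ (C++14 for std::make_unique); `Tracked` and the three counting functions are hypothetical names introduced for this illustration. The class-specific operator new only counts its invocations and otherwise forwards to the global one.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical type whose class-specific operator new counts its calls.
struct Tracked {
    static int s_custom_new_calls;

    static void* operator new(std::size_t size) {
        ++s_custom_new_calls;
        return ::operator new(size);
    }

    static void operator delete(void* ptr) noexcept {
        ::operator delete(ptr);
    }

    int value = 0;
};

int Tracked::s_custom_new_calls = 0;

// std::make_shared constructs the object inside its own combined
// allocation, so the class-specific operator new is bypassed.
inline int CallsDuringMakeShared() {
    const int before = Tracked::s_custom_new_calls;
    auto ptr = std::make_shared<Tracked>();
    (void)ptr;
    return Tracked::s_custom_new_calls - before;
}

// An explicit new-expression uses the class-specific operator new.
inline int CallsDuringExplicitNew() {
    const int before = Tracked::s_custom_new_calls;
    std::shared_ptr<Tracked> ptr(new Tracked());
    return Tracked::s_custom_new_calls - before;
}

// std::make_unique is specified in terms of a new-expression as well.
inline int CallsDuringMakeUnique() {
    const int before = Tracked::s_custom_new_calls;
    auto ptr = std::make_unique<Tracked>();
    return Tracked::s_custom_new_calls - before;
}
```

Under these assumptions the first function reports zero calls and the other two report one call each, which is why an aligned operator new protects only the latter two paths.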

RasterTek

RasterTek's second series of D3D11 tutorials is a subset of RasterTek's first series of D3D11 tutorials with the minor difference that the latter uses the obsolete D3DXMath. The transition from D3DXMath to DirectXMath is actually pretty straightforward. Therefore, it can be interesting to take a look at the first series as well (which I ported to Visual Studio 2017, runs fine in both Debug and Release configurations, and can also be found in this repository).

If you want to stick to DirectXMath, you can also take a look at the D3D11 tutorials of Microsoft itself (explanation + code). (I also created a repository with some code refactors of these tutorials).

You can, however, make smart pointers work by specializing std::allocator for any aligned data type. This will also allow you to put your stuff into standard library containers, which is not a particularly rare use-case. Basically whenever you overload new/delete you should always remember to make your own allocator (or better, specialize the standard allocator). But Solution 2 might still be the most headache-free anyway. — Christian Rau, Oct 01 '17 at 20:37
