Problem
XMMATRIX and XMVECTOR (which use __m128 under the hood) require 16-byte alignment. The C++ compiler ensures this automatically for XMMATRIX and XMVECTOR data allocated on the stack, but it cannot ensure it for XMMATRIX and XMVECTOR data allocated on the heap.
In your specific situation (referring to the RasterTek tutorial), three member variables of your D3DClass class are declared as XMMATRIX, and you construct one instance of D3DClass on the heap with the default operator new (because the class does not define a custom one). The default operator new does not guarantee 16-byte alignment. Even if your code happened to run fine with the default operator new, you could still hit this issue later, for example after adding extra member variables.
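To make the requirement concrete, here is a minimal sketch that checks an address against a 16-byte boundary. Matrix4 and IsAligned are hypothetical names introduced for illustration. The sketch deliberately does not include DirectXMath so that it compiles on any toolchain, and Matrix4 only mimics the alignment of XMMATRIX through alignas(16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for XMMATRIX: 16 floats with a 16-byte alignment
// requirement (the real type is built from __m128 on SSE targets).
struct alignas(16) Matrix4 {
    float m[16];
};

static_assert(alignof(Matrix4) == 16, "Matrix4 mimics the alignment of XMMATRIX");

// Returns true if ptr is a multiple of alignment.
// alignment is assumed to be a power of two.
inline bool IsAligned(const void* ptr, std::size_t alignment) noexcept {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// The compiler places automatic (stack) variables at suitably aligned
// addresses, so this check is expected to succeed.
inline bool StackInstanceIsAligned() {
    Matrix4 on_stack{};
    return IsAligned(&on_stack, alignof(Matrix4));
}
```

The same check could be applied to the pointer returned by new D3DClass to confirm whether a crash is really caused by a misaligned heap allocation.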
Solution 1
The easiest fix in this specific case is to allocate your D3DClass instances on the stack instead of on the heap. This, however, does not prevent you or future users of your code from allocating an instance on the heap anyway.
Solution 2
The developers of the DirectXMath library advise using XMMATRIX and XMVECTOR for calculations only, and not as containers for persistent data (e.g. class member variables). For the latter, they advise using XMFLOAT4, XMFLOAT4X4, etc. These structs do not impose the 16-byte alignment restriction, and their instances can simply be loaded into SIMD registers with XMLoadFloat4 (for XMVECTOR), XMLoadFloat4x4 (for XMMATRIX), etc.
However, often it is easier and more compact to avoid using XMVECTOR
or XMMATRIX directly in a class or structure. Instead, make use of the
XMFLOAT3, XMFLOAT4, XMFLOAT4X3, XMFLOAT4X4, and so on, as members of
your structure. Further, you can use the Vector Loading and Vector
Storage functions to move the data efficiently into XMVECTOR or
XMMATRIX local variables, perform computations, and store the results.
There are also streaming functions (XMVector3TransformStream,
XMVector4TransformStream, and so on) that efficiently operate directly
on arrays of these data types.
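The load, compute, store pattern described above can be sketched as follows. This is only an illustration under stated assumptions: Float4 and Particle are hypothetical names, Float4 stands in for XMFLOAT4, and the sketch uses raw SSE intrinsics (so it assumes an x86/x64 target) instead of DirectXMath so that it compiles without the Windows SDK. With DirectXMath, the three steps would correspond to XMLoadFloat4, XMVectorAdd, and XMStoreFloat4.

```cpp
#include <cassert>
#include <memory>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86/x64 target

// Hypothetical stand-in for XMFLOAT4: four plain floats without any
// 16-byte alignment requirement.
struct Float4 {
    float x, y, z, w;
};

static_assert(alignof(Float4) == alignof(float), "no 16-byte requirement");

class Particle {
public:
    explicit Particle(const Float4& position) noexcept : m_position(position) {}

    // Load -> compute -> store.
    void Translate(const Float4& delta) noexcept {
        const __m128 p = _mm_loadu_ps(&m_position.x);     // unaligned load
        const __m128 d = _mm_loadu_ps(&delta.x);          // unaligned load
        _mm_storeu_ps(&m_position.x, _mm_add_ps(p, d));   // unaligned store
    }

    const Float4& Position() const noexcept { return m_position; }

private:
    Float4 m_position;  // storage type: safe inside heap-allocated objects
};

// Usage sketch: a heap-allocated Particle needs no custom operator new,
// because none of its members requires 16-byte alignment.
inline void DemoParticle() {
    std::unique_ptr<Particle> p(new Particle(Float4{1.0f, 2.0f, 3.0f, 4.0f}));
    p->Translate(Float4{0.5f, 0.5f, 0.5f, 0.5f});
    assert(p->Position().x == 1.5f);
}
```

The trade-off is an explicit load and store around each computation, in exchange for member variables that can live anywhere in memory.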
Solution 3
Ensure the 16-byte alignment by overloading operator new/operator delete and operator new[]/operator delete[]. The following class template allows heap allocation with an arbitrary alignment requirement:
#include <malloc.h>  // _aligned_malloc, _aligned_free (MSVC)
#include <new>       // std::bad_alloc
// Allocates size bytes at an address that is a multiple of alignment.
inline void *AllocAligned(size_t size, size_t alignment = 16) noexcept {
    return _aligned_malloc(size, alignment);
}
// Frees memory obtained from AllocAligned; a null pointer is ignored.
inline void FreeAligned(void *ptr) noexcept {
    if (!ptr) {
        return;
    }
    _aligned_free(ptr);
}
template< typename DataT >
struct AlignedData {
public:
    static void *operator new(size_t size) {
        const size_t alignment = __alignof(DataT);
        // __declspec(align) on DataT is required
        static_assert(alignment > 8,
            "AlignedData is only useful for types with > 8 byte alignment.");
        void * const ptr = AllocAligned(size, alignment);
        if (!ptr) {
            throw std::bad_alloc();
        }
        return ptr;
    }
    static void operator delete(void *ptr) noexcept {
        FreeAligned(ptr);
    }
    // The array forms forward to the scalar forms.
    static void *operator new[](size_t size) {
        return operator new(size);
    }
    static void operator delete[](void *ptr) noexcept {
        operator delete(ptr);
    }
};
This code snippet is based on the DirectXTK and is also included in my own codebase (including documentation).
This data structure can now be used as follows:
__declspec(align(16)) struct Transform final : public AlignedData< Transform > {
    XMMATRIX m_transform;
};
A Transform structure can now be allocated on the heap with 16 byte alignment, since our custom operator new will now be invoked:
Transform *transform = new Transform();
If you use instances of the Transform struct as member variables in other structs or classes, the alignment restrictions of course remain. So instead of having alignment restrictions only for XMVECTOR and XMMATRIX, you will now have alignment restrictions for XMVECTOR, XMMATRIX and Transform.
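How the restriction propagates to an enclosing type can be checked at compile time. In the minimal sketch below, AlignedTransform and Model are hypothetical names; AlignedTransform mimics the 16-byte alignment of Transform through the portable alignas(16) instead of __declspec(align(16)), and the offset check assumes a 4-byte int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the Transform struct (16-byte aligned).
struct alignas(16) AlignedTransform {
    float m[16];
};

// A type that stores an AlignedTransform by value inherits its alignment,
// so heap instances of Model need an aligned operator new as well.
struct Model {
    int m_id;
    AlignedTransform m_transform;
};

static_assert(alignof(Model) == 16,
    "alignment propagates to the enclosing type");
static_assert(offsetof(Model, m_transform) == 16,
    "padding is inserted after m_id");

// Usage sketch: automatic storage honours the propagated requirement.
inline void DemoModel() {
    Model model{};
    assert(reinterpret_cast<std::uintptr_t>(&model.m_transform) % 16 == 0);
}
```

In other words, a class that holds a Transform by value would itself have to derive from AlignedData (or always live on the stack) to remain safe on the heap.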
Smart Pointer Pitfalls
Be careful if you use std::shared_ptr, and especially std::make_shared. The latter allocates the object together with the control block in a single allocation and does not go through class-specific allocation functions such as our custom operator new.
This means that the following helper does not provide the required alignment:
template< typename T, typename... ConstructorArgsT >
inline std::shared_ptr< T > MakeShared(ConstructorArgsT&&... args) {
    return std::make_shared< T >(std::forward< ConstructorArgsT >(args)...);
}
Instead, use an explicit new expression, which does invoke the custom operator new (this is also what std::make_unique does internally):
template< typename T, typename... ConstructorArgsT >
inline std::shared_ptr< T > MakeAllocatedShared(ConstructorArgsT&&... args) {
    return std::shared_ptr< T >(new T(std::forward< ConstructorArgsT >(args)...));
}
std::unique_ptr and std::make_unique work as intended, since the latter uses a plain new expression (there is no control block to allocate alongside the object).
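The difference between the three ways of creating a smart pointer can be observed with a small experiment. In the sketch below, Tracked, CustomNewCalls, and CountCustomNewCalls are hypothetical names; the class-specific operator new only counts its invocations and forwards to the global one, whereas in the scenario above it would perform the aligned allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical type whose class-specific operator new counts its calls.
struct Tracked {
    static int& CustomNewCalls() noexcept {
        static int calls = 0;
        return calls;
    }
    static void* operator new(std::size_t size) {
        ++CustomNewCalls();
        return ::operator new(size);
    }
    static void operator delete(void* ptr) noexcept {
        ::operator delete(ptr);
    }
    int m_value = 0;
};

// Returns how often Tracked::operator new ran while make() executed.
template <typename MakeT>
int CountCustomNewCalls(MakeT make) {
    const int before = Tracked::CustomNewCalls();
    auto owner = make();
    (void)owner;
    return Tracked::CustomNewCalls() - before;
}

inline void DemoSmartPointerAllocation() {
    // std::make_shared allocates object and control block itself,
    // so the class-specific operator new is bypassed.
    const int via_make_shared =
        CountCustomNewCalls([] { return std::make_shared<Tracked>(); });
    // An explicit new expression uses the class-specific operator new.
    const int via_new =
        CountCustomNewCalls([] { return std::shared_ptr<Tracked>(new Tracked()); });
    // std::make_unique performs a new expression internally.
    const int via_make_unique =
        CountCustomNewCalls([] { return std::make_unique<Tracked>(); });
    assert(via_make_shared == 0);
    assert(via_new == 1);
    assert(via_make_unique == 1);
}
```

The price of the explicit new expression is a second allocation for the control block of the std::shared_ptr.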
RasterTek
RasterTek's second series of D3D11 tutorials is a subset of RasterTek's first series of D3D11 tutorials, with the minor difference that the first series uses the obsolete D3DXMath. The transition from D3DXMath to DirectXMath is actually pretty straightforward. Therefore, it can be interesting to take a look at the first series as well (which I ported to Visual Studio 2017; it runs fine in both Debug and Release configurations and can be found in this repository).
If you want to stick to DirectXMath, you can also take a look at the D3D11 tutorials of Microsoft itself (explanation + code). (I also created a repository with some code refactors of these tutorials).