
I am currently experimenting with unbounded descriptor arrays and ultimately want to use bindless descriptors. I wrote a test application that draws 100,000 simple object instances, where the transform for each object is pulled from a buffer array.

    #pragma pack_matrix(row_major)

    struct VertexData
    {
        float4 Position : SV_POSITION;
    };

    struct VertexInput
    {
        float3 Position : POSITION;
    };

    struct CameraData
    {
        float4x4 ViewProjection;
    };

    struct InstanceData
    {
        float4x4 Transform;
    };

    ConstantBuffer<CameraData> camera : register(b0, space0);
    StructuredBuffer<InstanceData> instanceBuffer[] : register(t0, space1);

    VertexData main(in VertexInput input, uint id : SV_InstanceID)
    {
        VertexData vertex;

        InstanceData instance = instanceBuffer[id].Load(0);
        float4 position = mul(float4(input.Position, 1.0), instance.Transform);
        vertex.Position = mul(position, camera.ViewProjection);

        return vertex;
    }

I am using DXC to compile this shader to both SPIR-V and DXIL. Under Vulkan, the draw call takes roughly 2ms; under D3D, however, it takes around 10ms. I tried eliminating the buffer access by hashing the instance id and generating the transform for each object that way, and (as expected) the draw calls were then equally fast on both APIs. I then tried using a ByteAddressBuffer instead of a StructuredBuffer. This made no difference in Vulkan (which is expected, since the two are essentially the same in SPIR-V), but in D3D performance dropped even further, to around 40ms per draw call. Now I am wondering what could cause such behavior. I tried looking at the IL, but I can't really tell how the two versions compare.
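For illustration, a ByteAddressBuffer variant of the lookup could look roughly like the sketch below. It is not the exact shader that was measured. The helper name `LoadTransform` is hypothetical, and the sketch assumes each buffer holds one tightly packed, row-major float4x4 at byte offset 0. The array indexing is left as in the StructuredBuffer version.

```hlsl
// Sketch only: same binding slot as the StructuredBuffer version,
// but declared as an unbounded array of raw buffers.
ByteAddressBuffer instanceBuffer[] : register(t0, space1);

// Hypothetical helper: reads one float4x4 stored as four consecutive
// 16-byte rows, starting at byte offset 0 of the buffer selected by 'id'.
float4x4 LoadTransform(uint id)
{
    float4 row0 = asfloat(instanceBuffer[id].Load4(0));
    float4 row1 = asfloat(instanceBuffer[id].Load4(16));
    float4 row2 = asfloat(instanceBuffer[id].Load4(32));
    float4 row3 = asfloat(instanceBuffer[id].Load4(48));

    // The float4x4 constructor takes its arguments as rows.
    return float4x4(row0, row1, row2, row3);
}
```

In the vertex shader, `instance.Transform` would then be replaced by `LoadTransform(id)`.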

What could cause this discrepancy? Is there any way to profile this? All debuggers I tried just showed me the GPU times I mentioned above, but nothing more detailed. Also, some Nsight Graphics features are not supported on my GPU (1080 Ti), so I could not test this completely.

Carsten

1 Answer


Are you sure the buffer isn't simply allocated as a host-visible buffer? If it lives in CPU-accessible memory, the data might have to be transferred to the GPU over and over again. I use ByteAddressBuffer a lot and it's not that slow for me, so maybe the buffer is just not allocated properly.

  • Hi! Thanks for the answer. The buffer is allocated in the default heap (D3D12_HEAP_TYPE_DEFAULT). For reference: D3D12_RESOURCE_DIMENSION_BUFFER, Width = elements * elementSize, D3D12_TEXTURE_LAYOUT_ROW_MAJOR and DXGI_FORMAT_UNKNOWN... unless I forgot anything, since it's been a while. I still haven't figured this one out, though! Also note that the slowdown only occurs under D3D, not under Vulkan (AFAIR, host visible is Vulkan terminology). – Carsten Oct 04 '23 at 12:42