CUDA memory bank conflict

Question

I would like to be sure that I understand bank conflicts in shared memory correctly. I have 32 portions of data, each consisting of 128 integers.

|0, 1, 2, ..., 125, 126, 127| ... |3968, 3969, 3970, ..., 4093, 4094, 4095|

Each thread in a warp accesses only its own portion.

Thread 0 accesses position 0 (index 0) in portion 0
Thread 1 accesses position 0 (index 128) in portion 1
Thread 31 accesses position 0 (index 3968) in portion 31

Does this mean that I have 32 conflicts here? If so, then if I stretch each portion to 129 elements, each thread will access a unique bank. Am I right?

Does this answer your question? [What is a bank conflict? (Doing Cuda/OpenCL programming)](https://stackoverflow.com/questions/3841877/what-is-a-bank-conflict-doing-cuda-opencl-programming) — einpoklum, Sep 16 '21 at 18:16

Robert Crovella · Accepted Answer · 2021-09-11T14:06:05.933

Yes, you will have 32-way bank conflicts. For the purposes of bank conflicts, it may help to visualize shared memory as a two-dimensional array whose width is 32 elements (e.g. 32 int or float quantities). Each column in this 2D array is a "bank".

Overlay your storage pattern on that. When you do so, you will see that your stated access pattern results in all threads in the warp requesting items from column 0.

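Yes, the usual "trick" here is to pad the storage by 1 element per "row" (in your case this could be one element per "portion"). With a stride of 129, thread i starts at element 129*i, which falls in column (129*i) mod 32 = i, so each thread in the warp lands in a different bank. That should eliminate bank conflicts for your stated access pattern.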
