How to allocate matrices with double pointers in CUDA?

Question

I've recently started learning CUDA and I'm having trouble understanding matrix allocation on the device. I've done some basic matrix multiplication, and now the idea is to parallelize the rotation of matrices.

Basically, I have these declarations in my serial code:

int** rot;
int** rot_0;
int** rot_1;

int** d_rot;
int** d_rot_0;
int** d_rot_1;

and in the main() function I have:

rot = (int**)malloc(sizeof(int*) * N);
rot_0 = (int**)malloc(sizeof(int*) * N);
rot_1 = (int**)malloc(sizeof(int*) * N);

for (int i = 0; i < N; i++) {
    rot[i] = (int*)malloc(sizeof(int) * N);
    rot_0[i] = (int*)malloc(sizeof(int) * N);
    rot_1[i] = (int*)malloc(sizeof(int) * N);
}

Now I'm trying to create the device mirror of the above:

cudaMalloc((int**) &d_rot, sizeof(int*) * N);
cudaMalloc((int**) &d_rot_0, sizeof(int*) * N);
cudaMalloc((int**) &d_rot_1, sizeof(int*) * N);

for (int i = 0; i < N; i++) {
     cudaMalloc((int*) &d_rot[i], sizeof(int) * N);
     cudaMalloc((int*) &d_rot_0[i], sizeof(int) * N);
     cudaMalloc((int*) &d_rot_1[i], sizeof(int) * N);
}

However, I'm receiving the error:

error: no instance of overloaded function "cudaMalloc" matches the argument list
argument types are: (int *, unsigned long)

If possible, can any of you try to tell me what I am doing wrong?

For starters, remove the `(int *)` casts. But since `d_rot` is a device pointer, `&d_rot[i]` refers to a location in device memory, and you cannot ask `cudaMalloc` to update a pointer value stored in device memory. For a general treatment of this topic, refer to [this answer](https://stackoverflow.com/questions/45643682/cuda-using-2d-and-3d-arrays/45644824#45644824). You are attempting the "general, dynamically allocated 2D case". – Robert Crovella, Oct 18 '21 at 22:24
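
The point in the comment can be illustrated with a minimal sketch. It is one possible way to build an `int**` matrix on the device, not necessarily the exact code from the linked answer. The idea is to keep the row pointers in a host-side array (so `cudaMalloc` writes to host memory), then copy that pointer table to the device with `cudaMemcpy`. The names `h_rows`, `fill`, `roundtrip`, and the `CHECK` macro are hypothetical and introduced only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel: writes row * n + col into each element through int**.
__global__ void fill(int **m, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) m[row][col] = row * n + col;
}

// Allocates an n x n device matrix, fills it, copies it back,
// and returns the number of elements that do not match.
int roundtrip(int n) {
    // Host array holding *device* row pointers. &h_rows[i] is a host
    // address, so cudaMalloc is allowed to write the new pointer there.
    int **h_rows = (int **)malloc(sizeof(int *) * n);
    for (int i = 0; i < n; i++)
        CHECK(cudaMalloc((void **)&h_rows[i], sizeof(int) * n));

    // Device-side pointer table, filled by copying the host table over.
    int **d_rot = NULL;
    CHECK(cudaMalloc((void **)&d_rot, sizeof(int *) * n));
    CHECK(cudaMemcpy(d_rot, h_rows, sizeof(int *) * n, cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    fill<<<grid, block>>>(d_rot, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Copy back one row at a time, using the host copy of the row pointers.
    int mismatches = 0;
    int *row = (int *)malloc(sizeof(int) * n);
    for (int i = 0; i < n; i++) {
        CHECK(cudaMemcpy(row, h_rows[i], sizeof(int) * n, cudaMemcpyDeviceToHost));
        for (int j = 0; j < n; j++)
            if (row[j] != i * n + j) mismatches++;
    }

    // Release the rows first, then the pointer table.
    for (int i = 0; i < n; i++) CHECK(cudaFree(h_rows[i]));
    CHECK(cudaFree(d_rot));
    free(row);
    free(h_rows);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", roundtrip(4));
    return 0;
}
```

This needs `N + 1` allocations per matrix and one copy per row. If the `m[row][col]` syntax is not essential, a single flat allocation of `N * N` ints indexed as `m[row * N + col]` is usually simpler to manage.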
