I've recently started learning CUDA and I'm having trouble understanding matrix allocation on the device. I've done some basic matrix multiplication, and now the idea is to parallelize the rotation of matrices.
Basically, I have this piece of serial code:
int** rot;
int** rot_0;
int** rot_1;
int** d_rot;
int** d_rot_0;
int** d_rot_1;
and in the main() function I have:
rot = (int**)malloc(sizeof(int*) * N);
rot_0 = (int**)malloc(sizeof(int*) * N);
rot_1 = (int**)malloc(sizeof(int*) * N);
for (int i = 0; i < N; i++) {
    rot[i] = (int*)malloc(sizeof(int) * N);
    rot_0[i] = (int*)malloc(sizeof(int) * N);
    rot_1[i] = (int*)malloc(sizeof(int) * N);
}
Now I'm trying to create the device mirror of the above:
cudaMalloc((int**) &d_rot, sizeof(int*) * N);
cudaMalloc((int**) &d_rot_0, sizeof(int*) * N);
cudaMalloc((int**) &d_rot_1, sizeof(int*) * N);
for (int i = 0; i < N; i++) {
    cudaMalloc((int*) &d_rot[i], sizeof(int) * N);
    cudaMalloc((int*) &d_rot_0[i], sizeof(int) * N);
    cudaMalloc((int*) &d_rot_1[i], sizeof(int) * N);
}
However, I'm receiving the error:
error: no instance of overloaded function "cudaMalloc" matches the argument list
argument types are: (int *, unsigned long)
Can anyone tell me what I am doing wrong?
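For reference, here is a minimal sketch of one common way to build such a device mirror. It rests on two points: cudaMalloc expects the address of a pointer (a void**), which the (int*) cast in the loop does not provide, and &d_rot[i] is an address in device memory, so host code cannot store a row pointer through it. The sketch therefore collects the row pointers in a host array first and copies that array to the device in one cudaMemcpy. The names alloc_device_matrix, free_device_matrix and h_rows are hypothetical helpers introduced only for illustration; they are not part of the code above.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Hypothetical helper (not from the original code): allocates an n x n int
// matrix on the device as an array of n row pointers.
// h_rows is a caller-provided host array of length n; on success it holds the
// device address of every row, so the caller can cudaMemcpy into each row and
// later free them. Returns the device-side int**, or NULL on failure.
int** alloc_device_matrix(int n, int** h_rows) {
    int** d_mat = NULL;

    // cudaMalloc takes a void**: the address of the pointer it should fill in.
    if (cudaMalloc((void**)&d_mat, sizeof(int*) * n) != cudaSuccess) {
        return NULL;
    }

    // Allocate each row, keeping the resulting device pointers in host memory.
    // Writing through &d_mat[i] here would touch device memory from the host.
    int allocated = 0;
    for (; allocated < n; allocated++) {
        if (cudaMalloc((void**)&h_rows[allocated], sizeof(int) * n) != cudaSuccess) {
            break;
        }
    }

    // Copy the table of row pointers to the device in one transfer.
    if (allocated == n &&
        cudaMemcpy(d_mat, h_rows, sizeof(int*) * n,
                   cudaMemcpyHostToDevice) == cudaSuccess) {
        return d_mat;
    }

    // Something failed: release whatever was allocated so far.
    for (int i = 0; i < allocated; i++) {
        cudaFree(h_rows[i]);
    }
    cudaFree(d_mat);
    return NULL;
}

// Hypothetical counterpart that frees every row and then the pointer table.
void free_device_matrix(int** d_mat, int** h_rows, int n) {
    for (int i = 0; i < n; i++) {
        cudaFree(h_rows[i]);
    }
    cudaFree(d_mat);
}
```

With the declarations above, this would be called as d_rot = alloc_device_matrix(N, h_rot_rows), where h_rot_rows is an assumed host array of N int* entries. A single flat cudaMalloc of N * N ints, indexed as i * N + j, is often simpler and avoids the pointer table entirely.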