The behavior of __CUDA_ARCH__ macro

Question

In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, it generates code for exactly the code path of the current device.

However, if __CUDA_ARCH__ is used within device code, it generates different code paths for the different devices specified in the compilation options (/arch).

Can anyone confirm this is correct?

Robert Crovella · Accepted Answer · 2019-09-18T15:38:22.307

When used in device code, __CUDA_ARCH__ is defined as a number that reflects the architecture currently being compiled for (for example, 700 when compiling for compute_70).
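As an illustration (a sketch, not part of the original answer), the kernel below picks a code path at compile time from the value of __CUDA_ARCH__. nvcc runs a separate device compilation pass for each target architecture, each with its own __CUDA_ARCH__ value, plus a host pass in which the macro is not defined. The names `compiled_arch` and `report_arch` and the 700 threshold are hypothetical choices for the example.

```cuda
#include <cstdio>

// Returns the architecture value seen by the current compilation pass:
// the __CUDA_ARCH__ number in a device pass, or 0 in the host pass,
// where __CUDA_ARCH__ is not defined.
__host__ __device__ int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // e.g. 700 when compiling for compute_70
#else
    return 0;               // host compilation pass
#endif
}

__global__ void report_arch()
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    // Only compiled into device images for compute_70 and newer.
    printf("path for >= 7.0, __CUDA_ARCH__ = %d\n", compiled_arch());
#else
    // Compiled into device images for older architectures.
    printf("fallback path, __CUDA_ARCH__ = %d\n", compiled_arch());
#endif
}

int main()
{
    report_arch<<<1, 1>>>();
    cudaDeviceSynchronize();
    // Host code always takes the #else branch of compiled_arch().
    printf("host pass sees compiled_arch() = %d\n", compiled_arch());
    return 0;
}
```

Built with, for example, `nvcc -gencode arch=compute_52,code=sm_52 -gencode arch=compute_70,code=sm_70 arch_demo.cu`, the executable carries two device images compiled with different __CUDA_ARCH__ values (520 and 700). Which kernel branch runs depends on the image the runtime selects for the installed GPU, while the host line always prints 0.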

It is not intended to be used in host code. From the nvcc manual:

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

Usage of __CUDA_ARCH__ in host code is therefore undefined (at least by CUDA). As pointed out by @tera in the comments, because the macro is not defined in host code, it can be used to distinguish host and device paths, for example in a __host__ __device__ function definition:

```cuda
#ifndef __CUDA_ARCH__
// host code here
#else
// device code here
#endif
```
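A fuller sketch of the same pattern, assuming a hypothetical `count_bits` helper: the device pass uses the `__popc` intrinsic, which only exists in device code, while the host pass, where __CUDA_ARCH__ is undefined, falls back to a portable loop. The kernel and buffer names are also illustrative.

```cuda
#include <cstdio>

// Counts the set bits in x. One source function, two compiled bodies.
__host__ __device__ int count_bits(unsigned int x)
{
#ifndef __CUDA_ARCH__
    // host code here: portable loop
    int n = 0;
    while (x) {
        x &= x - 1;   // clear the lowest set bit
        ++n;
    }
    return n;
#else
    // device code here: hardware population-count intrinsic
    return __popc(x);
#endif
}

__global__ void count_bits_kernel(unsigned int x, int *out)
{
    *out = count_bits(x);
}

int main()
{
    int *d_out = nullptr;
    int h_out = -1;   // stays -1 if the device path fails
    cudaMalloc(&d_out, sizeof(int));
    count_bits_kernel<<<1, 1>>>(0xF0F0u, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    printf("host: %d, device: %d\n", count_bits(0xF0F0u), h_out);
    return 0;
}
```

On a working GPU both numbers should be 8, since 0xF0F0 has eight set bits; the point is that each side reaches that result through a different compiled body.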

Actually, `__CUDA_ARCH__` is not even defined in host code (which can be used to distinguish between host and device compilation). That is why tests on it in host code always appear to evaluate the same way. — tera, Apr 18 '13 at 11:20
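If host code needs to know the architecture of the installed GPU, it has to ask at run time rather than test __CUDA_ARCH__. A minimal sketch using the CUDA runtime's `cudaGetDeviceProperties`; the helper `arch_number` is hypothetical and simply encodes a compute capability the same way __CUDA_ARCH__ does (major * 100 + minor * 10).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: compute capability 7.0 -> 700, 8.6 -> 860.
int arch_number(int major, int minor)
{
    return major * 100 + minor * 10;
}

int main()
{
    int dev = 0;
    cudaDeviceProp prop;
    // Query the current device at run time; this works in host code,
    // unlike __CUDA_ARCH__, which is only defined in device passes.
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    printf("device %d (%s): compute capability %d.%d -> %d\n",
           dev, prop.name, prop.major, prop.minor,
           arch_number(prop.major, prop.minor));
    return 0;
}
```

Note that this reports the capability of the device itself, which is not necessarily the __CUDA_ARCH__ value of the image that runs on it (for example, when PTX built for an older virtual architecture is JIT-compiled for a newer GPU).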
