5

In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, it generates code for exactly the code path of the current device.

However, if __CUDA_ARCH__ is used within device code, it generates a different code path for each device architecture specified in the compilation options (-arch).

Can anyone confirm this is correct?

talonmies
user0002128

1 Answer

12

When used in device code, __CUDA_ARCH__ is defined as a number that reflects the architecture the code is currently being compiled for.

It is not intended to be used in host code. From the nvcc manual:

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

The behavior of __CUDA_ARCH__ in host code is therefore unspecified (at least by CUDA). As pointed out by @tera in the comments, since the macro is not defined in host code, it can be used to differentiate host and device paths, for example in a __host__ __device__ function definition:

#ifndef __CUDA_ARCH__
//host code here
#else
//device code here
#endif
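To make this more concrete, here is a minimal sketch that combines both uses: separating the host pass from the device pass, and branching on the architecture number inside the device pass. The helper names `compiled_arch` and `uses_newer_path` are hypothetical, and the threshold of 700 is an arbitrary value chosen for illustration. The `__CUDACC__` guard is only there so that the same file also builds with a plain C++ compiler.

```cpp
// nvcc defines __CUDACC__. When a plain C++ compiler is used instead,
// turn the CUDA qualifiers into no-ops so the file still builds.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: reports what the current compilation pass sees.
__host__ __device__ inline int compiled_arch() {
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;  // device pass, e.g. 700 when compiling for compute_70
#else
    return 0;              // host pass: the macro is not defined here
#endif
}

// Hypothetical helper: selects a path per target architecture.
// The 700 threshold is an assumption made for this example only.
__host__ __device__ inline bool uses_newer_path() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
    return true;   // device pass for compute capability 7.0 or newer
#else
    return false;  // host pass, or device pass for an older architecture
#endif
}
```

When called from host code, `compiled_arch()` returns 0 and `uses_newer_path()` returns false, whichever GPU is installed. When called from a kernel, the result depends on which of the compiled architecture variants is loaded on the device.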
Robert Crovella
  • 5
    Actually, `__CUDA_ARCH__` is not even defined in host code (which can be used to distinguish between host and device compilation). Which is why it seems the tests in host code always appear to evaluate the same way. – tera Apr 18 '13 at 11:20