
I'm trying to trace how TensorFlow actually uses cuDNN to implement different operators. I'll use TensorFlow's conv2d operator (tf.nn.conv2d) as an example.

To give me an idea of what I should be looking for, as well as a reference for how to implement a conv2d operator directly against cuDNN, I've been reading this blog post: Peter Goldsborough - Convolutions with cuDNN.
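For context, the shape of a "raw" cuDNN forward convolution (as in that blog post) looks roughly like the sketch below. This is illustrative only: error checking, device allocation, output-dimension queries, and algorithm/workspace selection are omitted, and all tensor sizes are made-up assumptions.

```cpp
// Minimal sketch (not production code) of a cuDNN forward convolution,
// following the structure of the Goldsborough post. All dimensions here
// (1x3x224x224 input, 64 3x3 filters, pad 1, stride 1) are assumptions,
// and a fixed algorithm is hard-coded instead of being autotuned.
#include <cudnn.h>

void ConvForwardSketch(float* d_in, float* d_filter, float* d_out,
                       void* d_workspace, size_t workspace_bytes) {
  cudnnHandle_t handle;
  cudnnCreate(&handle);

  // Describe the input tensor (NCHW layout).
  cudnnTensorDescriptor_t in_desc;
  cudnnCreateTensorDescriptor(&in_desc);
  cudnnSetTensor4dDescriptor(in_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                             /*N=*/1, /*C=*/3, /*H=*/224, /*W=*/224);

  // Describe the filter bank.
  cudnnFilterDescriptor_t filt_desc;
  cudnnCreateFilterDescriptor(&filt_desc);
  cudnnSetFilter4dDescriptor(filt_desc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                             /*K=*/64, /*C=*/3, /*R=*/3, /*S=*/3);

  // Describe the convolution itself (padding, stride, dilation).
  cudnnConvolutionDescriptor_t conv_desc;
  cudnnCreateConvolutionDescriptor(&conv_desc);
  cudnnSetConvolution2dDescriptor(conv_desc, /*pad_h=*/1, /*pad_w=*/1,
                                  /*stride_h=*/1, /*stride_w=*/1,
                                  /*dilation_h=*/1, /*dilation_w=*/1,
                                  CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

  // Output tensor: with 3x3 filters, pad 1, stride 1, spatial dims are preserved.
  cudnnTensorDescriptor_t out_desc;
  cudnnCreateTensorDescriptor(&out_desc);
  cudnnSetTensor4dDescriptor(out_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                             /*N=*/1, /*C=*/64, /*H=*/224, /*W=*/224);

  // Run the convolution; in real code the algorithm and workspace size
  // would come from cudnnGetConvolutionForwardAlgorithm_v7 /
  // cudnnGetConvolutionForwardWorkspaceSize rather than being hard-coded.
  const float alpha = 1.0f, beta = 0.0f;
  cudnnConvolutionForward(handle, &alpha, in_desc, d_in, filt_desc, d_filter,
                          conv_desc, CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
                          d_workspace, workspace_bytes, &beta, out_desc, d_out);

  cudnnDestroy(handle);
}
```

When I trace TensorFlow, this descriptor-setup/forward-call pattern is what I expect to eventually find at the bottom of the stack.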

So, based on this answer (Tensorflow: Where is tf.nn.conv2d Actually Executed?), TensorFlow will (roughly; I recognize there are other branches that could be taken) call down this stack:

Now I assume (and someone please correct me if I'm wrong) that if we are using TF with cuDNN correctly, we will end up launching LaunchConv2DOp<GPUDevice, T>::operator().

Towards the end of this operator's implementation, around where they start defining a se::dnn::BatchDescriptor (see here), and later where they run LaunchAutotunedConv (see here), I think they are making use of their higher abstraction levels; eventually, down through these levels, they must interface with the cuDNN APIs.
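From my reading of tensorflow/stream_executor/dnn.h, that descriptor-building step looks roughly like the sketch below. Treat the exact setter names and variable names as assumptions on my part; the point is that everything at this level is expressed in backend-neutral StreamExecutor types, with no cuDNN names in sight.

```cpp
// Rough sketch of how LaunchConv2DOp describes the convolution in
// StreamExecutor terms (se::dnn::BatchDescriptor / FilterDescriptor from
// tensorflow/stream_executor/dnn.h). Variable names (batch, in_depths,
// in_rows, ...) are placeholders I made up for illustration.
se::dnn::BatchDescriptor input_desc;
input_desc.set_count(batch)                 // N
    .set_feature_map_count(in_depths)       // C
    .set_height(in_rows)                    // H
    .set_width(in_cols)                     // W
    .set_layout(se::dnn::DataLayout::kBatchDepthYX);

se::dnn::FilterDescriptor filter_desc;
filter_desc.set_input_filter_height(filter_rows)
    .set_input_filter_width(filter_cols)
    .set_input_feature_map_count(in_depths)
    .set_output_feature_map_count(out_depths);

// Note: nothing here names cuDNN. The translation of these descriptors
// into cudnnTensorDescriptor_t etc. happens inside whatever backend
// implements the se::dnn interface, not in the op itself.
```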

Now, I expected to find some communication here between, for example, se::dnn::BatchDescriptor or LaunchAutotunedConv and either the cuDNN-specific methods found in tensorflow/stream_executor/cuda/cuda_dnn.cc, or any of the auto-generated stub files that wrap the cuDNN APIs for a given cuDNN version (e.g., tensorflow/stream_executor/cuda/cudnn_8_0.inc). However, I can find no link between these two levels of abstraction.

Am I missing something? At what point does TensorFlow actually make calls to the cuDNN APIs from its C++ operator implementations?

Ben