I had an application that was originally single threaded and worked as follows:
- gather the items to be drawn (occlusion / frustum culling / sorting into batches)
- draw items using an immediate context
- present
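For context, here is roughly what that single-threaded loop looked like. This is a simplified sketch: `Batch`, `CullScene`, `DrawBatch`, `m_camera`, `m_immediateContext` and `m_swapChain` are placeholders for my own code and the usual D3D11 objects.

```cpp
// Simplified single-threaded frame loop.
void RenderFrame()
{
    // 1) gather: occlusion / frustum culling, sort into batches (placeholder helper)
    std::vector<Batch> batches = CullScene(m_camera);

    // 2) draw everything on the immediate context (placeholder helper)
    for (const Batch& b : batches)
        DrawBatch(m_immediateContext, b);

    // 3) present
    m_swapChain->Present(1, 0);
}
```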
I decided to use a deferred context and parallelize step 2 as follows:
- gather the items to be drawn (occlusion / frustum culling / sorting into batches)
- draw items in parallel using a deferred context
- execute the command lists from step 2
- present
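In code, the new version looked roughly like this. Again a simplified sketch: `SplitIntoChunks` and `ParallelFor` stand in for my own partitioning and job-system code, and each worker's deferred context was created up front with `ID3D11Device::CreateDeferredContext`.

```cpp
// Simplified parallel frame loop. Each worker records draw calls into its own
// deferred context, producing an ID3D11CommandList; the main thread then plays
// the lists back on the immediate context before presenting.
void RenderFrameDeferred()
{
    std::vector<Batch> batches = CullScene(m_camera);
    std::vector<std::vector<Batch>> chunks = SplitIntoChunks(batches, m_workerCount);

    std::vector<Microsoft::WRL::ComPtr<ID3D11CommandList>> commandLists(chunks.size());

    ParallelFor(chunks.size(), [&](size_t i)
    {
        ID3D11DeviceContext* deferred = m_deferredContexts[i].Get(); // one per worker
        for (const Batch& b : chunks[i])
            DrawBatch(deferred, b);                                  // record on the deferred context
        deferred->FinishCommandList(FALSE, &commandLists[i]);        // close recording into a command list
    });

    // execute the command lists from step 2 on the immediate context
    for (auto& cl : commandLists)
        m_immediateContext->ExecuteCommandList(cl.Get(), FALSE);

    m_swapChain->Present(1, 0);
}
```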
However, I saw almost no performance benefit. I finally decided to move the "execute command lists" step after present:
- gather the items to be drawn (occlusion / frustum culling / sorting into batches)
- draw items in parallel using a deferred context
- present (which now actually presents the previous frame's scene)
- execute the command lists
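The only code change from the sketch above is the tail of the loop, something like:

```cpp
    // ... same culling + parallel recording into commandLists as above ...

    // present first: this shows the scene whose command lists were executed
    // at the end of the previous frame
    m_swapChain->Present(1, 0);

    // then hand this frame's command lists to the driver; they will be shown
    // by next frame's Present call
    for (auto& cl : commandLists)
        m_immediateContext->ExecuteCommandList(cl.Get(), FALSE);
```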
I saw a huge increase in performance with this method. My theory as to why this worked is that there are basically 3 bottlenecks:
- graphics card (triggered by present)
- my app's cpu thread
- device driver cpu thread (triggered by executing the command lists)
In my original order, each successive stage was blocked waiting for the previous one to finish. Now, however, I believe all three are running in parallel:
- present with (frame - 2) data
- device driver thread with (frame - 1) data
- my cpu thread to run with this frame's data
My question is: is this a common pattern, or is there a better way to achieve maximum parallelization while using deferred context rendering?