Forum Discussion
lamour42
10 years ago · Expert Protege
Free DX 12 Rift Engine Code
Hi, if you want to write code for the Rift using DirectX 12, you might want to take a look at the code I provided on GitHub: https://github.com/ClemensX/ShadedPath12.git The sample engine is ext...
galopin
9 years ago · Heroic Explorer
Overhead to render in DX12? That is because you do not think GPU :)
Yes, pushing draws from the CPU can be light speed compared to DX11 even while using it in the old CPU-driven fashion, but the real strength is doing things differently, more streamlined for the GPU.
I broke my Oculus mode right now, but the screenshots still show 256K objects with a per-instance texture, rendered in a couple of dispatches plus one ExecuteIndirect that contains 4096 draws when no culling or occlusion is performed (this emulates a collection of different objects; it should really be one ExecuteIndirect per PSO; my commands are made of one index buffer, two vertex buffers and a draw indexed instanced), plus a few draw calls for text, debug draws, GPU timers and blits.
The CPU cost of the app right now is near zero. If I look at the sky, I am still GPU bound at 0.8 ms, of which 0.6 ms is the draw indirect (it should be near zero, but the GPU is not able to claim back performance when the count buffer value is smaller than the max argument count; NVIDIA needs to fix that, and AMD is just plain broken on ExecuteIndirect right now, no kidding). Imagine culling 256K objects on the CPU: even with the right hierarchical structure, you are way behind that.
In a real app, because of DX12 bindless, the number of real unique CPU draw calls is lower than it could have been on DX11. For a stereo render you can imagine a lot of techniques; mine is to double the groups in the ExecuteIndirect, add an extra root constant to the command signature saying left/right, and use that to pick the proper view-projection-viewport matrix plus an extra clip plane between the two fake viewports (because VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation is false on NVIDIA, or it would be even simpler, just a semantic to output from the VS).
The culling/occlusion pass is the small red bar in the top left part of the screen. The red part on the right is the blit to the backbuffer prior to text and GPU timers, and the purple just before it is the depth-buffer pyramid for occlusion in the next frame. Most of my stuff is still rough and not optimal, and of course you do not want to know how many millions of triangles are in these screenshots :)
Full-sized images:
Only frustum culling: [screenshot]
With occlusion culling: [screenshot]
What was hidden: [screenshot]
The striped grey bar shows the milliseconds.