Forum Discussion
MichaelNikelsky
10 years ago · Honored Guest
Poor performance with new SDK
Hi,
We had been using application-side distortion rendering with the Oculus in an OpenGL workstation application, but have now been forced to switch to the new rendering path. Granted, the API is much cleaner and nicer to use now, so in general I would be happy. However, with the old SDK (0.4.4) we had about 80% GPU usage when running the Oculus (we do get 99% GPU usage when running normal stereo at the same resolution) and got at least about 50 fps on quite complex scenes.
Now with the 0.6 SDK, performance drops to 37 fps or sometimes even 25 fps, with only about 50% GPU usage; the rest of the time is spent in the Oculus lib or driver while our application is stalled. The stall happens inside ovrHmd_SubmitFrame. I just did some timings: for a 5-million-triangle scene with very heavy shaders, it took about 0.02 seconds between the ovrHmd_SubmitFrame calls and another 0.02 seconds inside the ovrHmd_SubmitFrame call itself.
The render textures are already ping-ponged (the depth buffer is not, but it is not passed to submitFrame anyway, so it shouldn't matter).
Is there anything I need to take care of to make it fast again?
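For reference, timings like the ones above can be captured with a std::chrono read on either side of the submit call. This is only a sketch: `submit_frame` is a hypothetical stand-in for ovrHmd_SubmitFrame, and here it just sleeps to mimic the observed stall.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-in for ovrHmd_SubmitFrame; it sleeps to
// mimic the ~0.02 s stall observed inside the real call.
static void submit_frame() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

// Time one submit call. In the real application, place the two clock
// reads immediately around ovrHmd_SubmitFrame.
double time_submit_once() {
    using clock = std::chrono::steady_clock;
    auto before = clock::now();
    submit_frame();
    auto after = clock::now();
    double s = std::chrono::duration<double>(after - before).count();
    std::printf("submit took %.3f s\n", s);
    return s;
}
```

The same pattern, with a second timestamp pair spanning the application's own rendering, separates "time between submits" from "time inside the submit".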
Thanks
17 Replies
- opamp (Protege)
"cybereality" wrote:
Coming shortly, but I'm not sure if that issue is fixed.
Do you know if the issue with not being able to disable vsync for testing purposes will be fixed in this release?
- MichaelNikelsky (Honored Guest)
Did some more testing, and there really is something seriously wrong with the OpenGL part of the Oculus lib or driver.
I disabled all our shaders and did just a simple shaded rendering in stereo at the same resolution used when running the Oculus, so it is basically the same code except for the submitFrame call. This ran at about 110 to 120 fps on my machine with 99% GPU usage, well beyond the required 75 fps.
With the submitFrame call added, the framerate dropped to 38 fps with merely 39% GPU usage.
Profiling in Nsight listed wglDXUnlockObjectsNV and wglDXLockObjectsNV as the most expensive functions on average, but each call takes only about 0.2 milliseconds, so the calls themselves are probably not the performance issue. However, wglDXLockObjectsNV is the last command called from our application before it goes into an 11-millisecond (!!!) pause, waiting for whatever is running somewhere else (driver or lib). This leaves us less than 2.5 milliseconds to render our scene, which is pretty much impossible for anything but the most simplistic scenes.
It actually looks like wglDXLockObjectsNV waits for the lib/driver to finish all its work before it can continue, becoming a serious bottleneck.
EDIT: I am able to improve things a little bit by calling glFlush directly after each eye is rendered and calling glFinish just before submitFrame. This moves the stall into our application and reduces the time lost to the lock, but it really isn't pretty in my opinion, and it still loses a lot of the GPU's computational power.
- jherico (Adventurer)
"MichaelNikelsky" wrote:
However the wglDXLockObjectsNV is the last command that is called from our application before it goes to an 11 millisecond (!!!) pause, waiting for whatever is called somewhere else (driver or lib). This pretty much leaves us less than 2.5 milliseconds to render our scene, which is pretty much impossible to achieve with anything but the most simplistic scenes.
You can't really make any judgment based on that wait. The submitFrame call does a bit of work and then explicitly waits until just before the next v-sync so that it can get the latest head pose to inject into timewarp. So I bet if you put an explicit 5-millisecond delay into your app right before the submit, the pause inside the submitFrame call would correspondingly drop by 5 milliseconds.
"MichaelNikelsky" wrote:
I disabled all our shaders and did just a simple shader rendering in stereo doing the same resolution that is used when running the oculus, so it is basically the same code except for the submitFrame. This ran at about 110 to 120fps on my machine, 99% GPU usage, so well beyond the required 75fps.
Calling submitFrame and the framerate dropped to 38fps with merely 39% GPU usage.
Similarly, since submitFrame essentially injects a hole into your GPU command stream and limits your rendering to 75 Hz, you'd expect GPU usage to drop compared to rendering without submitting and without v-sync. That drop seems larger than you'd expect going from 120 fps to 75 fps, though; part of it could be lost efficiency from additional GPU/CPU sync points, but I'm not sure that accounts for the whole thing.
Are you using a mirror texture to display the results back onto an on-screen window?
- MichaelNikelsky (Honored Guest)
Well, true, the time inside submitFrame would be reduced if I waited beforehand. But v-sync is not the issue here: v-sync only limits your application's performance down to the sync point. So if our application can easily render 120 fps, it will be limited to 75 fps and no lower. I tried this and confirmed it for our application: just using v-sync is fine and sticks to 75 fps.
The real issue here is that a sync point is being forced in some way. In OpenGL this would be done by calling glFinish, which basically tells the GPU to complete all the commands it has received so far before anything can continue. This effectively ruins any pre-rendered frames, so instead of double buffering you are basically doing single buffering. Worse, it pretty much disables the multithreading capability of the GPU driver, since nothing can run in parallel anymore. All the ping-pong work with the render buffers becomes completely useless, since you wait for everything to complete anyway before rendering the next frame. And this pretty much seems to be what is happening here.
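The pipelining effect described above can be illustrated without any GL at all. In this sketch a background thread stands in for the GPU: with one frame in flight, the "CPU" records frame N+1 while the "GPU" executes frame N, but adding a glFinish-style wait after every frame serializes the two and roughly doubles the frame time. All timings are simulated.

```cpp
#include <chrono>
#include <future>
#include <thread>

using namespace std::chrono;

// Simulated GPU: executing one frame's command buffer takes ~10 ms.
static void gpu_execute() { std::this_thread::sleep_for(milliseconds(10)); }

// Simulated CPU: recording one frame's commands takes ~10 ms.
static void cpu_record() { std::this_thread::sleep_for(milliseconds(10)); }

// Render `frames` frames and return the total wall time in seconds.
// With finish_each_frame == true, the CPU waits for the "GPU" after every
// frame (single-buffer behaviour); otherwise one frame stays in flight.
double run(int frames, bool finish_each_frame) {
    auto t0 = steady_clock::now();
    std::future<void> gpu;  // the frame currently executing on the "GPU"
    for (int i = 0; i < frames; ++i) {
        cpu_record();                           // overlaps the previous frame
        if (gpu.valid()) gpu.wait();            // previous frame must be done
        gpu = std::async(std::launch::async, gpu_execute);
        if (finish_each_frame) gpu.wait();      // the glFinish-style stall
    }
    if (gpu.valid()) gpu.wait();
    return duration<double>(steady_clock::now() - t0).count();
}
```

With the per-frame wait, the loop costs roughly 20 ms per frame instead of 10 ms, which matches the "single buffering" behaviour described above.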
My first thought was the mirror texture as well, but completely disabling it didn't make any difference. I am not sure what it returns internally anyway. In my opinion, if implemented in a usable way, it should not mirror the current frame but the frame before; otherwise there would indeed be the sync problem again.
- jherico (Adventurer)
"MichaelNikelsky" wrote:
In OpenGL this would be done by calling glFinish, which basically tells your GPU to complete all the commands it has received so far before it can continue doing anything.
glFinish forces a sync point between the GPU and the CPU. However, if all you're concerned about is ensuring that previous operations have finished writing to a texture in context A before you start reading from it in context B, a better option is glFenceSync/glWaitSync. These return immediately on the CPU side, but inside the driver they tell the GPU that commands issued from one context depend on the commands executed before the sync point was created.
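A minimal sketch of the fence pattern described above, assuming a GL 3.2+ context shared between the producing and consuming sides and a loader such as GLEW already initialized; this fragment needs a live GL context to run, and the function names are placeholders for wherever the application draws and samples:

```cpp
#include <GL/glew.h>  // assumption: GLEW (or another loader) is initialized

// Context A: after the last draw into the shared texture, create a fence
// and flush so the fence actually reaches the GPU command stream.
GLsync publish_texture(GLuint fbo) {
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    // ... draw the eye view into the attached texture ...
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();  // glWaitSync in another context requires the fence be flushed
    return fence;
}

// Context B: make the GPU wait on the fence before reading the texture.
// This returns immediately on the CPU; only the GPU command stream stalls,
// unlike glFinish, which blocks the CPU until the GPU is idle.
void consume_texture(GLsync fence, GLuint tex) {
    glWaitSync(fence, 0, GL_TIMEOUT_IGNORED);
    glBindTexture(GL_TEXTURE_2D, tex);
    // ... sample from the texture (e.g. a distortion/composite pass) ...
    glDeleteSync(fence);
}
```

This gives the cross-context ordering guarantee without ever stalling the CPU the way glFinish does.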
It's possible that the GL/DX interop extensions also do something similar internally during their lock/unlock step, but I don't work for NVIDIA or AMD, so I don't know how they manage that. I should probably dig out apitrace and locate ALL of the calls happening inside SubmitFrame.
- MichaelNikelsky (Honored Guest)
Just tried the new beta SDK and it works much better now. Still not getting 100% GPU usage, but at least it is now around 80% instead of less than 50%.
- Constellation (Adventurer)
Unfortunately, in my case I didn't see any noticeable improvement after upgrading to 0.6.0.1. I'm working on a GUI to set up the performance HUD, and I'll see what I can get out of it once it's up and running.
I looked through the source of the runtime and found a very long comment in OculusSDK\LibOVR\Src\CAPI\D3D1X\CAPI_D3D11_CliCompositorClient.cpp, beginning on line 746, that explains the GL locking and unlocking process in detail. The comment ends with a warning. I'm not sure yet how or when this issue might arise, but the all caps definitely caught my attention:
// VERY IMPORTANT THING. This assumes the state in CompositorLayers
// is complete and canonical. That is, there's no implicit state pending from
// previous frames on the server side (otherwise we'll Lock a texture that is
// still going to be drawn on the screen). We used to allow sparse data on the
// client side, but that will break everything, so not any more.