New docs: Thermal

JohnCarmack
Explorer
Some new docs for the upcoming SDK. Comment with questions for clarification:

The current devices are amazingly powerful for something you can stick in your pocket - four 2.5 GHz CPU cores and a 600 MHz GPU. Fully utilized, they can actually deliver more performance than an Xbox 360 or PS3 in some cases.

Briefly.

Without a heat sink and fan to cool it, fully utilizing all the capabilities of the chip will cause it to heat up to a dangerous level in under a minute.

A governor process on the device monitors an internal temperature sensor and tries to take corrective action when the temperature rises above certain levels to prevent malfunctioning or scalding surface temperatures. The corrective action that it takes is lowering the clock rates, regardless of our minimum clock settings.

If you run hard into the limiter, the temperature will continue climbing even as the clock rates are lowered, and the CPU clocks may get dropped all the way down to 300 MHz. The device may even panic under extreme conditions. VR performance will have catastrophically dropped along the way.

The default minimum clock rates for VR applications are 1.8 GHz on two cores, and 600 MHz on the GPU. If you are consistently using most of this, you will eventually run into the thermal governor, even if it starts off running fine. This will manifest as an app that runs great when you start it, but after ten minutes of play, it starts to run poorly. If you filter logcat output for "thermal" you will see various notifications of sensor readings and actions being taken.
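As a rough illustration of the logcat filtering mentioned above, here is a minimal sketch that keeps only lines mentioning "thermal", case-insensitively. The function name `thermal_lines` is made up for this example; the first sample line is taken from a log posted later in this thread.

```python
import re

def thermal_lines(log_lines):
    """Yield only the log lines that mention 'thermal' (any letter case)."""
    pattern = re.compile(r"thermal", re.IGNORECASE)
    for line in log_lines:
        if pattern.search(line):
            yield line.rstrip("\n")

# Sample input: one ThermalEngine line and one unrelated plugin line.
sample = [
    "07-26 17:10:04.262: I/ThermalEngine(435): ACTION: CPU - Setting CPU[0] to 2457600",
    "W/OVR_Plugin( 6452): OVR_InitRenderThread()",
]
matches = list(thermal_lines(sample))
```

The same function should work on any iterable of lines, for example a saved dump of `adb logcat` output opened as a text file.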

A critical difference between mobile and PC/console development is that no optimization is ever wasted. Without power considerations, if you have the frame ready in time, it doesn't matter if you used 90% of the available time or 10%. On mobile, every operation is draining the battery and heating the device. Of course, optimization takes effort that comes at the expense of something else, but it is important to note the tradeoff.

GPU power consumption is usually a pretty clear tradeoff -- resolution, MSAA, sRGB, chromatic aberration, uncompressed textures, and high poly models all cost power. Hitting 60 fps should not be your only concern.

In general, CPU load seems to cause more thermal problems than GPU load, but it is harder to optimize. Even if your debug build runs at full speed, you should still enable full optimizations. Standard advice about profiling applies.

An option most platforms don't have is the explicit reduction of clock rates. Two cores working at 1 GHz consume less power than one core doing the same work at 2 GHz, even if the 2 GHz core is sleeping half the time. If you don't reduce the clock rates, splitting work over two cores will not save power.
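One way to see why this can hold is the textbook approximation that dynamic CMOS power scales as C * V^2 * f, combined with the fact that lower clocks usually permit a lower supply voltage. The sketch below is only a toy model: the voltages (1.0 V at 2 GHz, 0.8 V at 1 GHz) are invented for illustration, not measured values for this chip, and leakage and idle power are ignored.

```python
def dynamic_power(work_cycles_per_second, voltage, capacitance=1.0):
    """Toy model: energy per cycle is C * V^2, so power is work rate * C * V^2.

    Sleeping costs nothing in this model, so only cycles actually spent
    doing work count, however they are spread across cores.
    """
    return work_cycles_per_second * capacitance * voltage ** 2

WORK = 1e9  # cycles of work per second that the app needs

# ASSUMPTION: hypothetical supply voltages for each clock rate.
V_AT_2GHZ = 1.0
V_AT_1GHZ = 0.8

one_core_fast = dynamic_power(WORK, V_AT_2GHZ)   # one 2 GHz core, asleep half the time
two_cores_slow = dynamic_power(WORK, V_AT_1GHZ)  # same work split over two 1 GHz cores
two_cores_fast = dynamic_power(WORK, V_AT_2GHZ)  # split over two cores, clocks not lowered

ratio = two_cores_slow / one_core_fast
```

With these made-up voltages the two slow cores use about 64% of the power of the single fast core, while splitting the work without lowering the clocks changes nothing in this model.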

Choosing the lowest clock rates that still allow your application to run well can make up to a 25% difference in power consumption for apps that don't use a large fraction of the available power. For example, VRCinema is using 880 MHz for the CPUs and 380 MHz for the GPU, allowing it to operate at a steady state without any thermal warnings for as long as the battery lasts, which is notably longer than it would be at full clock rates.

We are considering some kind of notification to the user that thermal throttling is taking place, because it means that the user experience is about to degrade significantly. There probably are some compelling applications that may not be runnable in steady state, but messaging this to users will be challenging.

zerogee
Honored Guest
The new SDK will have the API where you can underclock.

sfaok
Protege
Apologies if I've missed it in the new docs but is underclocking possible in Unity apps at the moment?
Developer of Ocean Rift. Follow me on Twitter @sfaok

EMcNeill
Member
"sfaok" wrote:
Apologies if I've missed it in the new docs but is underclocking possible in Unity apps at the moment?


I haven't looked into it yet myself, but the docs point to OVRModeParms.cs in the Assets/Moonlight folder for an underclocking example.

I'm hoping to be able to run my game for long periods of time, and so overheating is a concern. When I do finally experiment with underclocking, I'll post my results here. I'd appreciate it if other devs who test it out would also post what works for them.

dsmeathers
Honored Guest
OVRModeParms.cs has the comment:

"Call in Awake() before the plugin issues EnterVrMode setup"

If it has to happen in Awake(), does this mean that we can't dynamically change the clock speed?

Also, will there be an API for querying the current temperature? I know this has been asked before but I don't think it got an answer.

If (or rather, when) the device heats up so much that we can no longer deliver a good experience we're going to need to message this to the user and end the session.

dsmeathers
Honored Guest
Also, will there be an API for querying the current temperature? I know this has been asked before but I don't think it got an answer.


It's alright I found it: OVRDevice.GetBatteryTemperature() 🙂

gkennickellvr
Honored Guest
@dsmeathers - it is technically possible to change the clock speed dynamically. To do so, though, we'd have to leave and then re-enter VR mode, which would cause the screen to flash.

sfaok
Protege
When profiling my Unity app in LogCat I'm seeing messages like this:

07-26 17:10:04.262: I/ThermalEngine(435): ACTION: CPU - Setting CPU[0] to 2457600
07-26 17:10:04.262: I/ThermalEngine(435): ACTION: CPU - Setting CPU[1] to 2457600
07-26 17:10:04.262: I/ThermalEngine(435): ACTION: CPU - Setting CPU[2] to 2457600
07-26 17:10:04.262: I/ThermalEngine(435): ACTION: CPU - Setting CPU[3] to 2457600

With heavy thermal throttling it drops down to 883200. If it gets things under control it will try to switch all the way back up.

Have I misunderstood or should I not be seeing numbers above 1728000 while VR apps are running?
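For comparing these numbers against the requested clocks, a small parser can pull the core index and frequency out of each ThermalEngine line. This is only a sketch: the function name is made up, and it assumes the logged value is in kHz, which the thread does not confirm but which fits 2457600 lining up with the roughly 2.5 GHz cores.

```python
import re

# Matches lines such as:
#   I/ThermalEngine(435): ACTION: CPU - Setting CPU[0] to 2457600
ACTION_RE = re.compile(
    r"ThermalEngine\(\s*\d+\):\s*ACTION:\s*CPU\s*-\s*Setting CPU\[(\d+)\] to (\d+)"
)

def parse_cpu_action(line):
    """Return (core_index, frequency_mhz), or None if the line does not match."""
    match = ACTION_RE.search(line)
    if match is None:
        return None
    core = int(match.group(1))
    khz = int(match.group(2))  # ASSUMPTION: the logged value is in kHz
    return core, khz / 1000.0

line = "07-26 17:10:04.262: I/ThermalEngine(435): ACTION: CPU - Setting CPU[0] to 2457600"
parsed = parse_cpu_action(line)
```

Under the kHz assumption, 2457600 reads as about 2458 MHz and 883200 as about 883 MHz.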

Also, if I attach the OVRModeParms script to my scene with these lines active in Awake():

OVR_VrModeParms_SetCpuMhz( 880 );
OVR_VrModeParms_SetGpuMhz( 389 );

It doesn't seem to affect performance that much. LogCat is still showing the CPU bouncing around all the way up to 2.5 GHz on all four cores. Any ideas? This is the 1440p device.

drash
Heroic Explorer
@sfaok, it seems to be working for me. (However, see my edit at the bottom of the post.) I set the clock speeds much lower (to the same values you did), observed a corresponding drop in performance for the duration of my app session (along with stable temps), and then checked the logs. No ThermalEngine complaints or actions taken to raise or lower clock speeds.

In the log, I did notice that the OVRModeParms.Awake() clearly took place before OVR initialization:
W/OVR_Plugin( 6452): OVR_VrModeParms_SetCpuMhz(): CpuMhz 880
W/OVR_Plugin( 6452): OVR_VrModeParms_SetGpuMhz(): GpuMhz 389
W/OVR_Plugin( 6452): OVR_TW_SetMinimumVsyncs() 1
.
.
.
W/OVR_Plugin( 6452): OVR_InitRenderThread()
W/OVR_Plugin( 6452): Calling ovr_Initialize()
W/OVR_Plugin( 6452): ovrHMD_Detect() = 1
W/OVR_Plugin( 6452): ovrHMD_Create(0) = 0x7af9fbe0
W/OVR_Plugin( 6452): Mode Parms CpuMhz 880 GpuMhz 389

So I'm wondering if your OVRModeParms.Awake() is taking place too late?

EDIT: I just ran through my app again with CPU MHz = 1728, and once it hit 91 C, it set the CPU to.... "2265600"... kHz? If that's indeed kHz then I see your complaint -- why would it kick it up from 1.7 GHz to 2.2 GHz at the first sign of high temps? I tried searching around for confirmation of what the units actually are, but couldn't nail it down.
  • Titans of Space PLUS for Quest is now available on DrashVR.com

sfaok
Protege
@drash OVRModeParms.Awake() takes place before OVR Init in mine too.

I do see a modest drop in performance I think:

No underclocking - 59-60fps
CpuMhz(880) GpuMhz(389)- 56-60fps

However, shouldn't the FPS be lower than this at these "power save" frequencies? I'm still hitting temps of 90000 mC and throttling 15 minutes in, with the GPU in the 3ms-6ms range. If my app can still reach 56-60fps at full underclock, why is it overheating so quickly? Or is this normal?

(edited with more recent findings)

Another thing is that the thermal throttling steps up and down through the frequencies:


ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 2265600
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 1958400
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 1728000
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 1574400
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 1497600
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 1574400
ThermalEngine(430): ACTION: CPU - Setting CPU[1] to 1728000
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 1958400
ThermalEngine(430): ACTION: CPU - Setting CPU[0] to 2265600


This is the first occurrence of throttling from a CpuMhz(880) GpuMhz(389) run - if I set the initial frequency to 883200, why does the throttling start stepping down from 2265600?

johnc
Honored Guest
The values that we set are the locked MINIMUM clock rates. The system can still choose to ramp them up under some opaque internal algorithm, so turning them down won't necessarily make your app run slower at a steady state. It will start out slower, but it may ramp back up to nearly the same values.

To a first approximation, lowering these values only optimizes an application that doesn't use all the available power, rather than forcing applications to consume fewer resources.

If you look in App.cpp CreateSchedulingReport(), there is code that reads what the current clocks actually are, rather than what you have requested. We should probably add a trivial form of this to the once-a-second fps log report.
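For anyone who wants to check the actual clocks outside the SDK, here is a minimal sketch that reads the standard Linux cpufreq sysfs files. It is not the code from CreateSchedulingReport(), and whether these files are readable on a given device is an assumption; the function name and the `cpu_root` parameter are made up for this example. The cpufreq interface reports `scaling_cur_freq` in kHz.

```python
from pathlib import Path

def read_cpu_clocks_khz(cpu_root="/sys/devices/system/cpu", max_cpus=8):
    """Return {core_index: current_frequency_khz} for every readable core.

    Cores whose cpufreq file is missing or unreadable (for example, cores
    that are offline) are skipped rather than reported as errors.
    """
    clocks = {}
    for cpu in range(max_cpus):
        freq_file = Path(cpu_root) / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
        try:
            clocks[cpu] = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            continue
    return clocks
```

Calling `read_cpu_clocks_khz()` once a second and logging the result next to the fps would show whether the system has ramped the clocks above the requested minimums.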

If the CPU clocks are going up that high, the problems are not GPU related, so dropping the GPU clocks most likely won't help. If you don't know of any ways to make a big difference in your app's CPU efficiency, the only big hammers at your disposal are monoscopic rendering or 30 Hz rendering with minimumVsyncs 2.
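To put numbers on the minimumVsyncs option, here is a small sketch of the frame-time arithmetic, assuming a 60 Hz display. The function name is made up; only the relationship between refresh rate, vsyncs per frame, and frame budget is shown.

```python
def frame_budget(refresh_hz, minimum_vsyncs):
    """Return (frames_per_second, budget_ms_per_frame) for a vsync-locked app."""
    if refresh_hz <= 0 or minimum_vsyncs < 1:
        raise ValueError("refresh_hz must be positive and minimum_vsyncs at least 1")
    fps = refresh_hz / minimum_vsyncs
    budget_ms = 1000.0 * minimum_vsyncs / refresh_hz
    return fps, budget_ms

# ASSUMPTION: a 60 Hz display.
full_rate = frame_budget(60, 1)  # one new frame every vsync
half_rate = frame_budget(60, 2)  # one new frame every second vsync
```

Going from minimumVsyncs 1 to 2 halves the frame rate to 30 fps and doubles the per-frame budget from about 16.7 ms to about 33.3 ms, which is where the CPU headroom comes from.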

We do intend to have a visible popup icon when the clocks are pushed down below the minimums so users know that they should probably quit soon, since performance will be erratic. While we would like all apps to be able to run steady state, there may be some justifiable cases where it just isn't possible.