
New docs: Thermal

JohnCarmack
Explorer
Some new docs for the upcoming SDK. Comment with questions for clarification:

The current devices are amazingly powerful for something you can stick in your pocket - four 2.5 GHz CPU cores and a 600 MHz GPU. Fully utilized, they can actually deliver more performance than an Xbox 360 or PS3 in some cases.

Briefly.

Without a heat sink and fan to cool it, fully utilizing all the capabilities of the chip will cause it to heat up to a dangerous level in under a minute.

A governor process on the device monitors an internal temperature sensor and tries to take corrective action when the temperature rises above certain levels to prevent malfunctioning or scalding surface temperatures. The corrective action that it takes is lowering the clock rates, regardless of our minimum clock settings.

If you run hard into the limiter, the temperature will continue climbing even as the clock rates are lowered, and the CPU clocks may get dropped all the way down to 300 MHz. The device may even panic under extreme conditions. VR performance will have catastrophically dropped along the way.

The default minimum clock rates for VR applications are 1.8 GHz on two cores and 600 MHz on the GPU. If you are consistently using most of this, you will eventually run into the thermal governor, even if the app starts off running fine. This will manifest as an app that runs great when you start it but, after ten minutes of play, starts to run poorly. If you filter logcat output for "thermal", you will see various notifications of sensor readings and actions being taken.
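As a rough illustration of the logcat filtering described above, here is a minimal Python sketch that keeps only the lines mentioning "thermal", ignoring case. The function name `thermal_lines` is hypothetical and chosen just for this example. The exact log tags and message formats depend on the device, so nothing here assumes a particular format.

```python
from typing import Iterable, Iterator


def thermal_lines(lines: Iterable[str], keyword: str = "thermal") -> Iterator[str]:
    """Yield only the lines that contain the keyword, ignoring case."""
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            # Strip the trailing newline so callers can print or store the line as-is.
            yield line.rstrip("\n")
```

One way to use it is to pipe `adb logcat` into a small script that calls `thermal_lines(sys.stdin)` and prints each result. Leaving it running during a long play session makes it easier to see when the governor starts acting.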

A critical difference between mobile and PC/console development is that no optimization is ever wasted. Without power considerations, if you have the frame ready in time, it doesn't matter if you used 90% of the available time or 10%. On mobile, every operation is draining the battery and heating the device. Of course, optimization takes effort that comes at the expense of something else, but it is important to note the tradeoff.

GPU power consumption is usually a pretty clear tradeoff -- resolution, MSAA, sRGB, chromatic aberration, uncompressed textures, and high poly models all cost power. Hitting 60 fps should not be your only concern.

In general, CPU load seems to cause more thermal problems than GPU load, but it is harder to optimize. Even if your debug build runs at full speed, you should still enable full optimizations. Standard advice about profiling applies.

An option most platforms don't have is the explicit reduction of clock rates. Two cores working at 1 GHz consume less power than doing the same work on one core at 2 GHz, even if the 2 GHz core is sleeping half the time. If you don't reduce the clock rates, splitting work over two cores will not save power.
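The reasoning behind the two-cores claim can be sketched with the standard CMOS switching model, where the energy for a fixed amount of work scales as C * V^2 * cycles, and lower clock rates allow a lower supply voltage. The voltages below are illustrative assumptions, not measurements from any device. The model also ignores leakage and idle power.

```python
def dynamic_energy(cycles: float, voltage: float, capacitance: float = 1.0) -> float:
    """Switching energy for a fixed amount of work: E = C * V^2 * cycles."""
    return capacitance * voltage ** 2 * cycles


WORK_CYCLES = 2.0e9   # total work to do, in CPU cycles (illustrative)
V_AT_1GHZ = 0.8       # ASSUMED supply voltage at 1 GHz
V_AT_2GHZ = 1.1       # ASSUMED supply voltage at 2 GHz

# One core at 2 GHz does all the work (it may sleep afterwards; the
# switching energy for the work itself is the same either way).
one_core_2ghz = dynamic_energy(WORK_CYCLES, V_AT_2GHZ)

# Two cores at 1 GHz each do half the work at the lower voltage.
two_cores_1ghz = 2 * dynamic_energy(WORK_CYCLES / 2, V_AT_1GHZ)

# Two cores left at 2 GHz: same voltage, same total cycles, no saving.
two_cores_2ghz = 2 * dynamic_energy(WORK_CYCLES / 2, V_AT_2GHZ)
```

With these assumed voltages, the two slow cores need roughly 53% of the switching energy of the single fast core, while splitting the work without lowering the clocks changes nothing. The saving comes from the voltage drop that the lower clock allows, not from the extra core.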

Choosing the lowest clock rates that still allow your application to run well can make up to a 25% difference in power consumption for apps that don't use a large fraction of the available power. For example, VRCinema is using 880 MHz for the CPUs and 380 MHz for the GPU, allowing it to operate at a steady state without any thermal warnings for as long as the battery lasts, which is notably longer than it would be at full clock rates.
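One way to approach picking clock rates is to measure your worst-case frame time at a few clock levels and take the lowest level that still fits the frame budget with some headroom. The sketch below assumes you have already collected those measurements yourself. The function name, the headroom factor, and the sample numbers in the usage note are all hypothetical, and it does not call any SDK API.

```python
from typing import Mapping, Optional

FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.7 ms per frame at 60 fps


def lowest_sufficient_clock(
    frame_ms_by_clock: Mapping[int, float],
    budget_ms: float = FRAME_BUDGET_MS,
    headroom: float = 0.9,
) -> Optional[int]:
    """Return the lowest clock (MHz) whose measured worst-case frame time
    fits within headroom * budget_ms, or None if no level is fast enough."""
    for clock_mhz in sorted(frame_ms_by_clock):
        if frame_ms_by_clock[clock_mhz] <= budget_ms * headroom:
            return clock_mhz
    return None
```

For example, with made-up measurements of 17.5 ms at 880 MHz, 14.0 ms at 1200 MHz, and 9.8 ms at 1800 MHz, the function returns 1200. The headroom factor is there so that small spikes do not immediately cause dropped frames.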

We are considering some kind of notification to the user that thermal throttling is taking place, because it means that the user experience is about to degrade significantly. There probably are some compelling applications that may not be runnable in steady state, but messaging this to users will be challenging.
22 REPLIES

pashustwo
Honored Guest
Thanks for the clear summary. I think this is the biggest worry for September for us. Do you have any data on how much different ambient temperatures affect heating rates? Luckily it's summer in the UK, but how safe is it to assume that if the game has an acceptable 'thermal ceiling' at 22 °C then it will be OK everywhere else in the world?

Also - is there an API to query the device's temperature sensors? Would be nice to be able to see what's going on while playtesting.

Cheers,
Peter

Anonymous
Not applicable
This is a little bit ... disappointing. Even understanding all the problems of mobile technology, all this stuff is still quite a pain to deal with, and it is even worse when you find out that "the GPU could do it".

On one side you now have VR, where "you want the best you can get" and where "even the best could still be not enough". On the other side (what engineer designed this?) you have "something that could reach dangerous levels within a minute". What's that for? To give you a 10 second "mega wow intro" followed by 20 minutes of sloppy frame rate?

Man this is sad .. especially when the tech looks like it could work so well.

I guess we'll see a generation of games coming out with "5 mins playtime limit" where after 5 mins a message will appear saying " .. and now we interrupt the game for 5 mins for cooling and refreshing, have a cup of tea or a coffee meanwhile" ..

(I'm sure at some point someone will also have the idea of adding "try the tea <sponsor_name> and the coffee <sponsor_name>" in the middle) ..

Right .. OK, these are the limitations. I think at the very least there should be some event/message/info when thermal throttling is going to happen, possibly also telling 'how', so one could have "dynamically adjusting" software that begins to cut down some resources when told, in order to keep the experience as uniform as possible.

You should also include some monitor, or a way to compile a program with or force a specific clock rate, so one can see immediately how this is going to affect the software and try to trim things "for a minimal and maximal rate". Specifically, there should be an EASY way in the SDK to say "run this at 880 MHz" or "run this at 600 MHz" to see what happens.

JohnCarmack
Explorer
I should probably be more clear -- the "one minute" mentioned was with a special test that had multiple CPU cores running flat out, I'm not aware of anyone's games that actually start fine and have thermal problems one minute later. There are lots of people reporting problems after ten minutes, though.

Unfortunately, this is a question of physics, rather than policy, so we can't just lobby Samsung to make it go away. Conceivably, they could raise the thermal limit somewhat, but there would be a point on the bell curve of shipped devices where high temperature operation starts causing random failures, possibly even permanent ones, which would be really bad news.

On the bright side, optimizing against physics always feels more virtuous than optimizing against some arbitrary platform decision.

In your particular case, I think you have plenty of margin to keep the clocks low enough to run until the battery drains.

I'm sure some users will case-mod their HMD to add some form of active cooling, and that might even inform future Samsung development plans.

Anonymous
Not applicable
Anyway as I said it would be at least favorable to have :

1. a warning to know when this is going to happen, and possibly a number telling how low/high the clock is going to go (i.e., for a thing like "if before I was at 100 and now I am at 50, I know I should cut some things in half")

2. an easy way to test/simulate with different clock rates and see how the performance goes. It would be lovely if somewhere in some .XML or whatever you could say a thing like "clock_speed = x" and see how the stuff goes.

The problem in essence is that "this is not a vr-only specific device" it's a device with a different use that can ALSO do VR.

Case mod .. yes .. I am sure you could glue to the back of a thing, probably sacrificing the camera and/or trying to drill some holes through some sort of heat sink or such .. probably getting some sort of La Forge look :mrgreen:

So yeah .. extensive "over 10 mins tests" will be done.

jmavor
Honored Guest
"JohnCarmack" wrote:
I should probably be more clear -- the "one minute" mentioned was with a special test that had multiple CPU cores running flat out, I'm not aware of anyone's games that actually start fine and have thermal problems one minute later. There are lots of people reporting problems after ten minutes, though.

Unfortunately, this is a question of physics, rather than policy, so we can't just lobby Samsung to make it go away. Conceivably, they could raise the thermal limit somewhat, but there would be a point on the bell curve of shipped devices where high temperature operation starts causing random failures, possibly even permanent ones, which would be really bad news.

On the bright side, optimizing against physics always feels more virtuous than optimizing against some arbitrary platform decision.


Part of my worry here is working through Unity, where we don't necessarily have fine-grained control over the rendering code. I'm assuming people at Unity are paying close attention to this? We can certainly be careful about how we use it.


I'm sure some users will case-mod their HMD to add some form of active cooling, and that might even inform future Samsung development plans.


So I assume then that the decision was already made to not have any kind of active cooling by default? It seems like somewhat of a good idea as I've wanted something to keep my head cool inside of the HMD anyway.

It seems like starting underclocked might be a smart move for some projects?

flarb
Partner
Are you saying the current API (before this new release) is underclocking the apps to 1.8 GHz? Or is this going to be a feature of the new SDK? (And most importantly, a setting available with Unity?)
@flarb

zerogee
Honored Guest
There won't be active cooling on this device. We're researching whether this is something that makes sense for the S6 and beyond.

Thermal throttling is definitely a disappointment when it kicks in, but at the end of the day, it's a limit of the technology we have to live with. But hey, designing with constraints forces us all to be more efficient, so that we can still run at 60 FPS at lower clock speeds, which will pay dividends in the future.

flarb
Partner
So is underclocking a feature of the new API, or is this basically a situation where if we're hitting thermal throttling now, we're doomed unless we dramatically scale the app down?
@flarb

jmavor
Honored Guest
"flarb" wrote:
So is underclocking a feature of the new API, or is this basically a situation where if we're hitting thermal throttling now, we're doomed unless we dramatically scale the app down?


That sort of sounds like the situation.

In our case we are going to start developing underclocked and then potentially turn up the clock dynamically for short periods if we need to.