Adreno GPU and Graphics advice

johnc
Honored Guest
The Adreno GPU

The Adreno has a sizeable (roughly 512 KB to 1 MB) on-chip memory that framebuffer operations are broken up into. Unlike the PowerVR or Mali tile-based GPUs, the Adreno has a variable bin size based on the bytes per pixel needed for the buffers. Select the lowest bit-depth render targets possible for maximum performance.
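
As a rough illustration (not code from VrLib), allocating a 16-bit color target at the GL level looks something like this; width and height stand in for your eye buffer size:

```c
// Sketch (GLES): a 565 color buffer halves the per-pixel color footprint
// versus RGBA8, so each on-chip bin can cover more of the screen.
// Assumes your eye framebuffer object is currently bound.
GLuint colorRb;
glGenRenderbuffers(1, &colorRb);
glBindRenderbuffer(GL_RENDERBUFFER, colorRb);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGB565, width, height);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                          GL_RENDERBUFFER, colorRb);
```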

The position portion of vertex shaders is sometimes run twice for each vertex: once to determine which bins the drawing will happen in, and again for each bin that the visible (front-facing) triangles cover. The binning is done on a per-triangle basis, not a per-draw call basis, so there is no benefit to breaking up large surfaces. Since the scenes are rendered twice for the stereoscopic views, and the binning process at least doubles it again, vertex processing is more costly than you might expect. Favor small, packed vertex formats in your buffers.

Avoiding any extra copies to or from tile memory is important for performance. The VrLib framework handles this optimally, but if you are doing it yourself, make sure you invalidate color buffers before using them, and discard depth buffers before flushing the eye buffer rendering. Clears still cost some performance, so invalidates should be preferred when possible.
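
For reference, a minimal sketch of those hints at the GLES 3.0 level, assuming you render into your own framebuffer object rather than the default framebuffer:

```c
// Before rendering into the eye buffer: the previous color contents are
// irrelevant, so don't pay to load them into tile memory.
const GLenum colorAttachment[] = { GL_COLOR_ATTACHMENT0 };
glInvalidateFramebuffer(GL_FRAMEBUFFER, 1, colorAttachment);

// ... render the eye view ...

// After drawing, before the eye buffer is flushed: depth never needs to be
// written back out to memory, so discard it.
const GLenum depthAttachment[] = { GL_DEPTH_ATTACHMENT };
glInvalidateFramebuffer(GL_FRAMEBUFFER, 1, depthAttachment);
```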

There is no dedicated occlusion hardware like the PowerVR graphics cores have, but early Z rejection is performed, so sorting draw calls into roughly front-to-back order is beneficial. VrLib implements this, sorting front to back based on the furthest corner of each surface's bounding box, which will draw characters before their enclosing environments.
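
To make the idea concrete, here is an illustrative sketch (not the actual VrLib code) of that kind of sort; the Surface struct and its field names are made up:

```c
#include <stdlib.h>

// Hypothetical per-surface record; the distance to the farthest bounding-box
// corner is assumed to be recomputed from the eye position each frame.
typedef struct {
    float farthestCornerDistSq;
    /* ... mesh and material handles ... */
} Surface;

static int CompareSurfaces(const void *a, const void *b) {
    const Surface *sa = (const Surface *)a;
    const Surface *sb = (const Surface *)b;
    /* Smallest farthest-corner distance first, i.e. front to back. */
    return (sa->farthestCornerDistSq > sb->farthestCornerDistSq) -
           (sa->farthestCornerDistSq < sb->farthestCornerDistSq);
}

/* qsort(surfaces, surfaceCount, sizeof(Surface), CompareSurfaces); */
```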

Texture compression has a significant performance benefit. Favor ETC2 compressed texture formats, but there is still sufficient performance to render scenes with 32-bit uncompressed textures on every surface if you really want to show off smooth gradients.
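
Uploading such a texture is a one-liner per mip level; a sketch, assuming the data has already been compressed offline (for example into a KTX file) and loaded into memory:

```c
// ETC2 uses 4x4 blocks, 8 bytes per block for GL_COMPRESSED_RGB8_ETC2.
GLsizei dataSize = ((width + 3) / 4) * ((height + 3) / 4) * 8;
glBindTexture(GL_TEXTURE_2D, tex);
glCompressedTexImage2D(GL_TEXTURE_2D, mipLevel, GL_COMPRESSED_RGB8_ETC2,
                       width, height, 0, dataSize, data);
```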

glGenerateMipmap() is fast and efficient; you should build mipmaps even for dynamic textures (and of course for static textures). Unfortunately, on Android many dynamic surfaces (video, camera, UI, etc.) come in as SurfaceTextures / samplerExternalOES, which don't have mip levels at all. Copying to another texture and generating mipmaps there is inconvenient and adds notable overhead, but is still worth considering.
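
A sketch of the dynamic-texture case, assuming an ordinary GL_TEXTURE_2D that is updated each frame (this does not work on samplerExternalOES surfaces, which is the problem described above):

```c
glBindTexture(GL_TEXTURE_2D, dynamicTex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, pixels);   // new contents this frame
glGenerateMipmap(GL_TEXTURE_2D);                      // rebuild the mip chain
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
```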

sRGB correction is free on texture sampling but has some cost when drawing to an sRGB framebuffer. If you have a lot of high-contrast imagery, being gamma correct can reduce aliasing in the rendering. Of course, smoothing sharp contrast transitions in the source artwork can also help.
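
A minimal sketch of the texture side, using GLES 3.0 immutable storage; the hardware then linearizes the texels for free when the shader samples them:

```c
// Store albedo/UI textures in an sRGB format so sampling returns linear values.
glBindTexture(GL_TEXTURE_2D, albedoTex);
glTexStorage2D(GL_TEXTURE_2D, mipLevels, GL_SRGB8_ALPHA8, width, height);
```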

2x MSAA runs at full speed on chip, but it still increases the number of tiles, so there is some performance cost.
4x MSAA is generally not fast enough for VR rendering unless the scene is very undemanding.


Advice for early VR titles

Be conservative on performance. Even with SCHED_FIFO, there is enough going on outside of our control on Android systems that performance has more of a statistical character than we would like. Some background tasks even use the GPU occasionally. Pushing right up to the limit will undoubtedly cause more frame drops and make the experience less pleasant.

Under these performance constraints, you aren't going to pull off a graphics effect that people haven't already seen years ago on other platforms, so don't try to compete there. The magic of a VR experience is going to come from interesting things happening in well-composed scenes; the graphics should largely just try not to call attention to themselves.

Even if you consistently hold 60 fps, more aggressive drawing consumes more battery power, and it may often be the case that a subtle improvement in visual quality isn't worth taking 20 minutes off the battery life for a title.

Keep the rendering straightforward. Draw everything to one view, in a single pass for each mesh. Tricks with resetting the depth buffer and multiple camera layers are bad for VR, regardless of their performance issues. If the geometry doesn't work correctly all rendered into a single view (FPS hands, etc), then it is going to have perception issues in VR, and you should fix the design.

You can't handle a lot of blending for performance reasons. If you don't have free form user movement capabilities and can guarantee that the blended effects will never cover the entire screen, then you will probably be ok.

Don't use alpha tested / pixel discard transparency; the aliasing will be awful, and performance can still be problematic. Coverage from alpha can help, but designing a title that doesn't require a lot of cut-out geometry is even better.

Most VR scenes should be built to work with 16 bit depth buffer resolution and 2x MSAA. If your world is mostly pre-lit to compressed textures, there will be little difference between 16 and 32 bit color buffers.
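
As a rough sketch of what that eye buffer setup might look like, assuming the GL_EXT_multisampled_render_to_texture extension (available on Adreno) so the 2x resolve happens in tile memory; eyeFbo, eyeColorTex, width, and height are placeholders:

```c
// 16-bit depth renderbuffer with 2 samples.
GLuint depthRb;
glGenRenderbuffers(1, &depthRb);
glBindRenderbuffer(GL_RENDERBUFFER, depthRb);
glRenderbufferStorageMultisampleEXT(GL_RENDERBUFFER, 2, GL_DEPTH_COMPONENT16,
                                    width, height);

// Color resolves directly into eyeColorTex when the tile is flushed.
glBindFramebuffer(GL_FRAMEBUFFER, eyeFbo);
glFramebufferTexture2DMultisampleEXT(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                     GL_TEXTURE_2D, eyeColorTex, 0, 2);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                          GL_RENDERBUFFER, depthRb);
```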

The default eye buffer resolution is 1024x1024 for 1080 displays and 1280x1280 for 1440 displays, which is a reasonable compromise -- the pixels in the center are somewhat magnified, and the pixels at the edges are somewhat minified. Slightly lowering the resolution for rasterization-limited games can help, at the expense of additional blurring. Increasing the resolution to get maximum sharpness in the middle would also require mipmapping the eye buffers to avoid aliasing elsewhere, so it would have even more performance cost than just the scale.

50,000 static triangles in view is a conservative target for VR scenes. You can certainly handle more in many cases, but everyone always pushes for a target number, so there it is.

Overhead on all mobile OpenGL ES drivers is higher than the norm on PC, to say nothing of consoles. Try to stay under 100 draw calls per eye.

If you dynamically create any buffers or textures, do so once per frame instead of once per eye. Be particularly careful about generating a new view on demand inside eye rendering, such as for shadow buffers, which can force the driver to flush and reload the entire frame buffer under some circumstances. There probably isn't adequate performance to do a good job filtering shadow buffers anyway; consider projected blobs as simplified shadows to ground moving objects.
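
An illustrative frame structure (the helper names are hypothetical, not any SDK API): per-frame work is done once up front, and both eyes are then drawn from the same data:

```c
// Do all dynamic resource updates once per frame...
UpdateDynamicBuffers();    // skinning matrices, particle vertex buffers, etc.
UpdateDynamicTextures();   // video frames, UI, shadow maps if you must

// ...then render each eye without creating or modifying any resources.
for (int eye = 0; eye < 2; eye++) {
    BindEyeFramebuffer(eye);
    DrawScene(eye);
}
```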

Favor modest "scenes" instead of "open worlds". There are both theoretical and pragmatic reasons why you should, at least in the near term. The first generation of titles should be all about the low hanging fruit, not the challenges.

The best looking scenes will be uniquely textured models. You can load quite a lot of textures -- 128 megs of textures does not seem to be a problem. With global illumination baked into the textures, or data actually sampled from the real world, you can make reasonably photo realistic scenes that still run 60 fps stereo. The contrast with much lower fidelity dynamic elements may be jarring, so there are important stylistic decisions to be made.

Panoramic photos make very good, efficient backdrops for scenes. If you aren't too picky about your global illumination, allowing them to be swapped out is often nice. Full image based lighting models aren't performance-practical for entire scenes, but probably are ok for characters that can't cover the screen.

dsmeathers
Honored Guest
Thanks for such an informative post!

Regarding the front-to-back sorting that VrLib does, I guess that doesn't apply to Unity projects? Presumably we'll still need to set the render queue for each shader or material to render near objects first?

johnc
Honored Guest
Yes, you will have to do any sorting manually in Unity. It isn't the end of the world if things aren't sorted right, but the difference between exactly right and the opposite can be 2x if the surfaces have non-trivial shaders on them.

Another important point for Unity is that you need to give up any tricks you are used to using with multiple cameras compositing on top of each other. HUDs, first person weapons, cockpits, etc, all should be positioned and drawn in the same 3D coordinate system as the rest of the world.

dsmeathers
Honored Guest
"johnc" wrote:

You can't handle a lot of blending for performance reasons. If you don't have free form user movement capabilities and can guarantee that the blended effects will never cover the entire screen, then you will probably be ok.


Hi John,

From what I've read elsewhere (specifically this: http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-TileBasedArchitectures.pdf) tiled renderers are very good at handling blending, because reading from the on chip tile buffer is so much faster than reading from the frame buffer. Why is this not the case on Adreno?

drash
Heroic Explorer
Thanks for the detailed notes John. Seeing your target of 50,000 vertices was certainly a wakeup call given that this is my first mobile project. In my project, I've had to struggle quite a bit to crunch it down into something more manageable, and not just because I started with 3 million vertices. I was originally using two camera rigs (each with a different ICD) in order to compensate for massive simultaneous differences in scale in the scene and to avoid cockpit jitter. Now it's just one camera rig + floating origin + a bunch of different attempts to get the far-away objects rendering correctly. That's quite a challenge since the player needs to be able to look at his shoulder without any clipping.

I don't have great results yet for the far away objects, but I think my next attempt will simply be to take control of the render queue for a few objects and rearrange those at runtime as needed. To that end, I'm very glad to see allenwp's post about Unity taking care of the front-to-back ordering -- I wasn't sure how I was going to deal with that on top of my other issue. Thanks allenwp!

Either way, now that it's one camera (or bust), I feel a lot better about my prospects of getting things to work right with the most recent mobile SDK. Now that the PC SDK 0.3.2 is out, the thought does cross my mind -- will the PC and mobile SDK merge at some point? If so, how soon? My project is intended to function on both (with only a few differences to optimize for mobile) and I'd hate to have to fork my code indefinitely in order to support both platforms.

Serk
Explorer
"allenwp" wrote:

Actually, that guide says the following regarding draw sorting:
"Oculus Mobile Unity Integration Guide 5.1.4.2" wrote:
Objects in the Unity opaque queue are rendered in front to back order using depth-testing to minimize overdraw. However, objects in the transparent queue are rendered in a back to front order without depth testing and are subject to overdraw.

Official documentation also seems to agree, so it looks like this sorting might actually be taken care of for us.

Is that also true for statically batched meshes?

AFAIK, when you use static batching Unity combines the individual objects into a single large mesh, and then attempts to draw that with as few draw calls as possible. It still culls the individual objects based on visibility (or if you disable any renderer) by rebuilding indices, but I don't believe there is any front-to-back sorting going on there. Am I wrong?

What I'm doing is forcing my large occluders (walls and so on) to render first (offsetting their shader's render queue), and minimizing the draw calls by having them statically batched and using a texture atlas. I also have a simple occlusion culling system so that I don't render the whole map when doing this, but I assume there is always some overdraw. I believe this is worth it if it helps reduce draw calls, but I'd like to hear some feedback from people with more knowledge of how Unity works under the hood.

johnc
Honored Guest
A few follow up points:

Tiled GPUs are better at blending than immediate GPUs given equal resources, but blending still costs performance. If everything in your world is opaque and drawn in roughly front to back order, there will only be roughly one fragment shader run for each pixel. If you have three layers of blending on top of that, each pixel gets four fragment shaders run. The actual blending may be almost free on a tiler, but the extra shading and texture fetches get you.

Rendering cockpits inside giant space scenes presents some unique quality challenges due to floating point precision. The most robust solution is to leave the cockpit at the origin and transform everything else into that coordinate system, but that may be difficult to manage, depending on the engine you are using.

I actually said 50,000 triangles, which is even more limiting than 50,000 vertexes. 🙂 However, that is a conservative number that was basically dragged out of me when I really wanted to say "it depends". There is well over an order of magnitude difference in the triangle rate you can get depending on how your data is structured. Emitting one triangle per draw call is going to miss frames after only a few hundred triangles, but you might be able to look at a million triangle model if it was set up in a truly optimal manner, although that would almost certainly be a benchmark gimmick.

I will try to get some more architecture specific guidelines, but here are some more general ones:

Even if you had the performance to render as many triangles as you wanted, it wouldn't necessarily be the best decision from a visual quality standpoint. With only 2x MSAA, every visible silhouette edge adds some noticeable aliasing, which is much more distracting in VR than on a 2D screen. If the triangles are just being used to make curves smoother, it wouldn't be an increase in total silhouette edge length (slight decrease, actually), but geometric detail spent that way is rarely profitable. Visually interesting geometry is usually the addition of lots of silhouette edges, with the aliasing tradeoff.

Using geometry to add lots of material changes adds aliasing at the edges even when they aren't silhouettes, as well as increasing batch count. If you are going for a high poly look, try to wrap all the knurled detail with as few textures as possible. Reprojecting lots of textures into a single shrink-wrap texture over your geometry with off line tools can significantly help both your execution speed and visual quality.

Lots of tiny triangles are the worst of all worlds. The triangles don't help the visuals at all, and the GPU requirement of shading pixel quads can cause a triangle that covers less than a pixel of area to run the fragment shader 16 times! If your design really calls for seeing something from far in the distance all the way up to view-filling size, consider having multiple levels of detail. Fancy continuous LOD schemes are unlikely to be worth it, but at least pop to one or two lower LODs, possibly with some hysteresis to prevent oscillation at critical distances.
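
A sketch of what popping between discrete LODs with hysteresis might look like; the switch distances and the 10% margin are made-up numbers:

```c
// Returns the LOD to draw this frame, given the current LOD, so that an object
// sitting near a threshold doesn't oscillate between two levels.
int SelectLod(float dist, int currentLod) {
    static const float switchDist[] = { 20.0f, 60.0f };  /* LOD0->1, LOD1->2 */
    const float margin = 1.10f;                          /* 10% hysteresis */
    int lod = currentLod;
    while (lod < 2 && dist > switchDist[lod] * margin)       /* clearly past: drop detail */
        lod++;
    while (lod > 0 && dist < switchDist[lod - 1] / margin)   /* clearly inside: restore it */
        lod--;
    return lod;
}
```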

Minimize the total vertex size. If you have a pre-baked texture on everything, all you need are xyzst, and they can usually be all 16 bit values with proper scaling. If you need normals, store them as bytes instead of floats.
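
For example, a packed 16-byte vertex (a sketch; the attribute locations and struct layout are arbitrary) versus the 32+ bytes the same data would take as floats:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t  pos[3];    /* position scaled into the int16 range; undo in the shader */
    int16_t  pad;       /* keep the struct 4-byte aligned */
    uint16_t uv[2];     /* normalized texture coordinates */
    int8_t   normal[4]; /* optional; drop entirely for pre-lit geometry */
} PackedVertex;         /* 16 bytes total */

/* With the vertex buffer bound and the attribute arrays enabled: */
glVertexAttribPointer(0, 3, GL_SHORT, GL_TRUE, sizeof(PackedVertex),
                      (const void *)offsetof(PackedVertex, pos));
glVertexAttribPointer(1, 2, GL_UNSIGNED_SHORT, GL_TRUE, sizeof(PackedVertex),
                      (const void *)offsetof(PackedVertex, uv));
glVertexAttribPointer(2, 4, GL_BYTE, GL_TRUE, sizeof(PackedVertex),
                      (const void *)offsetof(PackedVertex, normal));
```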

Use a good mesh optimizer. I will try to get Adreno-specific details, but anything that tries to preserve locality is going to get you most of the way there.

Make sure you have decent geometry chunks for object level culling. Small batches are definitely bad, but you can go too far the other way. If you are "inside" an environment, drawing the entire scene with one draw call won't be as efficient as if you broke it up into a handful of geometry sections that were culled as you looked away from them. Segmentation for culling is going to be different than the segmentation by material that comes naturally in the development process.