Forum Discussion
Devlin1991
12 years ago · Honored Guest
Barrel distortion lookup texture?
I am going to test this later, but considering the complexity of the shader we use to do the barrel distortion and chromatic aberration correction, is there any reason why we don't compute the texture coords once on program start, save them to a texture, and then use a single texel fetch to grab the result needed for that fragment at runtime, rather than redoing the computations every frame?
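For concreteness, the proposal is: evaluate the distortion once for every output pixel, bake the results into a lookup table, and fetch from it per fragment at runtime. A minimal CPU-side sketch in Python with NumPy; the `distort` polynomial and its coefficients are illustrative stand-ins, not the SDK's actual values:

```python
import numpy as np

# Illustrative radial distortion coefficients (NOT the actual SDK values).
K = (1.0, 0.22, 0.24, 0.0)

def distort(u, v):
    """Directly compute the distorted coordinate for (u, v) in [-1, 1]."""
    r2 = u * u + v * v
    scale = K[0] + r2 * (K[1] + r2 * (K[2] + r2 * K[3]))
    return u * scale, v * scale

def build_lookup(w, h):
    """Precompute the distorted coordinates once into an (h, w, 2) table."""
    v, u = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                       indexing="ij")
    du, dv = distort(u, v)
    return np.stack([du, dv], axis=-1)

lut = build_lookup(64, 80)
# At "runtime", a fragment at grid cell (y, x) just fetches lut[y, x]
# instead of re-evaluating the polynomial every frame.
```

On the GPU the table would live in a two-channel texture and the fetch would be a single texture lookup in the fragment shader; the question below is whether that fetch actually beats re-running the arithmetic.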
10 Replies
- raidho36 · Explorer: That heavily depends on the target GPU and the exact code. Either way, it's single-pass and very simple.
You should implement both and have it auto-selected based on runtime benchmarking, but if you're not going for that, the brute-force version is preferable to the texture-lookup version.
- jherico · Adventurer:
"Devlin1991" wrote:
I am going to test this later, but considering the complexity of the shader we use to do the barrel distortion and chromatic aberration correction, is there any reason why we don't compute the texture coords once on program start, save them to a texture, and then use a single texel fetch to grab the result needed for that fragment at runtime, rather than redoing the computations every frame?
Because it's slower than doing the calculations. The calculations themselves are very simple, involving no trigonometry, only addition and multiplication, which makes them extremely fast in a compiled shader. Using the results of one texture fetch to create coordinates for another, on the other hand, apparently wreaks havoc with typical driver pre-fetch and caching mechanisms. That's the answer I got when I tested it on my machine and asked on Stack Overflow here why the texture lookup version performed worse.
Of course this may be subject to changes in drivers and hardware, so it might be worth supporting both paths in your rendering engine and testing at runtime to see if you can get performance improvements out of the lookup.
- dghost · Honored Guest: It's worth mentioning that on modern GPU architectures it's typically more expensive to do texture lookups than it is to calculate the texture offsets.
And given that many low-end GPUs (particularly Intel integrated GPUs) are pretty memory-bandwidth constrained, I'm not sure it's a good idea to exacerbate that issue.
- tomf · Explorer: The texture doesn't need to be very large to get good precision, so it tends to stay resident in the texture cache, which means actual memory bandwidth isn't really the problem. The real question is the specific ratio of general math units to texture sampler units. Most GPU architectures allow this ratio to change from device to device according to the market segment, so it's very hard to predict which approach is going to be optimal on a particular machine, even if you know the vendor.
- geekmaster · Protege: It's probably optimal in general to implement both methods (pre-calculated texture lookup versus realtime calculation) and run a benchmark during startup to decide which method to use on the current hardware.
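The startup benchmark geekmaster describes might be sketched like this; `distort_computed` and `distort_lookup` are hypothetical stand-ins for the two render paths, and `timeit` stands in for the GPU timer queries a real engine would use:

```python
import timeit

# Hypothetical stand-ins for the two render paths. In a real engine these
# would issue the actual GPU work and be timed with GL timer queries
# (GL_TIME_ELAPSED), not CPU timers.
def distort_computed():
    return sum(i * 1.07 for i in range(1000))

def distort_lookup():
    table = list(range(1000))
    return sum(table)

def pick_fastest(paths, repeats=50):
    """Run each candidate path a few times at startup; keep the fastest."""
    timings = {name: timeit.timeit(fn, number=repeats)
               for name, fn in paths.items()}
    return min(timings, key=timings.get)

best = pick_fastest({"computed": distort_computed,
                     "lookup": distort_lookup})
```

The selection only needs to run once at startup, and the result can be cached per GPU so later launches skip the benchmark entirely.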
- jherico · Adventurer: I had been using the same scale for my scene framebuffer texture as for my displacement map texture, so it was taking up quite a bit more memory than it needed to. Shrinking it improved the performance, but not to the point where it's faster than direct computation.
Using a GL query object, the distortion clocks in at about 67μs per distort using direct computation. Using the displacement texture instead of computation, with a texture size of 64x80 (one tenth the target screen size), I get about 89μs per distort. I also get weird jagged edges in my Rift output, because of edge effects from interpolating such a tiny displacement map.
- dghost · Honored Guest:
"tomf" wrote:
The texture doesn't need to be very large to get good precision, so it tends to stay resident in the texture cache, which means actual memory bandwidth isn't really the problem. The real question is the specific ratio of general math units to texture sampler units. Most GPU architectures allow this ratio to change from device to device according to the market segment, so it's very hard to predict which approach is going to be optimal on a particular machine, even if you know the vendor.
Very true, although I'm curious what kind of precision you're talking about. I was assuming it would need probably two 16-bit values (32 bits/pixel, either packed into RGBA8 or as RG16F; I don't know about the precision loss from floating point, though) to maintain numerical precision, but I might be quite mistaken about that.
I should probably have added that I've been approaching this from the perspective of optimizing a deferred shading engine for the worst-case scenario of low-end, bandwidth-constrained hardware. That's what I get for posting in the middle of a deadline crunch. There could easily be a workload that is computationally constrained (but not bandwidth constrained) that would see improvements from sampling the distortion from a texture instead.
- jherico · Adventurer:
"dghost" wrote:
"tomf" wrote:
The texture doesn't need to very large to get good precision
... I'm curious what kind of precision you're talking about.
I'm inferring here, but I believe he's talking about the delta between the exact computed distortion factor for a given pixel and the value you get from the texture. If the texture is of lower resolution than the screen, then the rendering system will linearly interpolate between the actual texture values. But the distortion factor is a nonlinear function, not a linear one, so if you make the texture small enough, you start to see 'inflection points': there's linear interpolation for N pixels, and then a sudden jump because you've crossed a pixel threshold in the displacement map and it starts linearly interpolating to the next pixel.
What you want is a steady curve like this
But what you get by using a texture is a bunch of straight lines between dots that are on that curve. The smaller the texture, the fewer the dots, and the more obvious the artifacts are in the rendered image.
"dghost" wrote:
I was assuming it would need probably two 16bit values (32bit/pixel, either packed in RGBA8 or as an RG16F - I don't know about the precision loss from floating point though) to maintain numerical precision, but I might be quite mistaken about that.
Yeah, that's an orthogonal use of the term precision, though the precision of the texture format does come into play. In my tests, using just the 8 R and G bits offered by RGBA8 doesn't work well (though that was when I was using the full-size displacement map; I haven't tried it since), so I'm using RG16. RG16F isn't necessary, since the distortion factor is always a value less than 1. You can see the code I use here: https://github.com/OculusCommunitySDK/OculusRiftExamples/blob/master/source/common/Rift.h#L209
- dghost · Honored Guest: Ah, my mistake. I assumed he was talking about decreasing per-pixel overhead, not resolution, particularly because decreasing resolution introduces the very predictable errors you mentioned. I sort of figured it was off the table, but perhaps not.
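The effect jherico describes, the gap between the true distortion curve and a piecewise-linear approximation read from a coarse table, can be demonstrated numerically. A sketch using an illustrative polynomial (not the SDK's coefficients), with `np.interp` standing in for the GPU's bilinear texture filtering:

```python
import numpy as np

def scale(r2, k=(1.0, 0.22, 0.24)):
    # Illustrative radial distortion polynomial (NOT the SDK's coefficients).
    return k[0] + k[1] * r2 + k[2] * r2 * r2

def max_interp_error(table_size, samples=10_000):
    """Worst-case error from linearly interpolating a table of the curve."""
    xs = np.linspace(0.0, 1.0, table_size)   # table sample positions (r^2)
    table = scale(xs)
    q = np.linspace(0.0, 1.0, samples)       # dense query points
    approx = np.interp(q, xs, table)         # what the texture filter does
    return float(np.max(np.abs(approx - scale(q))))

# A smaller table means fewer knots and a visibly worse piecewise-linear fit,
# which is where the 'inflection point' artifacts come from.
coarse = max_interp_error(8)
fine = max_interp_error(64)
```

The error shrinks roughly quadratically with table resolution for a smooth curve, which is why a modestly larger table can make the jagged-edge artifacts disappear.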
RE: the numerical precision stuff: trying to store texture coordinates that span more than 256 pixels in a single channel of an 8-bit integer storage format is going to fail for quite obvious reasons. What I was referring to is packing them so that the X value is packed into the R and G channels, and the Y value into the B and A channels. This gives each component of the stored coordinates 16 bits of storage, which is typically better suited to a constrained range such as the 0-1 range that texture coordinates occupy. Float16 formats have some benefits, particularly for really small or really large numbers, but run into problems given that you're typically wasting 5-6 bits on the sign and exponent bits if you are working in a fixed range. Either way, you're going to wind up with 32 bits per pixel to store the distortion texture naively.
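The packing dghost describes, splitting each coordinate's 16-bit fixed-point value across two 8-bit channels of an RGBA8 texture, might look like this on the CPU side; a sketch, assuming coordinates normalized to [0, 1]:

```python
def pack16(x):
    """Quantize x in [0, 1] to 16 bits, split into high/low 8-bit channels."""
    q = round(x * 65535)
    return q >> 8, q & 0xFF          # e.g. X -> (R, G), Y -> (B, A)

def unpack16(hi, lo):
    """Reassemble the two 8-bit channels back into a float in [0, 1]."""
    return ((hi << 8) | lo) / 65535

u = 0.31337
hi, lo = pack16(u)
recovered = unpack16(hi, lo)
# 16-bit fixed point keeps the quantization error below 1/65535 (~1.5e-5),
# versus ~1/255 (~0.004) for a single 8-bit channel.
```

The shader-side unpack is the same arithmetic: fetch the RGBA8 texel and reconstruct each coordinate as `hi * (256/257) + lo / 257`-style fixed-point math, or simply use an RG16 format as jherico does and let the hardware do it.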
Not having tested either, I have no idea which would actually be better in this case. Since that's the naive implementation, it's worth pointing out that it's also possible to apply other techniques (e.g., DXT) that would reduce the footprint of the texture.
Regardless, given that details of GPU cache architectures tend to be somewhat scarce, what I was really curious about was what constitutes (both in terms of resolution and storage format) "small enough to have good precision while typically staying cache resident".
In general, though, I'd love it if tomf could expand upon that post. I suspect that there is lots of good info there that isn't exactly in common circulation, and I always love to learn more.