Forum Discussion
lazydodo
13 years ago · Honored Guest
Shader Optimization.
I was playing with the shader and wondered if it could be optimized.
Let's start with the standard shader:
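// Resources and constants used below (declared here so the shader compiles on its own)
Texture2D Texture : register(t0);
SamplerState Linear : register(s0);
float2 LensCenter;
float2 ScreenCenter;
float2 Scale;
float2 ScaleIn;
float4 HmdWarpParam;

float2 HmdWarp(float2 in01)
{
    float2 theta = (in01 - LensCenter) * ScaleIn; // Scales to [-1, 1]
    float rSq = theta.x * theta.x + theta.y * theta.y;
    float2 theta1 = theta * (HmdWarpParam.x + HmdWarpParam.y * rSq +
                    HmdWarpParam.z * rSq * rSq + HmdWarpParam.w * rSq * rSq * rSq);
    return LensCenter + Scale * theta1;
}

float4 main(float2 texCoord : TEXCOORD0, float4 color : COLOR0) : SV_Target
{
    float2 tc = HmdWarp(texCoord);
    // Output black when the warped coordinate falls outside this eye's half of the screen
    if (any(clamp(tc, ScreenCenter - float2(0.25, 0.5), ScreenCenter + float2(0.25, 0.5)) - tc))
        return 0;
    return Texture.Sample(Linear, tc);
}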
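For anyone who wants to check the math off the GPU, here is a minimal C++ sketch of the same HmdWarp() polynomial and viewport clip. The Float2 and WarpParams types are hypothetical stand-ins for the HLSL float2 and shader constants; the actual parameter values are whatever your app computes per eye.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Minimal stand-in for HLSL's float2.
struct Float2 {
    float x, y;
};

// Hypothetical bundle of the shader constants; field names mirror the HLSL
// globals, values are whatever the app computes for each eye.
struct WarpParams {
    Float2 lensCenter;
    Float2 screenCenter;
    Float2 scale;
    Float2 scaleIn;
    std::array<float, 4> k;  // HmdWarpParam.xyzw
};

// Same math as HmdWarp(): scale the offset from the lens center by
// k0 + k1*r^2 + k2*r^4 + k3*r^6, where rSq = r^2.
Float2 hmdWarp(const WarpParams& p, Float2 in01) {
    const float tx = (in01.x - p.lensCenter.x) * p.scaleIn.x;
    const float ty = (in01.y - p.lensCenter.y) * p.scaleIn.y;
    const float rSq = tx * tx + ty * ty;
    const float s = p.k[0] + p.k[1] * rSq + p.k[2] * rSq * rSq +
                    p.k[3] * rSq * rSq * rSq;
    return {p.lensCenter.x + p.scale.x * (tx * s),
            p.lensCenter.y + p.scale.y * (ty * s)};
}

// Mirrors the clamp()/any() test in main(): no value when the warped
// coordinate lies outside ScreenCenter +/- (0.25, 0.5), i.e. this eye's half.
std::optional<Float2> warpOrClip(const WarpParams& p, Float2 in01) {
    const Float2 tc = hmdWarp(p, in01);
    const bool inside = tc.x >= p.screenCenter.x - 0.25f && tc.x <= p.screenCenter.x + 0.25f &&
                        tc.y >= p.screenCenter.y - 0.5f && tc.y <= p.screenCenter.y + 0.5f;
    if (!inside) {
        return std::nullopt;
    }
    return tc;
}
```

A pixel exactly at LensCenter maps to itself (r² = 0), which is a handy sanity check for any port.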
The warp shader is 11 multiplies, 6 additions and 3 subtractions per pixel (eyeballed the numbers, sorry, could be off by one or two), and on my GTX 670 it takes 0.1447264 milliseconds to execute (averaged over 100 frames, timing just the draw with this shader).
That could be good or bad; there's nothing to compare it to, so let's compare it with a shader that doesn't warp and just samples the texture:
float4 pixelShader(float2 texCoord : TEXCOORD0, float4 color : COLOR0) : SV_Target
{
    // Plain pass-through: a single sample with the Linear sampler, no warp
    float4 c = Texture.Sample(Linear, texCoord);
    return c;
}
That takes 0.0272857 milliseconds to execute (again GTX 670, 100 frames).
So the warping shader is 5.3 times slower than just bare sampling.
There must be a better way. Why are we recalculating the distortion map every single frame? That seems rather wasteful. Let's render the distortion map to a texture instead. We'll use the red channel for the X distortion and the green channel for the Y distortion. A texture of format R32G32_Float should fit our needs nicely, and the upside is that we only have to do the nasty math once!
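A quick way to judge whether 32-bit floats are more than the stored coordinates need (the R16G16_Float idea comes up later in the thread): a half float has an 11-bit significand, so in [0.5, 1) its spacing is 2^-11. The sketch below models the float-to-half conversion as round-to-nearest on the significand (an assumption; it also ignores half subnormals, which don't occur for these values) and measures the worst error in texels, assuming a 1280-texel-wide scene texture (the Rift DK panel width; the post doesn't give the render-target size).

```cpp
#include <cassert>
#include <cmath>

// Rounds a non-negative float to the nearest value with an 11-bit significand
// (1 implicit + 10 stored bits), i.e. half-float precision.
// Simplification: ignores half subnormals (below 2^-14) and overflow.
float roundToHalfPrecision(float v) {
    int e = 0;
    const float m = std::frexp(v, &e);  // v = m * 2^e, m in [0.5, 1)
    return std::ldexp(std::round(m * 2048.0f), e - 11);
}

// Worst-case error, in texels, when texel-center coordinates spanning [0, 1]
// of a `width`-texel-wide texture are stored at half precision.
float worstErrorInTexels(int width) {
    float worst = 0.0f;
    for (int i = 0; i < width; ++i) {
        const float u = (i + 0.5f) / width;
        worst = std::fmax(worst, std::fabs(roundToHalfPrecision(u) - u) * width);
    }
    return worst;
}
```

The worst case stays at or below 2^-12 × 1280 = 0.3125 texel, so half precision looks adequate, although with point sampling a coordinate sitting near a texel boundary can occasionally land on the neighboring texel.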
The shader that renders the distortion map now looks like this:
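// Uses the same constants (LensCenter, ScaleIn, Scale, HmdWarpParam, ScreenCenter) as the warp shader above
float2 HmdWarp(float2 in01)
{
    float2 theta = (in01 - LensCenter) * ScaleIn; // Scales to [-1, 1]
    float rSq = theta.x * theta.x + theta.y * theta.y;
    float2 theta1 = theta * (HmdWarpParam.x + HmdWarpParam.y * rSq +
                    HmdWarpParam.z * rSq * rSq + HmdWarpParam.w * rSq * rSq * rSq);
    return LensCenter + Scale * theta1;
}

float4 main(float2 texCoord : TEXCOORD0, float4 color : COLOR0) : SV_Target
{
    float2 tc = HmdWarp(texCoord);
    if (any(clamp(tc, ScreenCenter - float2(0.25, 0.5), ScreenCenter + float2(0.25, 0.5)) - tc))
        return 0; // clipped pixels store (0, 0) in the lookup texture
    // Warped x goes to red, warped y to green; an R32G32 target keeps only these two channels
    return float4(tc.x, tc.y, 0, 0);
}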
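The same lookup table could also be filled on the CPU and uploaded once as initial texture data, instead of rendering it. A minimal C++ sketch, assuming the table matches the render-target size; WarpFn is a hypothetical hook where a CPU port of HmdWarp (or any other distortion function) plugs in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float2 {
    float x, y;
};

// Hypothetical hook: any distortion function, e.g. a CPU port of HmdWarp.
using WarpFn = Float2 (*)(Float2);

// Builds an interleaved RG float buffer in R32G32_FLOAT layout: for texel
// (x, y), lut[2*(y*width + x)] is the warped u and the next float the warped v.
// Texels whose warped coordinate leaves [clipMin, clipMax] store (0, 0),
// matching the shader's `return 0`.
std::vector<float> buildWarpLut(int width, int height, WarpFn warp,
                                Float2 clipMin, Float2 clipMax) {
    assert(width > 0 && height > 0);
    std::vector<float> lut(static_cast<std::size_t>(width) * height * 2, 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Texel-center UV, what a full-screen quad interpolates to here.
            const Float2 uv{(x + 0.5f) / width, (y + 0.5f) / height};
            const Float2 tc = warp(uv);
            const bool inside = tc.x >= clipMin.x && tc.x <= clipMax.x &&
                                tc.y >= clipMin.y && tc.y <= clipMax.y;
            if (inside) {
                const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 2;
                lut[i] = tc.x;
                lut[i + 1] = tc.y;
            }
        }
    }
    return lut;
}
```

Note that clipped texels hold (0, 0), exactly as the shader writes them, so the lookup pass fetches the scene's top-left texel there rather than outputting black. If that texel isn't black, one untested option is to store an out-of-range sentinel and sample with a border address mode whose border color is black.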
The resulting texture will look kind of like this (I used bad parameters; I had to guess since I don't have a Rift yet, so the horizontal mapping is somewhat off):

Sweet, almost in business! Let's update the warping shader to use the new texture. First of all, the linear sampler has to go; a MinMagMipPoint sampler fits our needs better.
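Assuming the lookup texture has the same resolution as the render target (the post doesn't say), every pixel's texture coordinate lands on a texel center, so point sampling hands back exactly the stored coordinate and a linear filter has nothing useful to blend. A minimal C++ model of that nearest-texel fetch with clamp addressing (samplePoint is a hypothetical helper, not a D3D API):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float2 {
    float x, y;
};

// Nearest-texel ("point") lookup into an interleaved RG float texture with
// clamp addressing: texel index = floor(u * width), clamped to the edges.
Float2 samplePoint(const std::vector<float>& rg, int width, int height, Float2 uv) {
    assert(width > 0 && height > 0);
    assert(rg.size() == static_cast<std::size_t>(width) * height * 2);
    const int x = std::clamp(static_cast<int>(std::floor(uv.x * width)), 0, width - 1);
    const int y = std::clamp(static_cast<int>(std::floor(uv.y * height)), 0, height - 1);
    const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 2;
    return {rg[i], rg[i + 1]};
}
```

The shader also uses the point sampler for the scene fetch, so the warped image itself is no longer bilinearly filtered; the post reports no visible difference.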
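SamplerState PointSampler : register(s0);   // MinMagMipPoint sampler
Texture2D Texture : register(t0);            // the rendered scene
Texture2D TextureWarp : register(t1);        // the precomputed distortion lookup texture

float4 main(float2 texCoord : TEXCOORD0, float4 color : COLOR0) : SV_Target
{
    // Fetch this pixel's precomputed warped coordinate (x in red, y in green)
    float2 realCoord = TextureWarp.Sample(PointSampler, texCoord).rg;
    // Sample the scene at the warped coordinate
    return Texture.Sample(PointSampler, realCoord);
}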
Ahh, much simpler now! New execution time: 0.0598547 milliseconds, and no visual difference! That's about a 2.4 times speedup over the original shader. Not bad for 20 minutes of tinkering.
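Checking the speedup against the GTX 670 timings above; the second ratio shows the lookup shader still costs a bit over twice the bare single-sample shader, which seems plausible since it does two dependent texture fetches instead of one:

```latex
\frac{0.1447264\ \text{ms}}{0.0598547\ \text{ms}} \approx 2.42,
\qquad
\frac{0.0598547\ \text{ms}}{0.0272857\ \text{ms}} \approx 2.19
```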
80 Replies
- ralfal · Honored Guest
I experimented with this while building a DIY prototype some time ago. For the various lenses I tested, at least the r^6 term could be safely omitted. Even using only r^2 was "good enough", though not perfect of course.
Due to its smaller lens diameter, the Rift apparently has more distortion than the lenses I used, so the SDK equation most likely makes sense for its lenses. But I don't think going beyond r^6 would yield any substantial improvement.
- KuraIthys · Honored Guest
I guess an optimisation like this depends on the hardware.
Calculating the distortion directly requires more powerful shader hardware, while using a lookup requires more memory bandwidth.
Memory bandwidth has tended to be where the most corners get cut on low-end hardware, but I guess that still doesn't necessarily mean the shader calculation is faster.
(I have some old hardware that, when running most games, is clearly limited by memory bandwidth: overclocking the memory produced huge performance improvements, while overclocking the shader cores did very little. Even so, I'm not convinced the lookup-table method would be slower even on that system... but I suppose I could check... XD)
- lazydodo · Honored Guest
That's why I also ran the test on my GeForce 210, the cheapest of the cheap you can find ($35 at your local PC hardware store). Even there the speedup was good; there's really no reason not to do this.
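To put rough numbers on the bandwidth side of the trade-off: both shaders fetch the scene, so the extra traffic is about one lookup texel per output pixel. A back-of-the-envelope C++ sketch, assuming a 1280×800 lookup texture (the Rift DK resolution; the thread doesn't state the actual size) and ignoring texture-cache effects:

```cpp
#include <cassert>
#include <cstdint>

// Bytes fetched from the lookup texture per frame, assuming one LUT texel
// per output pixel and no cache reuse.
constexpr std::uint64_t lutBytesPerFrame(std::uint64_t width, std::uint64_t height,
                                         std::uint64_t bytesPerTexel) {
    return width * height * bytesPerTexel;
}

// Same, scaled by the frame rate.
constexpr std::uint64_t lutBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t bytesPerTexel, std::uint64_t fps) {
    return lutBytesPerFrame(width, height, bytesPerTexel) * fps;
}

// R32G32_FLOAT = 8 bytes/texel, R16G16_FLOAT = 4 bytes/texel.
static_assert(lutBytesPerFrame(1280, 800, 8) == 8'192'000);
static_assert(lutBytesPerFrame(1280, 800, 4) == 4'096'000);
```

At 60 Hz that is 491,520,000 bytes/s (about 0.49 GB/s) for R32G32 and half that for R16G16; comparing it with your card's memory bandwidth gives a sense of how much the lookup costs on a given GPU.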
- renderingpipeline · Honored Guest
"lazydodo" wrote:
I was playing with the shader and wondered if it could be optimized.
If we make a texture of format R32G32_Float it should fit our needs nicely.
I guess half-float would be all you need for the texture coordinates (so an R16G16_Float texture, or whatever it's called in DX ;-) ). This would cut the bandwidth cost in half. Has anyone tried that? Or played around with lower-resolution lookup textures (like half screen-res)?
- lazydodo · Honored Guest
"renderingpipeline" wrote:
"lazydodo" wrote:
I was playing with the shader and wondered if it could be optimized.
If we make a texture of format R32G32_Float it should fit our needs nicely.
I guess half-float would be all you need for the texture coordinates (so a R16G16_Float texture or what this would be called in DX ;-) - this would cut the bandwidth costs in half. Anyone tried that? Or played around with lower resolution look-up textures (like half screen-res)?
Using R16G16_Float gets it down to 1.1181888 milliseconds on the 280 (down from 1.597678 with R32G32).
- lazydodo · Honored Guest
Ran it on the 670: R16G16_Float got it down to 0.0379216 ms a frame. To recap, on the 670:
Original shader: 0.1447264 ms
Just sample: 0.0272857 ms
Lookup R32G32: 0.0598547 ms (2.4 times faster than the original shader)
Lookup R16G16: 0.0379216 ms (3.8 times faster than the original shader)
- hesham · Protege
Great suggestion, guys. I've updated my project to do the caching in the Mac version because it was running really slowly, and performance doubled from around 30 fps to 60 fps :) I'm running on a mid-2011 MacBook Air. The source is on Bitbucket in case anyone wants to see it: https://bitbucket.org/druidsbane/ibex, and the main project site is http://hwahba.com/ibex. I know I need to test the Windows version on more machines, as many are saying it has issues, but hopefully the Mac version, which you can also download from the above site, works better since the systems are more uniform. Still waiting on the Rift Mac SDK, and I really hope this OpenGL stuff gets included at some point so we don't all have to keep rolling our own solutions!
- geekmaster · Protege
"gallantpigeon" wrote:
Correct me if I am wrong, but doesn't the Oculus SDK use a Maclaurin series expansion to approximate a full barrel distortion? Wikipedia describes the full expansion: http://en.wikipedia.org/wiki/Distortion_(optics).
Assuming this is correct, we could use more terms than the SDK's r0 = r(k0 + k1r^2 + k2r^4 + k3r^6) to get higher distortion accuracy, since the math operations are only done at startup. I'm not sure adding more terms would make any noticeable difference in quality; the error may be negligible.
According to the literature on this subject, it is rarely beneficial to use more than two terms for radial distortion correction. However, the formula you show above is only a "degenerate" form of the full "Brown's model" lens distortion correction formula: it leaves out the tangential distortion correction, which is very important when viewing through an off-axis (offset) position in the lenses (caused by the non-adjustable lens IPD in the Rift DK). Because only the radial correction subset of the full formula is centered on zero, the formula is a Taylor series and not a Maclaurin series.
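For reference, the radial-plus-tangential form of Brown's model as it is commonly written. This is the convention used in OpenCV's camera-calibration documentation; conventions for the tangential coefficients vary between sources. Here $(x, y)$ are coordinates relative to the distortion center, $k_i$ are radial and $p_i$ tangential coefficients. The SDK shader keeps only the radial factor, with its constant term exposed as HmdWarpParam.x:

```latex
\begin{aligned}
r^2 &= x^2 + y^2 \\
x_d &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2\,(r^2 + 2x^2) \\
y_d &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1\,(r^2 + 2y^2) + 2 p_2 x y
\end{aligned}
```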
I provided more details about lens distortion correction, and more reference links, in this post:
viewtopic.php?f=20&t=32#p3962
And the extra correction terms in the full formula do not require additional run-time processing power. In fact, if you pre-calculate and use a displacement map shader instead, a full 2.4x to 3.8x speed INCREASE has been reported:
viewtopic.php?f=20&t=353
And using a displacement map means that we can add tangential correction (for people whose IPD differs significantly from the Rift DK's "standard" 64 mm IPD) at NO ADDITIONAL COST over the "radial only" example used in that link.
- boone188 · Honored Guest
Nice work!
- brantlew · Adventurer
Just shootin' from the hip here, but the current warp correction has k[0] == 1, which seems to be a constant by the definition of Brown's distortion formula, and also k[3] == 0, which is "practically" a constant for non-fisheye lenses. So the general shader could have those terms simplified out for an easy reduction in instructions.
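The simplification above can be sketched on the CPU: with k0 fixed at 1 and k3 at 0, the scale factor collapses to 1 + r²(k1 + k2·r²), which Horner-style factoring evaluates with two multiplies and two adds once rSq is known, versus six multiplies and three adds for the full form. This only holds while the device actually reports k0 = 1 and k3 = 0; the 0.22 and 0.24 used in the check are just sample inputs.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Scale factor exactly as HmdWarp() writes it: k0 + k1*r^2 + k2*r^4 + k3*r^6
// (rSq = r^2). Six multiplies and three adds.
float scaleFull(const std::array<float, 4>& k, float rSq) {
    return k[0] + k[1] * rSq + k[2] * rSq * rSq + k[3] * rSq * rSq * rSq;
}

// Same polynomial assuming k0 == 1 and k3 == 0, factored Horner-style:
// 1 + rSq * (k1 + k2 * rSq). Two multiplies and two adds.
float scaleSimplified(float k1, float k2, float rSq) {
    return 1.0f + rSq * (k1 + k2 * rSq);
}
```

The two agree up to float rounding, so the shader could pass just k1 and k2 and drop the rest.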