Performance degradation when using a loop in a shader
Hello. UE4's architecture isn't flexible enough here: to raise the maximum number of dynamic light sources, we have to duplicate code. For example, on the shader side UE4 manually inlines the lighting code for each available light source, which makes the shader code unacceptably large even with 8 light sources. My solution is to replace that with a loop that traverses the data for all given light sources and executes the lighting code for each one. It appears to work, but I've run into a serious GPU performance degradation: RenderDoc shows that the loop version executes 20-30% more instructions in the fragment shader (for a few heavy draw calls), and overall GPU time gets worse by 2-3 ms.
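To make the idea concrete, here is a heavily simplified sketch of the loop approach. It is not the actual UE4-generated code: the `Light` struct, `shadeLight`, `uLights`, `uNumLights`, `MAX_LIGHTS` and the lighting math are all made-up placeholders, used only to show the shape of the change.

```glsl
#version 330 core

// Hypothetical names throughout; none of this is taken from UE4's generated code.
#define MAX_LIGHTS 8

struct Light {
    vec3  position;
    vec3  color;
    float invRadius;   // 1 / light radius, used for a simple linear falloff
};

uniform Light uLights[MAX_LIGHTS];
uniform int   uNumLights;          // number of active lights, assumed <= MAX_LIGHTS

in  vec3 vWorldPos;
in  vec3 vNormal;
out vec4 fragColor;

// Placeholder per-light shading: diffuse term with a squared linear falloff.
vec3 shadeLight(Light l, vec3 p, vec3 n) {
    vec3  toLight = l.position - p;
    float dist    = length(toLight);
    float atten   = clamp(1.0 - dist * l.invRadius, 0.0, 1.0);
    float nDotL   = max(dot(n, toLight / max(dist, 1e-4)), 0.0);
    return l.color * (nDotL * atten * atten);
}

void main() {
    vec3 n   = normalize(vNormal);
    vec3 sum = vec3(0.0);

    // Loop version: a single copy of the lighting code with a runtime trip count,
    // instead of one manually inlined copy per light source.
    for (int i = 0; i < MAX_LIGHTS; ++i) {
        if (i >= uNumLights) break;
        sum += shadeLight(uLights[i], vWorldPos, n);
    }

    fragColor = vec4(sum, 1.0);
}
```

The inlined version would instead contain the body of `shadeLight` written out once per light, each copy reading its own set of uniforms.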
For reference, here's a link to the generated shader code for one of the materials: https://file.io/LTXtmHC9lQXv (the original generated GLSL code and the generated GLSL code with a loop).
Does anybody know why a loop affects performance that much? Are there any possible workarounds?