Write / reading frag_color multiple times in shader seems to cause performance problem #84526

Open
lawnjelly opened this issue Nov 6, 2023 · 8 comments

@lawnjelly
Member

lawnjelly commented Nov 6, 2023

Godot version

3.6 beta 3 (but likely existed for years)

System information

Linux Mint 21.1, Intel Core i7-13700T, Intel a780 GPU

Issue description

While working on core shaders in the 3D platformer demo, I spent some time chasing a performance problem and traced a large drop in fill rate performance to modifying frag_color multiple times in the fragment shader.

I'm half expecting this to be an error on my part but I'd rather flag it and feel silly 😊 than let it pass and potentially miss a good speedup.

GLES2

Adding the second line here:

gl_FragColor = vec4(ambient_light + diffuse_light + specular_light, alpha);
gl_FragColor *= 1.0001;

dropped the frame rate from 127fps to 62fps in GLES2.

Whereas using this instead:

vec4 temp_frag_color = vec4(ambient_light + diffuse_light + specular_light, alpha);
temp_frag_color *= 1.0001;
gl_FragColor = temp_frag_color;

kept the frame rate at 127fps.

This was very surprising to me, and I was expecting it to be an artifact, but it seems very repeatable on my hardware. It may come down to the drivers, but if it happens for me it likely happens on other setups.

This paradigm of writing to frag_color multiple times is used in GLES2 and GLES3 (for at least fog and emission), and may be dropping performance unnecessarily. It may also be a problem in 4.x (haven't examined but I've mentioned this in rocket chat).

GLES3

I also tried the same in GLES3.

Again, adding the second line here:

	frag_color = vec4(ambient_light + diffuse_light + specular_light, alpha);
	frag_color.rgb *= 1.0001;

dropped performance from 88fps to 30fps. That's almost a threefold drop.

GLES3 seems less susceptible currently, as it only modifies frag_color when adding emission:

#ifdef USE_FORWARD_LIGHTING //ubershader-runtime
	frag_color.rgb += emission;
#endif //ubershader-runtime

Steps to reproduce

See above.

Minimal reproduction project

3D Platformer demo, modify scene.glsl shaders as above, and recompile engine.

Discussion

I'm not a shader guru by any means. I had a vague memory of this as a possible thing to watch for, but I'm not up to date. I haven't previously touched this part of the shaders, I just noticed while running experiments with blob shadows.

It could be that modifying the existing frag_color is causing some kind of round trip, or preventing it from being optimized into a local register, whereas a temporary variable has no such side effects and so avoids the performance drop. Alternatively it could be a problem introduced by our own shader "translator" code (I haven't examined the final GLSL).

I also considered whether the problem was due to modification of alpha, but using a line like gl_FragColor.rgb *= 0.999; also creates the problem.

I'm still not absolutely sure it isn't some artifact I've created somehow - it would be nice to see it independently verified. It may not occur on all hardware / drivers.

It is possible this could also occur in user shaders.

If confirmed perhaps we can take some (hopefully) simple steps to eliminate this problem, by standardizing our shaders to use temporaries, and only write to the final GL builtin once, at the end.
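
As a rough illustration of the "temporaries, single final write" idea for the GLES3 case, here is a minimal GLSL ES 3.00 fragment shader. It is a sketch only, not Godot's scene.glsl: the inputs lit_color and alpha and the uniform emission are hypothetical stand-ins for whatever the real lighting code produces.

```glsl
#version 300 es
// Hypothetical sketch, not Godot's actual shader.
// lit_color, alpha and emission are illustrative names (assumptions).
precision mediump float;

in vec3 lit_color;     // assumed: ambient + diffuse + specular, already summed
in float alpha;        // assumed: final alpha for this fragment
uniform vec3 emission; // assumed: emissive colour to add

out vec4 frag_color;

void main() {
	// All intermediate work happens on a local variable.
	vec4 color = vec4(lit_color, alpha);
	color.rgb += emission;

	// The shader output is written exactly once and never read back.
	frag_color = color;
}
```

Whether this avoids the slowdown on a given driver still has to be measured; the sketch only shows the shape of the change.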

Update

My small PR for GLES2 would seem to confirm this is a valid issue: it gives a 2-3x increase in fps in scenes that tax fill rate and use emission / fog. 👍
The 3D platformer is likely taxing on fill rate, perhaps because of reflection probes. Will try this in the TPS demo too.

@Calinou
Member

Calinou commented Nov 6, 2023

Intel a780 GPU

That GPU was never released, are you mistaking it for the Arc A770 or A380?

@lawnjelly
Member Author

lawnjelly commented Nov 6, 2023

Intel a780 GPU

That GPU was never released, are you mistaking it for the Arc A770 or A380?

That's what I read too, but my system info says "Intel Corporation Device a780" 🤣

Anyway the CPU is 13th Gen Intel© Core™ i7-13700T × 16, so it is whatever integrated GPU comes with that. 👍

EDIT: Intel® UHD Graphics 770 apparently, according to Google.
https://www.intel.com/content/www/us/en/products/sku/230492/intel-core-i713700t-processor-30m-cache-up-to-4-90-ghz/specifications.html

It looks like 0xA780 is the device ID.

@jknightdoeswork

Does this come into play when doing this in user created shaders:

COLOR = vec4(1.0, 0.0, 0.0, 1.0);
COLOR = vec4(0.0, 1.0, 0.0, 1.0);

Does this get transpiled to multiple assignments to gl_FragColor?

Any idea about the WebGL implications?

Is this a super easy fix? Just change a few .glsl files?

@lawnjelly
Member Author

Does this come into play when doing this in user created shaders:

Possibly, I'm not super familiar with the translation yet. But on the plus side it should be relatively easy to fix (write to temporary, then write gl_FragColor once at the end).

I've not tested outside my dev machine yet, but in theory if it occurs on one machine, it's likely it could occur on multiple similar setups. (Only further testing will reveal this.)

@lawnjelly lawnjelly changed the title Writing to frag_color multiple times in shader seems to cause performance problem Write / reading frag_color multiple times in shader seems to cause performance problem Nov 7, 2023
@lawnjelly
Member Author

Further clue this morning - while testing #84529 I discovered that in GLES3 adding this second line:

frag_color_final = frag_color;
frag_color_final *= 0.999;

Does NOT result in the drop in performance. This suggests that there is some kind of interaction with the previous code, like it is breaking a fast path, but only in some circumstances depending on the previous code.

Truly a very strange bug. 😁

@lawnjelly
Member Author

lawnjelly commented Nov 7, 2023

This is not perfect, but if you run this project in 3.x it should hopefully show whether your GPU has this slowdown (in GLES2). It may not be reliable on a very fast GPU; it shows approx 100fps on my integrated GPU.

If the FPS label is approx the same with fog on and off, there is no slowdown. If it is approx halved with fog on, then your GPU has the slowdown.

The reason it exposes the problem is that fog modifies an existing gl_FragColor. With the PR-fixed version there is no slowdown, but on vanilla Godot the slowdown should be exposed.
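
To make the fog case concrete, here is a minimal GLSL ES 1.00 sketch of the two patterns. It is not the engine's fog code: it assumes fog is a plain mix toward a fog colour, and fog_color, fog_amount, lit_color and alpha are hypothetical names chosen for the example.

```glsl
// Hypothetical sketch, not Godot's scene.glsl.
// fog_color, fog_amount, lit_color and alpha are illustrative names (assumptions).
precision mediump float;

uniform vec3 fog_color;   // assumed fog colour
uniform float fog_amount; // assumed blend factor: 0.0 = no fog, 1.0 = full fog
varying vec3 lit_color;   // assumed result of the lighting calculation
varying float alpha;

void main() {
	// Pattern that reads back the builtin (the one reported as slow here):
	//   gl_FragColor = vec4(lit_color, alpha);
	//   gl_FragColor.rgb = mix(gl_FragColor.rgb, fog_color, fog_amount);

	// Same result, computed in a local and written once at the end:
	vec4 color = vec4(lit_color, alpha);
	color.rgb = mix(color.rgb, fog_color, fog_amount);
	gl_FragColor = color;
}
```

Both versions produce the same colour; the only difference is whether gl_FragColor is read and modified after its first write.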

glFragColor_test_gpu.zip

@lawnjelly
Member Author

lawnjelly commented Nov 7, 2023

Will post testing results here as I get them, using the above project.

Linux Mint 21.1, Intel Core i7-13700T, Intel UHD Graphics 770 GPU

Fog on 46fps
Fog off 104fps
(approx same figures when running Godot under wine)
Very susceptible.

Linux Mint 21.2, Intel Core i3 2377M, 2nd Gen Core integrated graphics

Fog on 16fps
Fog off 60fps
Very susceptible.

Android Galaxy Tab S6 Lite

(modified material to turn on ambient occlusion and reflections to bring under 60fps)
Fog on 53fps
Fog off 54fps
Not susceptible.

@lawnjelly
Member Author

This is fixed in 3.x with #84529, but the problem may still occur in 4.x (where my PR #83697 is now out of date).
Changing the milestone to 4.x to reflect this, so that anyone else can pick it up if they hit the problem in 4.x.

@lawnjelly lawnjelly modified the milestones: 3.x, 4.x Oct 17, 2024