[Godot 3.5] Async shader compilation freeze the game for a longer time than sync compilation #64528

Closed
Gawehold opened this issue Aug 17, 2022 · 9 comments

Comments

@Gawehold

Gawehold commented Aug 17, 2022

Godot version

3.5 stable

System information

Windows 10 64-bit, GLES3, GTX1060 6GB

Issue description

Overview

I am trying to test the performance of the async compilation in Godot 3.5.
The uploaded project instances a scene containing multiple mesh instances with unique materials (all SpatialMaterials, each with different features enabled) and then displays them, in order to measure how long the shaders take to compile. It measures "the unresponsive time to instance the scene and add it to the tree (still invisible at that point)" and "the unresponsive time when the scene is made visible". Let's call them t1 and t2 respectively.

Test Results

In my test, I got the following result:

| Duration (ms) \ Mode | Sync | Async | Async + Cache |
|---|---|---|---|
| t1 | 21 | 10402 | 181 |
| t2 | 3922 | 229 | 2674 |
| t1+t2 | 3943 | 10631 | 2855 |

For the sake of fairness, I cleared the NVIDIA caches every time before running the program. max_simultaneous_compiles is set to 2.
One thing to note is that, in "Async" and "Async + Cache" modes, it lags for a few seconds after t2. Also, the time it takes to start the program is longer in the "Async" mode.

Analysis

If the caching feature isn't used, async compilation alone doesn't make too much sense because the total unresponsive time is much longer than sync compilation. The only potential benefit would be shifting most of the unresponsive time to the earlier stage (adding the scene to the tree), but we can always use the old technique (force render the meshes in the loading screen) to achieve that with sync compilation.
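
For readers unfamiliar with the "force render" technique mentioned above, here is a minimal sketch in Godot 3.x GDScript. It is not taken from the test project: the `warm_up` function and the two-frame wait are assumptions for illustration, and it presumes this node is inside the active camera's view while a loading screen is drawn on top. Because the compiled shader variants depend on render conditions, the warm-up only helps for conditions that match actual gameplay.

```gdscript
extends Spatial

# Hypothetical helper (not from the original project): instances a scene,
# lets it be drawn for a few frames so its shaders get compiled, then frees it.
func warm_up(scene_path, frames = 2):
	var packed = load(scene_path)
	var instance = packed.instance()
	add_child(instance)
	# Wait until the renderer has actually drawn the meshes.
	for i in range(frames):
		yield(VisualServer, "frame_post_draw")
	instance.queue_free()
```

A loading screen script could then wait for it with `yield(warm_up("res://UniqueMaterialMeshInstances.tscn"), "completed")` before starting gameplay.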

While the total unresponsive time is lower with async + cache, the lag (low framerate) at the beginning means it is not strictly better for the player experience. Also, the shaders have to be compiled once beforehand (i.e. the program has to be run once) to get this shorter unresponsive time, so on the first run the unresponsive time is still as long as in the uncached "Async" mode. At least on my computer, the shaders would be cached by the NVIDIA GPU anyway (I am not sure if it is actually done by the GPU), so the second time I run the program in sync mode, the unresponsive time is already really short (~100ms). In fact, it is shorter than in both async modes with NVIDIA caches.

In that sense, async shader compilation seems to be of limited use if users can implement their own "force shader compile" function.

Side Notes

The performance (total unresponsive time) in the async modes seems to be worse if many custom shader materials are being used, and better if the async_mode of the materials is set to be async_hidden.

Suggestion

In order to make this feature more user-friendly and effective, I think there could be a global setting that sets the async_mode for all materials (the per-material setting could still override it), so that people can easily use async_hidden for all materials for good performance. It might also be a good idea to have configurable fallback materials (both global and per-material) so that simple placeholder materials can be used to balance performance and visuals.
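
Until such a global setting exists, something similar can be approximated from a script. The sketch below is an assumption-laden illustration, not an engine feature: `set_async_hidden` is a hypothetical helper, and it relies on `SpatialMaterial.async_mode` and `SpatialMaterial.ASYNC_MODE_HIDDEN` as exposed in Godot 3.5. It only touches SpatialMaterials; custom shaders would need the `async_hidden` render mode in their own shader code instead.

```gdscript
extends Node

# Hypothetical helper: walks a subtree and switches every SpatialMaterial
# in use to the "async hidden" mode. Materials are shared resources, so the
# change affects every mesh that uses the same material.
func set_async_hidden(node: Node) -> void:
	var mesh_instance := node as MeshInstance
	if mesh_instance != null:
		for i in range(mesh_instance.get_surface_material_count()):
			var mat = mesh_instance.get_active_material(i)
			if mat is SpatialMaterial:
				mat.async_mode = SpatialMaterial.ASYNC_MODE_HIDDEN
	for child in node.get_children():
		set_async_hidden(child)
```

Calling `set_async_hidden(some_scene_root)` right after instancing a scene would apply the mode to everything under it.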

Verdict

I am not sure whether I misunderstand how async shader compilation works, so I would love to discuss it. I would also be happy if someone could explain how this ubershader approach works for custom shader materials.

Steps to reproduce

  1. Clear your GPU cache (on Windows 10 with an NVIDIA GPU, it is located under AppData\Local\NVIDIA\GLCache)
  2. Set the shader compilation mode in the project setting you want to test
  3. Run ShaderCachingTest.tscn
  4. Press P button to measure t1
  5. Press Enter button to measure t2
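
When comparing runs, it can help to log which compilation settings were active. The sketch below is only a suggestion: the two setting paths are written as I recall them from the Godot 3.5 ProjectSettings documentation and should be treated as assumptions, which is why each lookup is guarded with `has_setting`.

```gdscript
extends Node

# Setting paths assumed from the Godot 3.5 docs; verify them in your build.
const SETTINGS = [
	"rendering/gles3/shaders/shader_compilation_mode",
	"rendering/gles3/shaders/max_simultaneous_compiles",
]

func _ready():
	# Print each setting so test logs record the configuration used.
	for path in SETTINGS:
		if ProjectSettings.has_setting(path):
			print(path, " = ", ProjectSettings.get_setting(path))
		else:
			print(path, " not found in this build")
```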

Minimal reproduction project

AsyncShaderCompilationTest.zip

@Calinou
Member

Calinou commented Aug 17, 2022

cc @RandomShaper

@RandomShaper
Member

Loading time is expected to be higher with async compilation enabled, because it involves the compilation of the ubershader for each material, which is always synchronous. Therefore, games are expected to have at least the materials loaded at non-interactive situations (e.g., a loading screen).

In contrast, at runtime async compilation helps when the materials are rendered, and that's because the already compiled ubershaders can be used instead of the conditioned shaders that can't be compiled until your objects are displayed, when the specific render conditions are known. Those conditioned shaders are the ones asynchronously compiled.

So, your t1 is expected to be worse with async compilation. I've modified your script to add what we could call t0. The keyboard sequence would be now O, P, Enter. I haven't tested the script, but it should work with minor fixes in the worst case.

  • t0 (loading time) should be higher with async enabled.
  • t1 (instantiation) should be roughly the same.
  • t2 (first render time) should be much lower with async, just like in your original test.

```gdscript
extends Spatial

var scene          # PackedScene loaded on O
var meshInstances  # Root of the instanced scene, created on P

func _unhandled_key_input(event):
	# t0: load the scene. With async enabled, the ubershaders are compiled here.
	if event.scancode == KEY_O and event.pressed:
		print("Loading Started")
		var t = OS.get_ticks_msec()
		scene = load("res://UniqueMaterialMeshInstances.tscn")
		print(OS.get_ticks_msec() - t)

	# t1: instance the scene and add it to the tree while it is still invisible.
	if event.scancode == KEY_P and event.pressed:
		print("Instancing Started")
		var t = OS.get_ticks_msec()
		meshInstances = scene.instance() as Spatial
		meshInstances.visible = false
		add_child(meshInstances)
		yield(VisualServer, "frame_post_draw")
		print(OS.get_ticks_msec() - t)

	# t2: make the scene visible and wait for the first frame that draws it.
	if event.scancode == KEY_ENTER and event.pressed:
		print("Showing Started")
		var t = OS.get_ticks_msec()
		meshInstances.visible = true
		yield(VisualServer, "frame_post_draw")
		print(OS.get_ticks_msec() - t)
```

@Gawehold
Author

Thank you for the quick response!

I am sorry, but even after reading the related PR you made, I am still not very clear on how async compilation actually works, so I would like to clarify that first. Here are the questions I have. I would appreciate it if you could help me clear things up.

  1. Is there 1 ubershader for 1 Godot shader (or material), or 1 ubershader for multiple Godot shaders? Originally I thought the ubershader was an "ultimate shader" that includes many runtime conditions, so that it can temporarily replace all the actual shaders (at least the shaders created by the standard SpatialMaterial) by flipping different conditions, and we only need to compile this single ubershader to show all the materials.
  2. If there is 1 ubershader for 1 Godot shader, how can compiling the ubershader first be helpful? Is it just faster to compile the ubershader than the actual one for some reason?
  3. Why can we compile the conditioned shader asynchronously but not the ubershader?
  4. How can you handle the custom shaders (from ShaderMaterials)?

> Loading time is expected to be higher with async compilation enabled, because it involves the compilation of the ubershader for each material, which is always synchronous. Therefore, games are expected to have at least the materials loaded at non-interactive situations (e.g., a loading screen).

I think the problem is that, in my tests, compiling the ubershader(s) is too slow. It takes longer than compiling the actual shaders directly. If the ubershader cannot be compiled asynchronously, using sync compilation and forcing the compilation during the loading screen seems to be a strictly better solution, because the "loading time" (including t1+t2) would be shorter and there would be no lag during rendering.
But then the only benefit of async compilation (ubershaders) would be saving the effort of implementing the "force shader compilation" trick.

@RandomShaper
Member

  1. There's one ubershader per shader, regardless of whether it's a Shader resource or the shader created under the hood for a SpatialMaterial. The wording used sometimes may not have made that point clear.
  2. Again, pay attention to the ideas of scene loading time and scene run time. Compiling the ubershader is done when the material/shader is loaded. If you write your project so that no shader/materials are loaded during gameplay, a mesh with a not yet rendered shader can enter the viewport causing no hiccup because it will be rendered with its corresponding ubershader, which is already compiled and ready to use. On the contrary, with async compilation disabled, it's at that very point where the regular shader, conditioned for that specific case would have to be compiled, and that would happen synchronously, freezing the game until ready.
  3. If ubershaders were compiled asynchronously, some might not be ready by the time they were needed to render a frame. The only options in such a case would be to skip rendering the mesh or to stall until the compilation completes. Both would be pointless. For the first option there's already an async hidden mode for shaders, which causes meshes not to be rendered while the conditioned shader is not ready yet. It would be silly to spend background compilation time getting the ubershader ready in that case, when what you want is to have the relevant conditioned shader ready as soon as possible.
  4. They are handled the same as Shader and VisualShader resources. One ubershader is compiled for each of them.
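
The behaviour described in the answers above can be summarised as a small decision function. This is a toy model for illustration only, not engine code: the `AsyncMode` enum, the `pick_shader` function and its returned strings are all hypothetical.

```gdscript
extends Reference

# Toy model of the per-draw choice described in the thread (not engine code).
enum AsyncMode { VISIBLE, HIDDEN }

static func pick_shader(async_enabled: bool, variant_ready: bool, async_mode: int) -> String:
	# The conditioned (specialised) shader is always preferred once compiled.
	if variant_ready:
		return "conditioned shader"
	# Sync compilation: the variant is compiled on first use and the frame stalls.
	if not async_enabled:
		return "compile conditioned shader now (frame stalls)"
	# Async hidden: the mesh is not drawn until the variant is ready.
	if async_mode == AsyncMode.HIDDEN:
		return "skip the draw until the conditioned shader is ready"
	# Async visible: draw with the ubershader compiled at load time.
	return "ubershader (conditioned shader compiles in the background)"
```

In this model the ubershader branch is only usable because the ubershader was compiled synchronously at load time, which is the cost that shows up in the loading measurements.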

> I think the problem is that, in my tests, the compilation is too slow for the ubershader(s). It takes longer than compiling the actual shader directly

Again, consider loading time and gameplay time separately. The ubershader is compiled at loading time if you design your project so that shaders/materials are not loaded during user interaction. In contrast, the conditioned shaders can only be compiled during gameplay, which is exactly when a stall hinders the player. Async compilation avoids that by using the ubershaders, which have been pre-compiled and so are ready for immediate use, in the meantime.

Have you tried your project with the modified script? Could you share your results?

@Gawehold
Author

Thank you for explaining! I think I have a better understanding of how it works now.

> Have you tried your project with the modified script? Could you share your results?

I added a yield(VisualServer, "frame_post_draw") after the loading so that the measurement better reflects the actual experience.

| Duration (ms) \ Mode | Sync | Async | Async + Cache |
|---|---|---|---|
| t1 | 23 | 10034 | 184 |
| t_instance | 17 | 17 | 17 |
| t2 | 3785 | 218 | 2778 |
| total | 3825 | 10269 | 2979 |

The result seems to be as expected.

> Again, consider loading time and gameplay time separately. The ubershader is compiled at loading time if you design your project so that shaders/materials are not loaded during user interaction. In contrast, the conditioned shaders can only be compiled during gameplay, which is exactly when a stall hinders the player. Async compilation avoids that by using the ubershaders, which have been pre-compiled and so are ready for immediate use, in the meantime.

My point is that we can avoid hiccups with sync compilation too. For example, my game has a system that searches for all shaders/materials that may be used in the current gameplay session and renders each of them (at least) once to compile them. This is done while the loading screen is shown, so when the shaders/materials appear in the actual gameplay later, no hiccups occur because they are already compiled. That’s why I used t1+t2 (total unresponsive time) to compare sync and async compilation.
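
A heavily simplified sketch of such a system, in Godot 3.x GDScript, might look like the following. It is not the system from my game: `collect_materials` and `warm_up_materials` are hypothetical names, and drawing each material on a plain quad is an assumption that will miss shader variants tied to other render conditions (for example different lighting or mesh features), which is part of why the real thing is tricky.

```gdscript
extends Spatial

# Recursively gathers every material used by MeshInstances under `node`.
# The Dictionary acts as a set, so shared materials are only stored once.
func collect_materials(node: Node, found: Dictionary) -> void:
	var mesh_instance := node as MeshInstance
	if mesh_instance != null:
		for i in range(mesh_instance.get_surface_material_count()):
			var mat = mesh_instance.get_active_material(i)
			if mat != null:
				found[mat] = true
	for child in node.get_children():
		collect_materials(child, found)

# Draws each collected material once on a quad, then frees the quads.
# Assumes this node is in view of the camera behind the loading screen.
func warm_up_materials(found: Dictionary):
	var holder := Spatial.new()
	add_child(holder)
	for mat in found.keys():
		var quad := MeshInstance.new()
		quad.mesh = QuadMesh.new()
		quad.material_override = mat
		holder.add_child(quad)
	# One rendered frame is enough to trigger the (sync) compilation.
	yield(VisualServer, "frame_post_draw")
	holder.queue_free()
```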

The pre-compilation works well in my game (although the system is quite tricky and complicated to implement). In my tests (both the project I provided and my game), the game freezes for longer when using async compilation, and it also suffers from lower framerates (caused by background compilation of the conditioned shaders) after the ubershader compilation has finished.

By the way, now that I know ubershader compilation is coupled to resource loading, I think this could be another disadvantage of async compilation with ubershaders, because some users may want to control separately when resources are loaded (from storage) and when they are compiled. It might not be a big concern for most users though.

@RandomShaper
Member

It's true that the good old pre-compilation trick, if you manage to do it correctly, beats asynchronous compilation. It can be seen as a tradeoff between developer time and run time. Besides, there's one idea to have something closer to manual pre-compilation but orders of magnitude simpler, in this proposal: godotengine/godot-proposals#4754.

@Gawehold
Author

Gawehold commented Aug 18, 2022

Then would it make more sense to implement a dedicated official pre-compilation system, or to provide some useful APIs instead? For example, a class on which you can set the context (e.g. environment, lighting, etc.) and then just call a compile function to pre-compile one shader under that context. I think it could provide better usability and performance than the current ubershader approach while still not being too difficult to use.

Nonetheless, I can still imagine the current ubershader being helpful in some cases.

> Besides, there's one idea to have something closer to manual pre-compilation but orders of magnitude simpler, in this proposal: godotengine/godot-proposals#4754.

I have read this before, but I am not entirely sure what it does. I thought it was just a way to record which shaders (and contexts) will be needed in the game so that they can be compiled at loading time. Please correct me if I am wrong.
Personally I don't really like it, because it seems a bit hacky and unreliable (shaders could be missed or redundant). I think having users provide the shaders manually is better, preferably in the way I described above (the pre-compilation APIs).

By the way, what do you think about the suggestion on async compilation I made at the beginning?

@RandomShaper
Member

It's a bit too soon to tell whether enough users will want to deal with additional shader compilation settings. For me at least, I need to get a bit of distance from this to gain proper perspective. The ideas don't sound bad, though. Feel free to open a proposal so you can get feedback from more people.

In regard to this issue, I believe we're ready to close it since it's not really a bug, but the way the system works.

@RandomShaper closed this as not planned on Aug 18, 2022
@Gawehold
Author

Thanks for the responses! I am glad to have the conversation.
