Updated EXT_shader_framebuffer_fetch proposal #3144
…shader_framebuffer_fetch
update from Khronos fork
…shader_framebuffer_fetch # Conflicts: # extensions/proposals/EXT_shader_framebuffer_fetch/extension.xml
After much discussion in the WebGL working group we think it's too late in WebGL's lifecycle to add this mobile-specific optimization. We will consider adding EXT_shader_pixel_local_storage in #3385, but suggest working with the WebGPU community group to specify this kind of functionality for the WebGPU API instead.
Seems odd to add EXT_shader_pixel_local_storage but never add EXT_shader_framebuffer_fetch. There is at least one desktop GPU that supports EXT_shader_framebuffer_fetch (but not EXT_shader_pixel_local_storage).
Our colleague points out that EXT_shader_pixel_local_storage can be emulated on top of EXT_shader_framebuffer_fetch. We will investigate this possibility.
Depends; I can't remember what the interaction with MSAA is (for EXT_framebuffer_fetch it essentially forces the fragment shader to run per-sample) but I don't know what EXT_shader_pixel_local_storage's interaction with MSAA is (I think the extension disallows MSAA). Also, if I remember correctly, EXT_shader_pixel_local_storage disables blending, whereas EXT_framebuffer_fetch works on top of it...
Is there any chance of reviving this proposal? We'd very much like to discard fragments once framebuffer alpha accumulation saturates while rendering sorted gaussian splat scenes. Without this extension we are forced to perform a mid-scene copy of the framebuffer, with stalls and a suboptimal discard. Also, while we are very keen on WebGPU (and have been working on support for years), WebGL 2.0 will be with us for many years to come and so an optimisation like this will be very useful.
@slimbuck please join https://groups.google.com/g/webgl-dev-list and publish an example showing why https://registry.khronos.org/webgl/extensions/WEBGL_shader_pixel_local_storage/ can't satisfy the requirements. Chris Dalton from Rive did extensive analysis of the available options, and he and the working group are convinced that proper use of the PLS extension can be hosted on top of EXT_shader_framebuffer_fetch while providing better portability of application code.
I will keep an eye on support for the above though, thanks!
Systems that natively support
@slimbuck you can already start developing with
This PR contains an update to the EXT_shader_framebuffer_fetch WebGL extension proposal. It now includes details on how the OpenGL ES Shading Language version used affects usage of the extension, and examples for both version 1.00 and 3.00. The samples show a desaturate effect, which would be much more expensive without the extension, especially on bandwidth-limited mobile GPUs.
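As a rough illustration of how the shading language version changes usage, here is a minimal sketch of the two desaturate variants. It is not the exact sample from the proposal: the helper name `desaturateFragmentSource` is hypothetical and the Rec. 601 luma weights are an assumption. It relies on the underlying OpenGL ES extension, where GLSL ES 1.00 reads the destination colour through `gl_LastFragData[0]` and GLSL ES 3.00 declares the fragment output as `inout`.

```typescript
// Hypothetical helper (not part of the proposal): builds a desaturate
// fragment shader for either GLSL ES version.
type GlslVersion = "100" | "300 es";

function desaturateFragmentSource(version: GlslVersion): string {
  // Assumed luma weights (Rec. 601); the proposal's samples may use others.
  const luma = "vec3(0.299, 0.587, 0.114)";

  if (version === "100") {
    // GLSL ES 1.00: the existing framebuffer colour is read via gl_LastFragData.
    return [
      "#extension GL_EXT_shader_framebuffer_fetch : require",
      "precision mediump float;",
      "void main() {",
      "  vec4 dst = gl_LastFragData[0];",
      "  float grey = dot(dst.rgb, " + luma + ");",
      "  gl_FragColor = vec4(vec3(grey), dst.a);",
      "}",
    ].join("\n");
  }

  // GLSL ES 3.00: the output is declared inout, so it already holds the
  // framebuffer colour when the shader starts.
  return [
    "#version 300 es",
    "#extension GL_EXT_shader_framebuffer_fetch : require",
    "precision mediump float;",
    "inout vec4 fragColor;",
    "void main() {",
    "  float grey = dot(fragColor.rgb, " + luma + ");",
    "  fragColor = vec4(vec3(grey), fragColor.a);",
    "}",
  ].join("\n");
}
```

Either string would be compiled as the fragment shader of a full screen pass drawn over the already rendered scene, so no intermediate texture copy is needed.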
Below is some additional information and a demo, which I have included after @kenrussell suggested on the WebGL mailing list that this could help motivate moving the extension forward.
Motivation
This extension proposal was made in order to enable a fast path to be written in PlayCanvas for GPUs that support this extension. The vast majority of GPUs in mobile devices being used today, released in the last 5-6 years, support this extension and it was also pointed out by @krogovin in the mailing list, that Intel integrated GPUs have supported this extension since Gen9 (Skylake) - which was released 5 years ago.
Adding a fast path would result in increased performance on mobile GPUs allowing developers to raise quality of WebGL apps or games that are made to run on mobile and desktop - and has been done in renderers used by native mobile apps and games - including browsers like Chrome. The reality is that this fast path enabled by this extension is actually a way to overcome weaknesses/limitations in the tile-based GPU architecture - and that there is no significant penalty for non-tile based GPUs to not take the "fast" path.
Demo and Use Cases
Here is a link to the demo: https://raytranuk.github.io/tests/ext_demo/extension_proposal_demo.html
The demo was put together using PlayCanvas Engine examples. The demo shows 3 common use cases that are often found in 3D games and apps:
1 - Post process effects
The demo renders a full screen post process pass that takes the current framebuffer, desaturates it and highlights a box by making it glow red. With the EXT_shader_framebuffer_fetch extension, tile-based GPUs can avoid expensive resolves to memory when rendering this effect - and this can be as much as 40% faster.
Related: Below are slides from: https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-20-66/5_2D00_mmg2020_2D00_filament_2D00_romain.pdf - that were presented in the SIGGRAPH 2020 Moving Mobile Graphics course*:
*NB I also did a talk in the Moving Mobile Graphics course in SIGGRAPH 2016
These slides show that some commonly used post processes, such as Tone mapping and Color grading (and Desaturate, which I included as the code sample in this PR for the proposal), introduce performance problems on mobile GPUs, and that they can use the extension in OpenGL ES to avoid these performance pitfalls. They also show that this extension is in some ways equivalent to Vulkan's subpasses - which were introduced specifically to allow for improved tile-based GPU performance.
In terms of implementation in PlayCanvas, currently, on most mobile apps/games, Tone mapping is added to the end of all material shaders - this is not ideal because it is very inflexible and precludes a lot of potentially useful lighting techniques (that themselves would benefit from the extension BTW), and it is not efficient if there is any overdraw in the scene. I would hope to open up the ability for PlayCanvas to efficiently use post processes and other full screen passes that work within the constraints of (or can be significantly accelerated by) the EXT_shader_framebuffer_fetch extension in apps and games that target mobile browsers.
2 - Particle visual effects
https://medium.com/pocket-gems/programmable-blending-on-ios-and-android-46bd8534076e - this article highlights a very common problem with fixed function blending that many VFX artists have come across - this same problem has come up on game projects I have worked on*.
*NB https://www.mobygames.com/developer/sheet/view/developerId,43365/
In the demo, I render the particles twice - once with normal blending and once with additive blending - similar to what is described in this link in the Unity3D game engine: https://forum.unity.com/threads/how-do-i-get-my-additive-particle-effects-to-look-consistent-between-light-and-dark-backgrounds.819141/
In terms of implementation in PlayCanvas, I would expose the option to use an improved VFX blending - that would run in a single pass if the EXT_shader_framebuffer_fetch extension is present or 2 passes without - therefore for this use case (quite common in my experience working with VFX artists), the gain would be about 2 times faster for the same quality.
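The path selection described above could be sketched roughly as follows. This is an assumption-laden illustration, not PlayCanvas code: `ExtensionSource`, `ParticlePass` and `planParticlePasses` are hypothetical names, and the extension string is the one used by this proposal, which a browser that does not implement the proposal will answer with `null`.

```typescript
// Minimal stand-in for the part of a WebGL context used here, so the
// sketch does not depend on DOM typings.
interface ExtensionSource {
  getExtension(name: string): unknown;
}

// Hypothetical pass labels, for illustration only.
type ParticlePass = "fetch-blend" | "normal-blend" | "additive-blend";

function planParticlePasses(gl: ExtensionSource): ParticlePass[] {
  // Extension name as used by this proposal; null means it is unavailable.
  const ext = gl.getExtension("EXT_shader_framebuffer_fetch");

  // Fast path: one pass that blends in the shader using the fetched colour.
  // Fallback: render the particles twice, once per fixed function blend mode.
  return ext ? ["fetch-blend"] : ["normal-blend", "additive-blend"];
}
```

A real WebGL context can be passed directly, since it provides `getExtension`; the fallback list matches what the demo does today.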
3 - Text rendering effects
I first realized the importance of the EXT_shader_framebuffer_fetch extension when trying to find an efficient way to implement a new feature in PlayCanvas that PlayCanvas developers were asking for: a thicker text outline effect - similar to Photoshop's text Stroke effect - which I believe is used quite widely by artists and graphic designers:

In the demo, correctly rendering translucent anti-aliased text glyphs with drop shadow and thicker outline effects that overlap neighboring glyphs required 5 passes. With the EXT_shader_framebuffer_fetch extension it could be done in 2 passes - therefore the expected performance gain is 2.5 times (5 passes down to 2).
Web Browser Renderers
Another example of significant performance gains using the EXT_shader_framebuffer_fetch extension was in the Chrome Compositor on Android mobile devices - which showed a significant 28% increase in performance over not using the extension:
https://bugs.chromium.org/p/chromium/issues/detail?id=436481 - further reading suggests Chrome switched to using the Skia renderer - and a quick search shows the Skia renderer also uses the EXT_shader_framebuffer_fetch extension https://github.com/google/skia/blob/master/src/gpu/gl/GrGLCaps.cpp#L833 to create a fast path for rendering on GPUs that support the EXT_shader_framebuffer_fetch extension.
Conclusion
WebGL developers currently lack the EXT_shader_framebuffer_fetch extension that OpenGL ES mobile app and game developers use to optimize the performance of their mobile apps and games - this puts WebGL developers at a significant relative disadvantage in terms of being able to release good quality web games and web applications on mobile browsers - and arguably holds them back on desktop browsers too - as I believe the majority of web developers would like (or require that) their apps/games work well in mobile as well as desktop browsers. It has also been shown that the renderers used in the web browsers themselves use the EXT_shader_framebuffer_fetch extension, creating fast paths for tile based GPUs - with significant performance gains.
I hope this PR and demo will help the WebGL community understand why the EXT_shader_framebuffer_fetch extension is important, and support its addition to WebGL in the near future.