Updated EXT_shader_framebuffer_fetch proposal #3144

Closed

Contributor

@raytranuk raytranuk commented Aug 28, 2020

This PR updates the EXT_shader_framebuffer_fetch WebGL extension proposal. It now describes how the OpenGL ES Shading Language version in use affects usage of the extension, and it includes examples for both version 1.00 and version 3.00. The samples show a desaturate effect, which would be much more expensive without the extension, especially on bandwidth-limited mobile GPUs.

Below is some additional information and a demo, which I am including after @kenrussell suggested on the WebGL mailing list that this could help motivate moving the extension forward.

Motivation

This extension proposal was made to enable a fast path to be written in PlayCanvas for GPUs that support the extension. The vast majority of GPUs in mobile devices in use today, released in the last 5-6 years, support it. @krogovin also pointed out on the mailing list that Intel integrated GPUs have supported this extension since Gen9 (Skylake), which was released 5 years ago.

Adding a fast path would increase performance on mobile GPUs, allowing developers to raise the quality of WebGL apps and games that are made to run on both mobile and desktop. This has already been done in renderers used by native mobile apps and games, including browsers like Chrome. In reality, the fast path enabled by this extension is a way to overcome weaknesses/limitations of the tile-based GPU architecture, and there is no significant penalty for non-tile-based GPUs that do not take the "fast" path.

Demo and Use Cases

Here is a link to the demo: https://raytranuk.github.io/tests/ext_demo/extension_proposal_demo.html

The demo was put together using PlayCanvas Engine examples. The demo shows 3 common use cases that are often found in 3D games and apps:

  1. Post process effects
  2. Particle visual effects
  3. Text rendering effects

1 - Post process effects

The demo renders a full-screen post-process pass that takes the current framebuffer, desaturates it, and highlights a box by making it glow red. With the EXT_shader_framebuffer_fetch extension, tile-based GPUs can avoid expensive resolves to memory when rendering this effect, which can make it as much as 40% faster.
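
As a rough illustration of the desaturate step, the sketch below pairs a GLSL ES 1.00 fragment shader that reads the current framebuffer color through `gl_LastFragData[0]` (the built-in that GL_EXT_shader_framebuffer_fetch defines for version 1.00) with a CPU reference of the same arithmetic. The Rec. 601 luma weights, the names `DESATURATE_FS` and `desaturate`, and the shader text itself are assumptions for illustration; this is not the code from the proposal or the demo.

```typescript
// Hypothetical GLSL ES 1.00 shader: desaturates whatever is already in the
// framebuffer, with no intermediate texture. gl_LastFragData is provided by
// GL_EXT_shader_framebuffer_fetch.
const DESATURATE_FS = `
#extension GL_EXT_shader_framebuffer_fetch : require
precision mediump float;
void main() {
    vec4 dst = gl_LastFragData[0];
    float luma = dot(dst.rgb, vec3(0.299, 0.587, 0.114));
    gl_FragColor = vec4(vec3(luma), dst.a);
}
`;

type Rgba = [number, number, number, number];

// CPU reference of the shader arithmetic (assumed Rec. 601 luma weights).
function desaturate([r, g, b, a]: Rgba): Rgba {
    const luma = 0.299 * r + 0.587 * g + 0.114 * b;
    return [luma, luma, luma, a];
}
```

Without the extension, the same effect would typically need the scene rendered to a texture first and then sampled in a second pass, which is where the resolve to memory comes from.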

Related: Below are slides from https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-20-66/5_2D00_mmg2020_2D00_filament_2D00_romain.pdf that were presented in the SIGGRAPH 2020 Moving Mobile Graphics course*:
*NB I also gave a talk in the Moving Mobile Graphics course at SIGGRAPH 2016

image

image

These slides show that some commonly used post processes that introduce performance problems on mobile GPUs, such as tone mapping and color grading (and desaturate, which I included as the code sample in this PR), can use the extension in OpenGL ES to avoid those pitfalls. They also show that this extension is in some ways equivalent to Vulkan's subpasses, which were introduced specifically to improve tile-based GPU performance.

In terms of implementation in PlayCanvas: currently, in most mobile apps and games, tone mapping is appended to the end of every material shader. This is not ideal because it is very inflexible, it precludes a lot of potentially useful lighting techniques (which would themselves benefit from the extension), and it is inefficient if there is any overdraw in the scene. I hope to make it possible for PlayCanvas to efficiently use post processes and other full-screen passes that work within the constraints of (or can be significantly accelerated by) the EXT_shader_framebuffer_fetch extension in apps and games that target mobile browsers.

2 - Particle visual effects

image

https://medium.com/pocket-gems/programmable-blending-on-ios-and-android-46bd8534076e - This article highlights a very common problem with fixed-function blending that many VFX artists have come across. The same problem has come up on game projects I have worked on*.
*NB https://www.mobygames.com/developer/sheet/view/developerId,43365/

In the demo, I render the particles twice, once with normal blending and once with additive blending, similar to what is described for the Unity3D game engine in this thread: https://forum.unity.com/threads/how-do-i-get-my-additive-particle-effects-to-look-consistent-between-light-and-dark-backgrounds.819141/

In terms of implementation in PlayCanvas, I would expose an option to use improved VFX blending that runs in a single pass if the EXT_shader_framebuffer_fetch extension is present, or in 2 passes without it. For this use case (quite common in my experience working with VFX artists), the gain would therefore be about 2 times for the same quality.
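
To make the two-pass versus single-pass comparison concrete, here is a small CPU sketch of the blend arithmetic. It assumes the two fixed-function passes use `blendFunc(SRC_ALPHA, ONE_MINUS_SRC_ALPHA)` followed by `blendFunc(SRC_ALPHA, ONE)` on a clamped [0, 1] color buffer; the exact blend equations used by the demo are not stated here, and all function names are hypothetical.

```typescript
type Rgb = [number, number, number];

// Apply a per-channel blend function to a source and destination color.
function mapChannels(src: Rgb, dst: Rgb, f: (s: number, d: number) => number): Rgb {
    return [f(src[0], dst[0]), f(src[1], dst[1]), f(src[2], dst[2])];
}

// Fixed-function path: the particle is drawn twice and the framebuffer is
// read and written by the blend unit in each pass.
function twoPassBlend(src: Rgb, alpha: number, dst: Rgb): Rgb {
    // Pass 1 (assumed): normal alpha blending.
    const afterNormal = mapChannels(src, dst, (s, d) => s * alpha + d * (1 - alpha));
    // Pass 2 (assumed): additive blending, clamped like a UNORM color buffer.
    return mapChannels(src, afterNormal, (s, d) => Math.min(1, s * alpha + d));
}

// Programmable path: what a shader could compute after fetching the
// destination color once, writing the combined result in a single draw.
function singlePassBlend(src: Rgb, alpha: number, dst: Rgb): Rgb {
    return mapChannels(src, dst, (s, d) => Math.min(1, 2 * s * alpha + d * (1 - alpha)));
}
```

Under these assumptions both paths produce the same color, so the single-pass version halves the number of draws without changing the result.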

3 - Text rendering effects

I first realized the importance of the EXT_shader_framebuffer_fetch extension when trying to find an efficient way to implement a new feature that PlayCanvas developers were asking for: a thicker text outline effect, similar to Photoshop's text Stroke effect, which I believe is used quite widely by artists and graphic designers:
image

In the demo, correctly rendering translucent anti-aliased text glyphs with drop shadow and thicker outline effects that overlap neighboring glyphs required 5 passes. With the EXT_shader_framebuffer_fetch extension it could be done in 2 passes, so the expected performance gain is 2.5 times.

Web Browser Renderers

Another example of significant performance gains from the EXT_shader_framebuffer_fetch extension is the Chrome Compositor on Android mobile devices, which showed a 28% increase in performance over not using the extension:

image

https://bugs.chromium.org/p/chromium/issues/detail?id=436481 - Further reading suggests Chrome has since switched to the Skia renderer, and a quick search shows that Skia also uses the EXT_shader_framebuffer_fetch extension (https://github.com/google/skia/blob/master/src/gpu/gl/GrGLCaps.cpp#L833) to create a fast path for rendering on GPUs that support it.

Conclusion

WebGL developers currently lack the EXT_shader_framebuffer_fetch extension that OpenGL ES mobile app and game developers use to optimize performance. This puts WebGL developers at a significant relative disadvantage when releasing good-quality web games and web applications on mobile browsers, and arguably holds them back on desktop browsers too, as I believe the majority of web developers would like (or require) their apps and games to work well in mobile as well as desktop browsers. It has also been shown that the renderers used in the web browsers themselves use the EXT_shader_framebuffer_fetch extension to create fast paths for tile-based GPUs, with significant performance gains.

I hope this PR and demo will help the WebGL community understand why the EXT_shader_framebuffer_fetch extension is important, and support its addition to WebGL in the near future.

@raytranuk raytranuk changed the title from "Updated EXT_shader_framebuffer_fetch proposal to include The OpenGL ES Shading Language version" to "Updated EXT_shader_framebuffer_fetch proposal" Aug 28, 2020
@raytranuk raytranuk marked this pull request as ready for review September 12, 2020 15:09
@kenrussell
Member

After much discussion in the WebGL working group, we think it's too late in WebGL's lifecycle to add this mobile-specific optimization. We will consider adding EXT_shader_pixel_local_storage in #3385, but suggest working with the WebGPU community group to specify this kind of functionality for the WebGPU API instead.

@kenrussell kenrussell closed this Mar 22, 2022
@krogovin
Contributor

Seems odd to add EXT_shader_pixel_local_storage but never add EXT_shader_framebuffer_fetch. There is at least one desktop GPU that supports EXT_shader_framebuffer_fetch (but not EXT_shader_pixel_local_storage).

@kenrussell
Member

Our colleague points out that EXT_shader_pixel_local_storage can be emulated on top of EXT_shader_framebuffer_fetch. We will investigate this possibility.

@krogovin
Contributor

krogovin commented Mar 24, 2022

Depends; I can't remember what the interaction with MSAA is (for EXT_shader_framebuffer_fetch it essentially forces the fragment shader to run per-sample), and I don't know what EXT_shader_pixel_local_storage's interaction with MSAA is (I think the extension disallows MSAA). Also, if I remember correctly, EXT_shader_pixel_local_storage disables blending, whereas EXT_shader_framebuffer_fetch works on top of it...

@slimbuck

slimbuck commented Nov 6, 2023

Is there any chance of reviving this proposal?

We'd very much like to discard fragments once framebuffer alpha accumulation saturates while rendering sorted gaussian splat scenes.

Without this extension we are forced to perform a mid-scene copy of the framebuffer, with stalls and suboptimal discard.

Also, while we are very keen on WebGPU (and have been working on support for years), WebGL 2.0 will be with us for many years to come, so an optimisation like this will be very useful.
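
For readers unfamiliar with the technique, the sketch below simulates it on the CPU for a single pixel. It assumes front-to-back compositing of sorted splats and a saturation threshold of 0.999; the `Splat` type, the function name, and the threshold are hypothetical, and a single color channel is used for brevity. With framebuffer fetch, the `alpha >= threshold` check is what a fragment shader could evaluate against the fetched destination alpha before calling `discard`.

```typescript
interface Splat {
    color: number; // single channel for brevity
    alpha: number;
}

// Front-to-back "under" compositing with an early-out once the accumulated
// alpha saturates. `shaded` counts the fragments that did real work.
function compositeFrontToBack(splats: Splat[], threshold = 0.999) {
    let color = 0;
    let alpha = 0;
    let shaded = 0;
    for (const s of splats) {
        if (alpha >= threshold) continue; // a shader would `discard` here
        const weight = (1 - alpha) * s.alpha;
        color += weight * s.color;
        alpha += weight;
        shaded++;
    }
    return { color, alpha, shaded };
}
```

For example, if the first splat covering a pixel is fully opaque, every later splat at that pixel is skipped, which is the saving the comment above is asking for.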

@kenrussell
Member

@slimbuck please join https://groups.google.com/g/webgl-dev-list and publish an example showing why https://registry.khronos.org/webgl/extensions/WEBGL_shader_pixel_local_storage/ can't satisfy the requirements. Chris Dalton from Rive did extensive analysis of the available options, and he and the working group are convinced that proper use of the PLS extension can be hosted on top of EXT_shader_framebuffer_fetch while providing better portability of application code.

@slimbuck

slimbuck commented Nov 9, 2023

WEBGL_shader_pixel_local_storage looks like overkill for our needs and rather more complicated than framebuffer_fetch. On systems without isCoherent, it seems this feature will be rather useless for our use case.

I will keep an eye on support for the above though, thanks!

@lexaknyazev
Member

Systems that natively support GL_EXT_shader_framebuffer_fetch would always support coherent pixel local storage.

@kenrussell
Member

@slimbuck you can already start developing with WEBGL_shader_pixel_local_storage by going to about:flags in Chrome and turning on "WebGL Draft Extensions". (Don't browse the open web with this flag turned on, though.) It's implemented on basically every platform. We aim to bring this to community approved status as soon as the last couple of bugs are fixed on major platforms, and if you have a code path in your app which uses it when available, you'll simply start seeing the speedups.
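
A minimal sketch of such an opt-in code path, assuming the application already has a multi-pass fallback. `getExtension` is the standard WebGL call and returns `null` when an extension is unavailable; the `ExtensionSource` interface is a stand-in for the rendering context so the sketch runs outside a browser, and `chooseRenderPath` and the path names are hypothetical.

```typescript
// Stand-in for the one WebGL2RenderingContext method used here.
interface ExtensionSource {
    getExtension(name: string): unknown;
}

type RenderPath = "pixel-local-storage" | "multi-pass";

// Pick the fast path only when the extension is actually exposed.
function chooseRenderPath(gl: ExtensionSource): RenderPath {
    const pls = gl.getExtension("WEBGL_shader_pixel_local_storage");
    return pls ? "pixel-local-storage" : "multi-pass";
}
```

In a browser, the context returned by `canvas.getContext("webgl2")` could be passed in directly, since it provides `getExtension`.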
