Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid redundant uniform uploads #2090

Merged
merged 6 commits into from
Apr 8, 2020

Conversation

juj
Copy link
Contributor

@juj juj commented Mar 30, 2020

Manage a cache of current uniform state to avoid reuploading state that is already on the GPU side.

In a WebGL build, this was observed to reduce the number of GL calls from ~21k/frame to about ~7.5k/frame.

@bkaradzic bkaradzic self-requested a review March 30, 2020 21:56
@juj juj force-pushed the optimize_redundant_uniform_uploads branch from 9682795 to d85b969 Compare April 6, 2020 10:44
@juj
Copy link
Contributor Author

juj commented Apr 6, 2020

If using std::unordered_map instead of the custom hash table implementation, frame times get +7.9% slower. (code size increases by +0.39% vs TinyHashMap)

If using tinystl::unordered_map instead of the custom hash table implementation, frame times are +3.3% slower. (code size increases by +0.15% vs TinyHashMap)

I could go with tinystl::unordered_map instead of TinyHashMap, though that does have a measurable perf drop. :/

@juj juj force-pushed the optimize_redundant_uniform_uploads branch from d85b969 to 7f9ebc5 Compare April 6, 2020 16:44
@bkaradzic
Copy link
Owner

You should use tinystl::unordered_map, also feel free to apply TinyHashMap optimizations to it since it's used elsewhere in bgfx, so those optimizations would add up.

@juj
Copy link
Contributor Author

juj commented Apr 7, 2020

Unfortunately it is not possible to optimize tinystl::unordered_map with same means - TinyHashMap operated on different semantics (combined insert+find, and no need for individual element removal support) that made it better for this purpose.

But dropped TinyHashMap in favor of tinystl::unordered_map to conform to BGFX data structures. How does this look now?

delete[] table;
}
// Uniform state cache for various types.
static tinystl::unordered_map<uint64_t, T> uniformCacheMap;
Copy link
Owner

@bkaradzic bkaradzic Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static tinystl::unordered_map<uint64_t, T> uniformCacheMap;

Do not make it static, make it part of GL renderer context so that it can be deallocated properly.

Also use stl::unordered_map (it's just name toggle between std/tinystl).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make some #if BGFX_GL_CONFIG_UNIFORM_CACHE so that this code path can be enabled/disabled.

delete[] table;
}
// Uniform state cache for various types.
static tinystl::unordered_map<uint64_t, T> uniformCacheMap;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make some #if BGFX_GL_CONFIG_UNIFORM_CACHE so that this code path can be enabled/disabled.

static GLuint currentProgram = 0;
// Keep a global access to the currently active uniform state cache so that
// individual GL functions can stay at global scope.
static UniformStateCache *s_currentUniformStateCache = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: s_renderGL at line 4313. You can make these functions part of RendererContextGL or you can forward declare them at this point, and implement them below RendererContextGL.

@bkaradzic bkaradzic merged commit d9d9865 into bkaradzic:master Apr 8, 2020
@bkaradzic
Copy link
Owner

I'm getting compiler error because those templated functions are templated on return value.

@juj
Copy link
Contributor Author

juj commented Apr 8, 2020

Would have been great to wait for the CI to go through to see that it works. What are the errors? Which compiler? With LLVM 6.0 and LLVM 10.0 it does build ok. Templating on return value by itself is not a problem, that will just require one to explicitly type specify the <T> when calling.

@bkaradzic
Copy link
Owner

bkaradzic commented Apr 8, 2020

../../linux64_gcc/bin/libbgfxDebug.a(glcontext_glx.o): In function `bx::ignoreC4127(bool)':
<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1294: multiple definition of `tinystl::unordered_map<unsigned long, int, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<int>()'
../../linux64_gcc/bin/libbgfxDebug.a(renderer_gl.o):<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1294: first defined here
../../linux64_gcc/bin/libbgfxDebug.a(glcontext_glx.o): In function `tinystl::unordered_map<unsigned long, bgfx::gl::UniformStateCache::f4, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<bgfx::gl::UniformStateCache::f4>()':
<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1295: multiple definition of `tinystl::unordered_map<unsigned long, bgfx::gl::UniformStateCache::f4, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<bgfx::gl::UniformStateCache::f4>()'
../../linux64_gcc/bin/libbgfxDebug.a(renderer_gl.o):<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1295: first defined here
../../linux64_gcc/bin/libbgfxDebug.a(glcontext_glx.o): In function `tinystl::unordered_map<unsigned long, bgfx::gl::UniformStateCache::f3x3, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<bgfx::gl::UniformStateCache::f3x3>()':
<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1296: multiple definition of `tinystl::unordered_map<unsigned long, bgfx::gl::UniformStateCache::f3x3, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<bgfx::gl::UniformStateCache::f3x3>()'
../../linux64_gcc/bin/libbgfxDebug.a(renderer_gl.o):<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1296: first defined here
../../linux64_gcc/bin/libbgfxDebug.a(glcontext_glx.o): In function `tinystl::unordered_map<unsigned long, bgfx::gl::UniformStateCache::f4x4, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<bgfx::gl::UniformStateCache::f4x4>()':
<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1297: multiple definition of `tinystl::unordered_map<unsigned long, bgfx::gl::UniformStateCache::f4x4, bgfx::TinyStlAllocator>& bgfx::gl::UniformStateCache::getUniformCache<bgfx::gl::UniformStateCache::f4x4>()'
../../linux64_gcc/bin/libbgfxDebug.a(renderer_gl.o):<...>/_github/bgfx/.build/projects/gmake-linux/../../../src/renderer_gl.h:1297: first defined here
collect2: error: ld returned 1 exit status

@bkaradzic
Copy link
Owner

bkaradzic commented Apr 8, 2020

Would have been great to wait for the CI to go through to see that it works.

I assumed you compiled it... :)

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

bkaradzic added a commit that referenced this pull request Apr 8, 2020
@bkaradzic
Copy link
Owner

Reverted it c023ac4...

Just revert it back, fix template issue, and resubmit PR, the rest looks good.

@juj
Copy link
Contributor Author

juj commented Apr 8, 2020

Ok the issue is that GCC does not treat the template specializations inline. Adding inline to the four lines is the fix, but I am unsure if GCC wants to see the template declaration in class scope be inline as well, or if it will forbid it since it is not a definition. I do not have access to GCC, so I will remove the template approach altogether since I cannot test against it.

@bkaradzic
Copy link
Owner

Actually wait. I can add inline.

bkaradzic added a commit that referenced this pull request Apr 8, 2020
@bkaradzic
Copy link
Owner

All good.

@juj
Copy link
Contributor Author

juj commented Apr 8, 2020

Perfect, thanks!

rozgo pushed a commit to VertexStudio/bgfx that referenced this pull request Aug 14, 2020
* Avoid redundant uniform uploads

* Fix placement of GL_CHECK()s and import of glUniform4f.

* Fix typo

* Migrate GL uniform cache to use tinystl to conform to BGFX data structures

* Address review

* Address review
rozgo pushed a commit to VertexStudio/bgfx that referenced this pull request Aug 14, 2020
rozgo pushed a commit to VertexStudio/bgfx that referenced this pull request Aug 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants