Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential fix for the synchronization issue. Needs review. #3303

Closed

Conversation

mcourteaux
Copy link
Contributor

@mcourteaux mcourteaux commented Jun 4, 2024

In order to help debugging, I created a new macro VK_TRACE similar to VK_CHECK to also be able to trace and warn for Vulkan calls that have a result which is assigned to a variable and further used. That causes a lot of lines to be changed, but the real change is not that big.

@pezcode, can you share your thoughts on this proposed fix?

So, in summary, the proposed fix will vkWaitForFences on the rendering-done fence from the oldest in-flight frame. I believe that this should cause vkAcquireNextImageKHR to always simply return the next frame in the swapchain deterministically.

In case we don't get the next frame, it is detected, as the fence for the rendering work for the frame-in-flight is stored in the swapchain m_renderDoneFence (similar to the semaphore; always the next one by index) AND in the m_backBufferFence (indexed by the frame number returned by the vkAcquireNextImageKHR). If the fence we just waited for, doesn't match the fence for that particular frame, we wait for that one too (which I genuinely think doesn't do anything reasonable as that should never happen and we are just fully screwed in that case) and print a warning (if in debug).

Would fix #3302.

@mcourteaux mcourteaux marked this pull request as ready for review June 4, 2024 20:09
@mcourteaux
Copy link
Contributor Author

Thought: I believe there might still be a bug regarding the fact that the small buffers which are reused for uniform data staging are flushed by vkFlushMappedMemoryRanges() before the drawing-done fence is waited for. So in principle, I think you could run in the issue where the CPU flushes the memory range either:

  • before the GPU has finished the draw call that made use of the uniform data (with the "descriptor sets").
  • before the GPU has executed the buffer copy command (vkCmdCopyBuffer) to copy the data from the staging buffer to the destination buffer.

This is because bgfx right now does some of these things in rendererExecCommands(m_renderer->m_cmdPre); which happens before submit() which will acquire the wait-for-fence and acquire the next frame. The command types that go in the pre-buffer seem to be:

		RendererInit,
		RendererShutdownBegin,
		CreateVertexLayout,
		CreateIndexBuffer,
		CreateVertexBuffer,
		CreateDynamicIndexBuffer,
		UpdateDynamicIndexBuffer,
		CreateDynamicVertexBuffer,
		UpdateDynamicVertexBuffer,
		CreateShader,
		CreateProgram,
		CreateTexture,
		UpdateTexture,
		ResizeTexture,
		CreateFrameBuffer,
		CreateUniform,
		UpdateViewName,
		InvalidateOcclusionQuery,
		SetName,

Regarding my other PR with the staging buffers: all the Update and Create command types will use staging buffers. Which are not synchronized on, which might(?) explain the crashes you saw @bkaradzic.

@mcourteaux mcourteaux marked this pull request as draft June 5, 2024 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[vulkan] Synchronization issue: vkAcquireNextImage uses a semaphore which is in use.
1 participant