Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential deadlock in the WaitEvent path of CmdBuffers #2481

Merged

Conversation

fabiomestre
Copy link
Contributor

@fabiomestre fabiomestre commented Dec 18, 2024

Add barriers to the SignalCommandList that guarantee that resetting the WaitEvent is done at the right time.

This fixes a potential race condition where, if the SignalCommandList executes before the ComputeCommandList, the WaitEvent could be reset before the ComputeCommandList can wait on it and, consequently, create a deadlock.

@fabiomestre fabiomestre requested a review from a team as a code owner December 18, 2024 18:04
@github-actions github-actions bot added level-zero L0 adapter specific issues command-buffer Command Buffer feature addition/changes/specification labels Dec 18, 2024
@fabiomestre fabiomestre force-pushed the fabio/fix_potential_race_condition branch from 1743ecf to 5cd90c9 Compare December 18, 2024 18:46
Copy link
Contributor

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the urCommandBufferFillCommandsTest.Buffer/ CTS test is timing out in CI in a few different L0 conditions.

Moves the call to event reset for the WaitEvent and AllResetEvent
to the ComputeCommandList.

This fixes a potential race condition where, if the SignalCommandList
executes before the ComputeCommandList, the WaitEvent could be reset
before the ComputeCommandList can wait on it and, consequently,
create a deadlock.
@fabiomestre fabiomestre force-pushed the fabio/fix_potential_race_condition branch from 5cd90c9 to 017d9cc Compare December 19, 2024 11:28
@fabiomestre
Copy link
Contributor Author

fabiomestre commented Dec 19, 2024

Looks like the urCommandBufferFillCommandsTest.Buffer/ CTS test is timing out in CI in a few different L0 conditions.

I guess that moving the resets to the ComputeCommandList doesn't work when using the CopyCommandList because it might reset the events before the CopyCommandList starts. For that to work we would have to add extra synchronization between the CopyCommandList and the ComputeCommandList.

I'm trying a different apprach at the moment with extra barriers in the SignalCommandList.

@fabiomestre fabiomestre changed the title Move event resets from SignalCommandList to ComputeCommandList Fix potential deadlock in the WaitEvent path of CmdBuffers Dec 19, 2024
Copy link
Contributor

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

I think https://github.com/intel/llvm/blob/sycl/sycl/doc/design/images/L0_UR_command-buffer-v5.jpg could be updated in the DPC++ PR but there's probably a whole bunch of places that doc section isn't quite right, and not sure it's updating giving we're trying to depreciate this path, so leave it up to you whether to do that or not.

@fabiomestre fabiomestre added the ready to merge Added to PR's which are ready to merge label Dec 20, 2024
@martygrant martygrant merged commit ab5e53f into oneapi-src:main Jan 3, 2025
73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
command-buffer Command Buffer feature addition/changes/specification level-zero L0 adapter specific issues ready to merge Added to PR's which are ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants