Fix potential deadlock in the WaitEvent path of CmdBuffers #2481
Looks like the urCommandBufferFillCommandsTest.Buffer/ CTS test is timing out in CI in a few different L0 conditions.
Moves the call to event reset for the WaitEvent and AllResetEvent to the ComputeCommandList. This fixes a potential race condition where, if the SignalCommandList executes before the ComputeCommandList, the WaitEvent could be reset before the ComputeCommandList can wait on it and, consequently, create a deadlock.
I guess that moving the resets to the ComputeCommandList doesn't work when using the CopyCommandList, because it might reset the events before the CopyCommandList starts. For that to work, we would have to add extra synchronization between the CopyCommandList and the ComputeCommandList. I'm trying a different approach at the moment, with extra barriers in the SignalCommandList.
Makes sense to me
I think https://github.com/intel/llvm/blob/sycl/sycl/doc/design/images/L0_UR_command-buffer-v5.jpg could be updated in the DPC++ PR, but there are probably a whole bunch of places where that doc section isn't quite right, and I'm not sure it's worth updating given we're trying to deprecate this path, so I'll leave it up to you whether to do that or not.
Add barriers to the SignalCommandList that guarantee that resetting the WaitEvent is done at the right time.
This fixes a potential race condition where, if the SignalCommandList executes before the ComputeCommandList, the WaitEvent could be reset before the ComputeCommandList can wait on it and, consequently, create a deadlock.
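To make the ordering problem concrete, here is a toy model of the race described above. It is only a sketch: it does not call Level Zero or Unified Runtime, and `Step`, `computeProceeds` and `withBarrier` are hypothetical names invented for illustration. It assumes the WaitEvent is a simple signaled/unsignaled flag and replays one interleaving of three steps to see whether the ComputeCommandList's wait is ever satisfied.

```cpp
#include <vector>

// Toy model only: none of these names are Level Zero or Unified Runtime APIs.
enum class Step {
  SignalWaitEvent, // WaitEvent becomes signaled
  ComputeWait,     // ComputeCommandList starts waiting on WaitEvent
  ResetWaitEvent   // SignalCommandList resets WaitEvent
};

// Replays one interleaving and reports whether the compute wait is ever
// satisfied. Returning false models the deadlock.
bool computeProceeds(const std::vector<Step> &order) {
  bool signaled = false;
  bool waiting = false;
  for (Step s : order) {
    switch (s) {
    case Step::SignalWaitEvent:
      signaled = true;
      break;
    case Step::ResetWaitEvent:
      signaled = false;
      break;
    case Step::ComputeWait:
      waiting = true;
      break;
    }
    if (waiting && signaled)
      return true; // the wait observed a signaled event
  }
  return false;
}

// Models the effect of the added barriers: any reset that would run before
// the compute wait is held back until right after it.
std::vector<Step> withBarrier(const std::vector<Step> &order) {
  std::vector<Step> out;
  int pendingResets = 0;
  bool waitSeen = false;
  for (Step s : order) {
    if (s == Step::ResetWaitEvent && !waitSeen) {
      ++pendingResets; // defer: the wait has not happened yet
      continue;
    }
    out.push_back(s);
    if (s == Step::ComputeWait) {
      waitSeen = true;
      for (; pendingResets > 0; --pendingResets)
        out.push_back(Step::ResetWaitEvent);
    }
  }
  return out;
}
```

In this model, the interleaving signal, reset, wait never lets the compute wait complete, while the same interleaving passed through `withBarrier` does. The real fix expresses that ordering with barriers in the SignalCommandList rather than by reordering a list of steps.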