ID3D12CommandAllocator Error for heavy computation pipeline #2285

Closed
haixuanTao opened this issue Dec 12, 2021 · 6 comments
Labels
area: correctness (We're behaving incorrectly), backend: dx12 (Issues with DX12 or DXGI), type: bug (Something isn't working)

Comments

@haixuanTao

haixuanTao commented Dec 12, 2021

Description
I have written a library called wonnx that transforms deep learning models into wgpu compute pipelines: https://github.com/haixuanTao/wonnx

The library works with an MNIST model on Windows DX12, but larger models such as SqueezeNet fail to run on DX12.
Both MNIST and SqueezeNet work on Linux with Vulkan, both in GitHub Actions and locally with an NVIDIA card.

The error I get is:

ID3D12CommandAllocator::Reset: A command allocator 0x000001C4C43FE4C0:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]

I have scoped the error to the line:

device.poll(wgpu::Maintain::Wait);

From my research, I think this error is related to the high number of compute pipelines, as SqueezeNet is 10x larger than MNIST.

I have gotten this error on a Vagrant VM and on a GitHub Actions VM, so it might be caused by virtualisation.
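
For context on the error message above: D3D12 requires that a command allocator is reset only after the GPU has finished executing every command list recorded from it, which is normally tracked with a fence value. The sketch below is a toy model of that rule in plain Rust with no GPU involved. The names `Pool`, `Allocator`, `submit`, `recycle` and `gpu_completes` are hypothetical, and this is not how wgpu or wgpu-hal is implemented; it only illustrates the invariant that the `COMMAND_ALLOCATOR_SYNC` check enforces.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a command allocator (hypothetical, not a wgpu type).
/// It remembers the last submission index that used it.
struct Allocator {
    id: usize,
    last_submission: u64,
}

/// Toy allocator pool guarded by a fence value (hypothetical, not a wgpu type).
struct Pool {
    completed: u64,       // highest submission index the "GPU" has finished
    next_submission: u64, // index handed to the most recent submission
    in_flight: VecDeque<Allocator>,
    free: Vec<Allocator>,
    created: usize,
}

impl Pool {
    fn new() -> Self {
        Pool { completed: 0, next_submission: 0, in_flight: VecDeque::new(), free: Vec::new(), created: 0 }
    }

    /// Move allocators back to the free list only once their work has completed.
    /// This is the only point at which resetting them would be legal.
    fn recycle(&mut self) {
        let done = self.completed;
        while self.in_flight.front().map_or(false, |a| a.last_submission <= done) {
            let alloc = self.in_flight.pop_front().unwrap();
            self.free.push(alloc);
        }
    }

    /// "Submit" work: reuse a finished allocator if there is one, else create a new one.
    /// Returns (submission index, allocator id).
    fn submit(&mut self) -> (u64, usize) {
        self.recycle();
        let mut alloc = match self.free.pop() {
            Some(a) => a,
            None => {
                self.created += 1;
                Allocator { id: self.created, last_submission: 0 }
            }
        };
        self.next_submission += 1;
        alloc.last_submission = self.next_submission;
        let result = (self.next_submission, alloc.id);
        self.in_flight.push_back(alloc);
        result
    }

    /// Simulate the fence being signalled up to `index`.
    fn gpu_completes(&mut self, index: u64) {
        self.completed = self.completed.max(index);
    }
}

fn main() {
    let mut pool = Pool::new();
    let (first, a1) = pool.submit();
    let (_, a2) = pool.submit(); // submission 1 is still running, so a new allocator is needed
    pool.gpu_completes(first);
    let (_, a3) = pool.submit(); // submission 1 is done, so its allocator can be reused
    println!("allocator ids used: {}, {}, {}", a1, a2, a3);
}
```

In this model an allocator is never handed out again while its last submission is still pending. The error in this issue is what the D3D12 debug layer reports when that ordering is violated.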

Repro steps
To reproduce the error, you can clone my repo:

SETX RUST_LOG debug
git clone https://github.com/haixuanTao/wonnx
cd wonnx
git checkout 71e25a47f5ed831fa96499b77084424188b2e35d
cargo run --example squeeze

You can also run the tests, which should be passing:

cargo test

You can also check my GitHub Actions run here: https://github.com/haixuanTao/wonnx/actions/runs/1569686479, which has test checks for both Linux x86 and Windows x86.

Expected vs observed behavior
If this were an implementation problem in my library, I would expect Windows to fail on both MNIST and SqueezeNet, not only on SqueezeNet.

Extra materials

time: pre_run: 24.2054ms                                                                                                
[2021-12-12T18:56:15Z INFO  wgpu_core::device] Created buffer Valid((53, 2, Dx12)) with BufferDescriptor { label: Some("staging_squeezenet0_flatten0_reshape0"), size: 4000, usage: MAP_READ | COPY_DST, mapped_at_creation: false }            
time: run: 159.6157ms                                                                                                   
time: run: 200.1022ms                                                                                                   
[2021-12-12T18:56:20Z ERROR wgpu_hal::dx12::instance] ID3D12CommandAllocator::Reset: A command allocator 0x000002035A627380:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]                                                             
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance] Process is terminating. Using simple reporting. Please call ReportLiveObjects() at runtime for standard reporting.                                                                        
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance] Live Producer at 0x00000203498D5A98, Refcount: 330.               
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349916220, Refcount: 0.                 
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349F3B600, Refcount: 0.                 
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349F787F0, Refcount: 0.                 
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349F792F0, Refcount: 0.                 
[2021-12-12T18:56:20Z 
.......
wgpu_hal::dx12::instance]   Live Object at 0x000002034A048A00, Refcount: 0.                 

[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance] Live                         Object :      8
error: process didn't exit successfully: `target\debug\examples\squeeze.exe` (exit code: 1)

Platform
The vagrant VM I am using is the following: https://github.com/nbigaouette/windows_vagrant_rustv

UPDATE: I have now removed the test from my CI so that I can keep developing.

@kvark added the "area: correctness" and "type: bug" labels on Dec 12, 2021
@kvark
Member

kvark commented Dec 12, 2021

Thank you for filing! I'll see if I can reproduce this on one of my machines.

haixuanTao pushed a commit to webonnx/wonnx that referenced this issue Dec 12, 2021
A correctness bug within wgpu make testing Squeezenet on Windows DX12
impossible ref: gfx-rs/wgpu#2285
@kvark
Member

kvark commented Dec 13, 2021

Unable to reproduce on either Iris 550 or AMD 3500U.
This could be a timing issue, or something related to the adapter. What adapter is used by the VM runs? Could you print out adapter.get_info()?

@haixuanTao
Author

Yep:

[2021-12-13T18:57:03Z INFO  wgpu_core::instance] Adapter Dx12 AdapterInfo { name: "Microsoft Basic Render Driver", vendor: 5140, device: 140, device_type: IntegratedGpu, backend: Dx12 }
AdapterInfo {
    name: "Microsoft Basic Render Driver",
    vendor: 5140,
    device: 140,
    device_type: IntegratedGpu,
    backend: Dx12,
}
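
As a reading aid for the output above: wgpu prints the vendor and device IDs in decimal, while they are usually quoted in hexadecimal. The small sketch below converts them; `pci_id_hex` is a hypothetical helper, not part of wgpu. Vendor `0x1414` is Microsoft's vendor ID, and device `0x8c` is what the Microsoft Basic Render Driver (the WARP software adapter) reports, so the VM is running on a software adapter rather than a hardware GPU.

```rust
/// Format a decimal ID from `AdapterInfo` as zero-padded hex (hypothetical helper).
fn pci_id_hex(id: u32) -> String {
    format!("{:#06x}", id)
}

fn main() {
    println!("vendor: {}", pci_id_hex(5140)); // prints "vendor: 0x1414"
    println!("device: {}", pci_id_hex(140)); // prints "device: 0x008c"
}
```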

@kvark
Member

kvark commented Dec 14, 2021

I tested on WARP, using a back-ported patch from #2290, and still was unable to reproduce this :/

@haixuanTao
Author

OK. I'm refactoring my codebase and porting an even bigger model. If it is a timing issue, it might be easier to reproduce on a larger pipeline. If the error still happens and is easier to reproduce, I'll let you know.

Thanks anyway for the investigation.

@cwfitzgerald
Member

Closing in favor of #3193
