ID3D12CommandAllocator Error for heavy computation pipeline #2285

haixuanTao · 2021-12-12T19:01:25Z

Description
I have written a library, wonnx, that transforms deep learning models into wgpu compute pipelines: https://github.com/haixuanTao/wonnx

The library runs an MNIST model on Windows DX12, but larger models such as SqueezeNet fail to run on DX12.
Both MNIST and SqueezeNet work on Linux with Vulkan, both in GitHub Actions and locally with an NVIDIA card.

The error I get is:

ID3D12CommandAllocator::Reset: A command allocator 0x000001C4C43FE4C0:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]

I have narrowed the error down to this line:

device.poll(wgpu::Maintain::Wait);

From my research, I think this error is related to the large number of compute pipelines, as SqueezeNet is 10x larger than MNIST.
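
For readers unfamiliar with the message: D3D12 does not allow a command allocator to be reset while command lists recorded from it may still be executing on the GPU, and applications normally track this with a fence. Below is a minimal toy model of that rule, assuming each submission is tagged with a monotonically increasing fence value. The type and method names are hypothetical; this is neither wgpu nor D3D12 code.

```rust
/// Toy model of the rule behind COMMAND_ALLOCATOR_SYNC.
/// All names are hypothetical; this is not wgpu or D3D12 API.
#[derive(Debug, PartialEq)]
pub enum ResetError {
    /// The GPU has not yet reached the fence value of the last submission.
    StillInFlight { submitted: u64, completed: u64 },
}

#[derive(Default)]
pub struct ToyAllocator {
    /// Fence value that will be signaled once the most recent
    /// submission recorded from this allocator has finished.
    last_submitted: u64,
}

impl ToyAllocator {
    /// Record that work from this allocator was submitted and will
    /// signal `fence_value` when the GPU is done with it.
    pub fn submit(&mut self, fence_value: u64) {
        self.last_submitted = fence_value;
    }

    /// A reset is only valid once the completed fence value has
    /// caught up with the last submission.
    pub fn try_reset(&self, completed: u64) -> Result<(), ResetError> {
        if completed < self.last_submitted {
            return Err(ResetError::StillInFlight {
                submitted: self.last_submitted,
                completed,
            });
        }
        Ok(())
    }
}
```

In this model, calling `submit(3)` and then `try_reset(2)` returns `StillInFlight`, which corresponds to the situation the debug layer is reporting; `try_reset(3)` succeeds.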

I have seen this error on a Vagrant VM and on a GitHub Actions VM, so it might be caused by virtualisation.

Repro steps
To reproduce the error, you can clone my repo:

SETX RUST_LOG debug
git clone https://github.com/haixuanTao/wonnx
cd wonnx
git checkout 71e25a47f5ed831fa96499b77084424188b2e35d
cargo run --example squeeze

You can also run the tests, which should all be passing:

cargo test

You can also check my GitHub Actions run here: https://github.com/haixuanTao/wonnx/actions/runs/1569686479, which runs the tests on both Linux x86 and Windows x86.

Expected vs observed behavior
If this were a problem in my implementation, I would expect Windows to fail on both MNIST and SqueezeNet. Instead, MNIST passes and only SqueezeNet fails.

Extra materials

time: pre_run: 24.2054ms                                                                                                
[2021-12-12T18:56:15Z INFO  wgpu_core::device] Created buffer Valid((53, 2, Dx12)) with BufferDescriptor { label: Some("staging_squeezenet0_flatten0_reshape0"), size: 4000, usage: MAP_READ | COPY_DST, mapped_at_creation: false }            
time: run: 159.6157ms                                                                                                   
time: run: 200.1022ms                                                                                                   
[2021-12-12T18:56:20Z ERROR wgpu_hal::dx12::instance] ID3D12CommandAllocator::Reset: A command allocator 0x000002035A627380:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]                                                             
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance] Process is terminating. Using simple reporting. Please call ReportLiveObjects() at runtime for standard reporting.                                                                        
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance] Live Producer at 0x00000203498D5A98, Refcount: 330.               
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349916220, Refcount: 0.                 
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349F3B600, Refcount: 0.                 
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349F787F0, Refcount: 0.                 
[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance]   Live Object at 0x0000020349F792F0, Refcount: 0.                 
[2021-12-12T18:56:20Z 
.......
wgpu_hal::dx12::instance]   Live Object at 0x000002034A048A00, Refcount: 0.                 

[2021-12-12T18:56:20Z WARN  wgpu_hal::dx12::instance] Live                         Object :      8
error: process didn't exit successfully: `target\debug\examples\squeeze.exe` (exit code: 1)

Platform
The vagrant VM I am using is the following: https://github.com/nbigaouette/windows_vagrant_rustv

UPDATE: I have now removed the test from my CI so that I can keep developing.


kvark · 2021-12-12T19:09:11Z

Thank you for filing! I'll see if I can reproduce this on one of my machines.


kvark · 2021-12-13T18:51:58Z

Unable to reproduce on either Iris 550 or AMD 3500U.
This could be a timing issue, or something related to the adapter. What adapter is used by the VM runs? Could you print out adapter.get_info()?

haixuanTao · 2021-12-13T18:58:00Z

Yep:

[2021-12-13T18:57:03Z INFO  wgpu_core::instance] Adapter Dx12 AdapterInfo { name: "Microsoft Basic Render Driver", vendor: 5140, device: 140, device_type: IntegratedGpu, backend: Dx12 }
AdapterInfo {
    name: "Microsoft Basic Render Driver",
    vendor: 5140,
    device: 140,
    device_type: IntegratedGpu,
    backend: Dx12,
}
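
Note that `vendor` and `device` are printed in decimal. In hex they are 0x1414 and 0x8c, the IDs that Microsoft's DXGI documentation gives for the Basic Render Driver, i.e. the WARP software adapter. A small sketch of that check follows; the function names are hypothetical and are not part of wgpu.

```rust
/// PCI vendor ID used by Microsoft's software adapter.
pub const MICROSOFT_VENDOR_ID: u32 = 0x1414;
/// Device ID of the Microsoft Basic Render Driver (WARP).
pub const BASIC_RENDER_DEVICE_ID: u32 = 0x8c;

/// Format an ID that was logged in decimal as zero-padded hex.
pub fn pci_hex(id: u32) -> String {
    format!("{:#06x}", id)
}

/// Returns true if the vendor/device pair identifies the
/// Microsoft Basic Render Driver (a CPU implementation).
pub fn is_basic_render_driver(vendor: u32, device: u32) -> bool {
    vendor == MICROSOFT_VENDOR_ID && device == BASIC_RENDER_DEVICE_ID
}
```

With the values from the log above, `pci_hex(5140)` gives `0x1414`, `pci_hex(140)` gives `0x008c`, and `is_basic_render_driver(5140, 140)` is true, so the VM runs are executing on a CPU implementation rather than on a hardware GPU.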

kvark · 2021-12-14T15:08:34Z

I tested on WARP, using a back-ported patch from #2290, and still was unable to reproduce this :/

haixuanTao · 2021-12-14T15:13:03Z

Ok. I'm refactoring my codebase and porting an even bigger model. If it is a timing issue, it might be easier to reproduce on a larger pipeline. If the error still happens and is easier to reproduce, I'll let you know.

Thanks anyway for the investigation.

cwfitzgerald · 2023-10-14T03:53:25Z

Closing in favor of #3193

kvark added the "area: correctness" (We're behaving incorrectly) and "type: bug" (Something isn't working) labels Dec 12, 2021

haixuanTao pushed a commit to webonnx/wonnx that referenced this issue Dec 12, 2021

Removing Squeezenet Test from CI

0d5ace1

A correctness bug within wgpu make testing Squeezenet on Windows DX12 impossible ref: gfx-rs/wgpu#2285

kvark mentioned this issue Dec 14, 2021

hal/dx12: expose WARP as a fallback adapter #2290

Merged

haixuanTao mentioned this issue Feb 2, 2022

Make wonnx more robust by using a DAG for optimization and transformation webonnx/wonnx#45

Merged

This was referenced Nov 7, 2022

Zero-initialize workgroup memory #3174

Merged

Early frees on CPU Implementations #3193

Closed

teoxoy added the "backend: dx12" (Issues with DX12 or DXGI) label Feb 24, 2023

cwfitzgerald closed this as completed Oct 14, 2023
