
Validation Errors compute shaders - vkResetCommandPool, vkDestroyBuffer #2473

Closed
peters-david opened this issue Feb 10, 2022 · 15 comments
Labels
help required We need community help to make this happen. type: bug Something isn't working

Comments

@peters-david (Author) opened this issue Feb 10, 2022

I get a

0 took 5.152398114s
1 took 921.852867ms
MESA-INTEL: error: ../src/intel/vulkan/anv_device.c:3713: GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST)
thread 'main' panicked at 'Error in Queue::submit: parent device is lost', /home/david/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/6931e57/wgpu/src/backend/direct.rs:231:9

when running this code with "cargo run --release".

https://gist.github.com/peters-david/70a7a7ee6526cb35fe7f7b028cb820f5

Looks like the first two calls to use_gpu() work fine; after that it panics.
Is this intended? If so, why does it work the first two times?
If I remove the lazy_static and request a new wgpu::Instance inside use_gpu(), it works fine.

Sorry if this is a stupid question, just got started with Rust and wgpu.

I run it on an i7-1065G7 with the ICL GT2 on Linux Ubuntu 21.10.

@kvark (Member) commented Feb 11, 2022

Please check that you are using the latest Intel drivers. We've been filing driver issues lately, which are at different stages of being fixed on the Mesa side.

@kvark (Member) commented Feb 11, 2022

Also, please install Vulkan validation layers if you haven't already. I wonder if it spews out any useful info.

@kvark kvark added type: bug Something isn't working help required We need community help to make this happen. labels Feb 11, 2022
@peters-david (Author) commented Feb 12, 2022

The latest drivers are installed.

Vulkan validation shows this:

[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkResetCommandPool-commandPool-00040 (0xb53e2331)]
        Validation Error: [ VUID-vkResetCommandPool-commandPool-00040 ] Object 0: handle = 0x55b6783678c0, name = _Transit, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0xb53e2331 | Attempt to reset command pool with VkCommandBuffer 0x55b6783678c0[_Transit] which is in use. The Vulkan spec states: All VkCommandBuffer objects allocated from commandPool must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.2.198.0/linux/1.2-extensions/vkspec.html#VUID-vkResetCommandPool-commandPool-00040)
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance]         objects: (type: COMMAND_BUFFER, hndl: 0x55b6783678c0, name: _Transit)
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkResetCommandPool-commandPool-00040 (0xb53e2331)]
        Validation Error: [ VUID-vkResetCommandPool-commandPool-00040 ] Object 0: handle = 0x55b6783691c0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0xb53e2331 | Attempt to reset command pool with VkCommandBuffer 0x55b6783691c0[] which is in use. The Vulkan spec states: All VkCommandBuffer objects allocated from commandPool must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.2.198.0/linux/1.2-extensions/vkspec.html#VUID-vkResetCommandPool-commandPool-00040)
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance]         command buffers: compute pass
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance]         objects: (type: COMMAND_BUFFER, hndl: 0x55b6783691c0, name: ?)
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkDestroyBuffer-buffer-00922 (0xe4549c11)]
        Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0xe7e6d0000000000f, name = <init_buffer>, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0xe7e6d0000000000f[<init_buffer>] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.198.0/linux/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance]         objects: (type: BUFFER, hndl: 0xe7e6d0000000000f, name: <init_buffer>)
[2022-02-12T11:25:41Z INFO  wgpu_core::device] Buffer (1, 1, Vulkan) is dropped
[2022-02-12T11:25:41Z INFO  wgpu_core::device] Buffer (0, 1, Vulkan) is dropped
0 took 5.292182785s
[2022-02-12T11:25:41Z INFO  wgpu_core::device] Created buffer Valid((2, 1, Vulkan)) with BufferDescriptor { label: Some("cpu buffer"), size: 4194304, usage: MAP_READ | COPY_DST, mapped_at_creation: false }
[2022-02-12T11:25:41Z INFO  wgpu_core::device] Created buffer Valid((3, 1, Vulkan)) with BufferDescriptor { label: Some("NN Buffer"), size: 4194304, usage: COPY_SRC | COPY_DST | STORAGE, mapped_at_creation: true }
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkDestroyBuffer-buffer-00922 (0xe4549c11)]
        Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0xcad092000000000d, name = NN Buffer, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0xcad092000000000d[NN Buffer] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.198.0/linux/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
[2022-02-12T11:25:41Z ERROR wgpu_hal::vulkan::instance]         objects: (type: BUFFER, hndl: 0xcad092000000000d, name: NN Buffer)
[2022-02-12T11:25:43Z INFO  wgpu_core::device] Buffer (3, 1, Vulkan) is dropped
[2022-02-12T11:25:43Z INFO  wgpu_core::device] Buffer (2, 1, Vulkan) is dropped
1 took 1.237099087s
[2022-02-12T11:25:43Z INFO  wgpu_core::device] Created buffer Valid((0, 2, Vulkan)) with BufferDescriptor { label: Some("cpu buffer"), size: 4194304, usage: MAP_READ | COPY_DST, mapped_at_creation: false }
[2022-02-12T11:25:43Z INFO  wgpu_core::device] Created buffer Valid((1, 2, Vulkan)) with BufferDescriptor { label: Some("NN Buffer"), size: 4194304, usage: COPY_SRC | COPY_DST | STORAGE, mapped_at_creation: true }
MESA-INTEL: error: ../src/intel/vulkan/anv_device.c:3713: GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST)
thread 'main' panicked at 'Error in Queue::submit: parent device is lost', /home/david/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/6931e57/wgpu/src/backend/direct.rs:231:9

Note: in order to run my example with cargo run, line 44 has to be changed to let load = vec![0; 4096*256].into_boxed_slice();

I tested on another system running Ubuntu 20.04 with an Nvidia M4000, and it works there.
HOWEVER, in another project (can't share the code) I am getting the same errors on the M4000 system, even when re-requesting the wgpu instance.
Looks to me like it is related to compute shaders that take a long time to finish.
I am using the M4000 machine in text mode and only connect via ssh, so it shouldn't be related to some OS-induced timeout.

edit: spelling

@peters-david (Author) commented Feb 12, 2022

In the private project I am getting the same errors as above, but the device isn't lost.
Instead I'm getting an output of all 0s; the same might be happening in #1881.

@peters-david peters-david changed the title GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST) when using lazy_static Validation Errors compute shaders - vkResetCommandPool, vkDestroyBuffer Feb 12, 2022
@kvark (Member) commented Feb 13, 2022

Not exactly sure what is happening here, but it may very well be related to the fact that our tests also spew errors when run on multiple threads (from just cargo test on Linux). This is worrying, and we should fix it ASAP.

@kvark kvark pinned this issue Feb 13, 2022
@peters-david (Author)

OK, let me know if you need anything more to reproduce this or any additional information.

@kvark (Member) commented Feb 13, 2022

Actually, the errors I was seeing are fixed by #2476; they are unrelated to your case. Moreover, we run the compute test concurrently in cargo test, so it should be exercising similar paths. Does cargo test run well for you?

@peters-david (Author) commented Feb 13, 2022

On the ICL GT2 the tests seem to pass, although I'm getting validation errors.
icl_gt2.txt

On the M4000, the conservative_raster test fails.
m4000.txt

@peters-david (Author)

On your vk-astc branch all tests pass with no validation errors on the ICL GT2.
On the M4000, conservative_raster still fails on vk-astc.

@kvark (Member) commented Feb 14, 2022

OK, thank you for confirming! Would you be able to push your test case to a branch of wgpu somewhere, so that we can test it?

@peters-david (Author)

You can find it at https://github.com/peters-david/wgpu
The test is in wgpu/tests.

@kvark (Member) commented Feb 15, 2022

Thanks @peters-david! I ran your test on Intel Xe graphics (integrated).
First thing:

---- long_running_shader stdout ----
0
thread 'long_running_shader' panicked at 'assertion failed: `(left == right)`
  left: `100000`,
 right: `0`', wgpu/tests/long_running_shader.rs:120:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This is due to a problem with your shader logic, which doesn't actually write all the indices:

@builtin(global_invocation_id) id: vec3<u32>
@builtin(local_invocation_index) index: u32
var i: u32 = 256u * id.x + index;

For thread (1,0,0) within the only workgroup, we'll have id.x == 1 and index == 1, so i == 257.

Once I fix this, the test just runs indefinitely (as you designed). No validation errors or warnings or asserts.
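For anyone following along, kvark's arithmetic can be checked with a small CPU-side sketch. This is an illustration, not the actual test: the single workgroup of 256 threads and the array length are assumptions taken from the discussion above.

```rust
// CPU-side simulation of the buggy index formula `256u * id.x + index`,
// assuming one workgroup of 256 threads writing into an output array.
fn covered_slots(workgroup_size: u32, len: usize) -> usize {
    let mut written = vec![false; len];
    // With a single workgroup, global_invocation_id.x equals
    // local_invocation_index for every thread.
    for x in 0..workgroup_size {
        let (id_x, index) = (x, x);
        let i = (workgroup_size * id_x + index) as usize; // buggy formula
        if i < len {
            written[i] = true;
        }
    }
    written.iter().filter(|&&w| w).count()
}

fn main() {
    // Thread 0 writes slot 0; thread 1 already computes i == 257, which is
    // out of bounds for a 256-element array, so only one slot is written.
    println!("slots written: {}", covered_slots(256, 256));
}
```

The assertion in the test output above (left: 100000, right: 0) follows directly: any element past index 0 keeps its initial value.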

@peters-david (Author) commented Feb 15, 2022

The indices don't start at 0??

@kvark (Member) commented Feb 15, 2022

Indices start at 0, and your array[0] will be initialized. But array[1] will never be, which triggers the assertion I posted.

@peters-david (Author)

Thank you @kvark!
I misunderstood the relationship between workgroups and invocations, and the difference between local & global indices.
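For later readers, the relationship the thread converges on is that global_invocation_id.x already combines the workgroup id and the local index: global_invocation_id.x == workgroup_id.x * workgroup_size_x + local_invocation_id.x. A small CPU-side sketch (the 4-workgroup dispatch and workgroup size of 256 are assumed purely for the example) shows that indexing by the global id alone covers every element exactly once:

```rust
// Enumerate the global invocation ids for a hypothetical dispatch of
// `workgroups` workgroups of `workgroup_size` threads along x.
fn global_ids(workgroups: u32, workgroup_size: u32) -> Vec<u32> {
    let mut ids = Vec::new();
    for wg in 0..workgroups {
        for local in 0..workgroup_size {
            // global_invocation_id.x = workgroup_id.x * size + local id
            ids.push(wg * workgroup_size + local);
        }
    }
    ids
}

fn main() {
    let ids = global_ids(4, 256);
    // Every index in 0..1024 appears exactly once: no gaps, no duplicates,
    // which is why `var i = id.x;` is enough in the shader.
    assert_eq!(ids, (0..1024).collect::<Vec<u32>>());
    println!("4 workgroups x 256 threads cover indices 0..{}", ids.len());
}
```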


2 participants