fix undefined variable on OS X < 10.6 #29
Conversation
Thanks for the contribution, but qemu does not accept patches via pull request. Please close this pull request and submit your patch to the mailing list.
target-mips: fix hflags modified in delay / forbidden slot Fixes #29 Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
ahci-test /x86_64/ahci/io/dma/lba28/retry triggers the following leak: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7fc4b2a25e20 in malloc (/lib64/libasan.so.3+0xc6e20) #1 0x7fc4993bce58 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee58) #2 0x556a187d4b34 in ahci_populate_sglist hw/ide/ahci.c:896 #3 0x556a187d8237 in ahci_dma_prepare_buf hw/ide/ahci.c:1367 #4 0x556a187b5a1a in ide_dma_cb hw/ide/core.c:844 #5 0x556a187d7eec in ahci_start_dma hw/ide/ahci.c:1333 #6 0x556a187b650b in ide_start_dma hw/ide/core.c:921 #7 0x556a187b61e6 in ide_sector_start_dma hw/ide/core.c:911 #8 0x556a187b9e26 in cmd_write_dma hw/ide/core.c:1486 #9 0x556a187bd519 in ide_exec_cmd hw/ide/core.c:2027 #10 0x556a187d71c5 in handle_reg_h2d_fis hw/ide/ahci.c:1204 #11 0x556a187d7681 in handle_cmd hw/ide/ahci.c:1254 #12 0x556a187d168a in check_cmd hw/ide/ahci.c:510 #13 0x556a187d0afc in ahci_port_write hw/ide/ahci.c:314 #14 0x556a187d105d in ahci_mem_write hw/ide/ahci.c:435 #15 0x556a1831d959 in memory_region_write_accessor /home/elmarco/src/qemu/memory.c:525 #16 0x556a1831dc35 in access_with_adjusted_size /home/elmarco/src/qemu/memory.c:591 #17 0x556a18323ce3 in memory_region_dispatch_write /home/elmarco/src/qemu/memory.c:1262 #18 0x556a1828cf67 in address_space_write_continue /home/elmarco/src/qemu/exec.c:2578 #19 0x556a1828d20b in address_space_write /home/elmarco/src/qemu/exec.c:2635 #20 0x556a1828d92b in address_space_rw /home/elmarco/src/qemu/exec.c:2737 #21 0x556a1828daf7 in cpu_physical_memory_rw /home/elmarco/src/qemu/exec.c:2746 #22 0x556a183068d3 in cpu_physical_memory_write /home/elmarco/src/qemu/include/exec/cpu-common.h:72 #23 0x556a18308194 in qtest_process_command /home/elmarco/src/qemu/qtest.c:382 #24 0x556a18309999 in qtest_process_inbuf /home/elmarco/src/qemu/qtest.c:573 #25 0x556a18309a4a in qtest_read /home/elmarco/src/qemu/qtest.c:585 #26 0x556a18598b85 in qemu_chr_be_write_impl /home/elmarco/src/qemu/qemu-char.c:387 #27 0x556a18598c52 in qemu_chr_be_write /home/elmarco/src/qemu/qemu-char.c:399 #28 0x556a185a2afa in tcp_chr_read /home/elmarco/src/qemu/qemu-char.c:2902 #29 0x556a18cbaf52 in qio_channel_fd_source_dispatch io/channel-watch.c:84 Follow John Snow recommendation: Everywhere else ncq_err is used, it is accompanied by a list cleanup except for ncq_cb, which is the case you are fixing here. Move the sglist destruction inside of ncq_err and then delete it from the other two locations to keep it tidy. Call dma_buf_commit in ide_dma_cb after the early return. Though, this is also a little wonky because this routine does more than clear the list, but it is at the moment the centralized "we're done with the sglist" function and none of the other side effects that occur in dma_buf_commit will interfere with the reset that occurs from ide_restart_bh, I think Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> (cherry picked from commit 5839df7) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
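The change described in this message is small: centralise the scatter-gather list cleanup in ncq_err() and release the list before the retry path returns early from ide_dma_cb(). A rough sketch of that shape follows; the struct fields and helper arguments (ncq_tfs->drive->port.ifs[0], ide_dma_cmd_to_retry()) are reproduced from memory, not checked against the tree.

```c
/* hw/ide/ahci.c -- sketch: ncq_err() becomes the one place that drops the
 * sglist, so the ncq_cb error path can no longer leak it. */
static void ncq_err(NCQTransferState *ncq_tfs)
{
    IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];

    ide_state->error = ABRT_ERR;
    ide_state->status = READY_STAT | ERR_STAT;
    qemu_sglist_destroy(&ncq_tfs->sglist);   /* moved in from the callers */
    ncq_tfs->used = 0;
}

/* hw/ide/core.c, inside ide_dma_cb() -- release the sglist before the early
 * return taken when the request is going to be retried. */
    if (ret < 0) {
        if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
            s->bus->dma->aiocb = NULL;
            dma_buf_commit(s, 0);   /* centralised "done with the sglist" helper */
            return;
        }
    }
```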
Hi, I'd like to report one use-after-free problem which is found by AddressSanitizer. My company provides virtualization server with Qemu-2.7. If customer commands Disk hot-unplug on Web-based application, the application sends "device_del" and "drive_del" commands to qemu process via QMP. It usually works fine. But I found a corner case like following. 1. Customer commands Disk hot-unplug via Web-based application. 2. The application sends "device_del" that sometimes takes unpredictable time. 3. There are some inflight IOs 4. Customer does reboot or shutdown the guest OS 4. Qemu process generates segfault. If a disk is unplugged with "device_del" and "drive_del" commands, qemu does not generate problem. But if only "device_del" command are finished and guest os reboots, qemu generates segfault. I can reproduce that case easily with following commands 1. generate heavy IO on disk in Guest OS 2. run "device_del" on the qemu monitor 3. run "system_reboot" on the qemu monitor According to AdressSanitizer, VirtQueue is freed by virtio_cleanup but dereferenced by asynchronous request. I think asynchronous requests should be finished before virtio_cleanup, so I added blk_drain() before virtio_cleanup. Following is configure options I used to build qemu with AddressSanitizer. ./configure --target-list=x86_64-softmmu --extra-cflags="-fsanitize=address -fno-omit-frame-pointer" --enable-debug Following is the report of AddressSanitizer. ------------------------------------- 8< ---------------------------------------- ==8801==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==8801==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc984da000; bottom 0x7fa524363000; size: 0x005774177000 (375609847808) False positive error reports may follow For details see http://code.google.com/p/address-sanitizer/issues/detail?id=189 ================================================================= ==8801==ERROR: AddressSanitizer: heap-use-after-free on address 0x7fa5240ab82c at pc 0x55ccd7bffb76 bp 0x7fa524364590 sp 0x7fa524364580 READ of size 2 at 0x7fa5240ab82c thread T0 #0 0x55ccd7bffb75 in virtqueue_fill /home/gohkim/work/tools/qemu/hw/virtio/virtio.c:284 qemu#1 0x55ccd7bffe46 in virtqueue_push /home/gohkim/work/tools/qemu/hw/virtio/virtio.c:308 qemu#2 0x55ccd7b66c74 in virtio_blk_req_complete /home/gohkim/work/tools/qemu/hw/block/virtio-blk.c:58 qemu#3 0x55ccd7b67154 in virtio_blk_rw_complete /home/gohkim/work/tools/qemu/hw/block/virtio-blk.c:121 qemu#4 0x55ccd83852b9 in blk_aio_complete block/block-backend.c:923 qemu#5 0x55ccd8385a73 in blk_aio_write_entry block/block-backend.c:984 qemu#6 0x55ccd84c7d68 in coroutine_trampoline util/coroutine-ucontext.c:78 qemu#7 0x7fa5702d45cf (/lib/x86_64-linux-gnu/libc.so.6+0x495cf) 0x7fa5240ab82c is located 44 bytes inside of 131072-byte region [0x7fa5240ab800,0x7fa5240cb800) freed by thread T0 here: #0 0x7fa573f876aa in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x986aa) qemu#1 0x55ccd7c07fa4 in virtio_cleanup /home/gohkim/work/tools/qemu/hw/virtio/virtio.c:1678 qemu#2 0x55ccd7b6c585 in virtio_blk_device_unrealize /home/gohkim/work/tools/qemu/hw/block/virtio-blk.c:948 qemu#3 0x55ccd7c09734 in virtio_device_unrealize /home/gohkim/work/tools/qemu/hw/virtio/virtio.c:1950 qemu#4 0x55ccd7eb9bb7 in device_set_realized hw/core/qdev.c:964 qemu#5 0x55ccd82c1bef in property_set_bool qom/object.c:1853 qemu#6 0x55ccd82be535 in object_property_set qom/object.c:1087 qemu#7 
0x55ccd82c454f in object_property_set_qobject qom/qom-qobject.c:27 qemu#8 0x55ccd82be810 in object_property_set_bool qom/object.c:1156 qemu#9 0x55ccd7ec1395 in bus_set_realized hw/core/bus.c:181 qemu#10 0x55ccd82c1bef in property_set_bool qom/object.c:1853 qemu#11 0x55ccd82be535 in object_property_set qom/object.c:1087 qemu#12 0x55ccd82c454f in object_property_set_qobject qom/qom-qobject.c:27 qemu#13 0x55ccd82be810 in object_property_set_bool qom/object.c:1156 qemu#14 0x55ccd7eb9a8a in device_set_realized hw/core/qdev.c:956 qemu#15 0x55ccd82c1bef in property_set_bool qom/object.c:1853 qemu#16 0x55ccd82be535 in object_property_set qom/object.c:1087 qemu#17 0x55ccd82c454f in object_property_set_qobject qom/qom-qobject.c:27 qemu#18 0x55ccd82be810 in object_property_set_bool qom/object.c:1156 qemu#19 0x55ccd7eba820 in device_unparent hw/core/qdev.c:1099 qemu#20 0x55ccd82bf50b in object_finalize_child_property qom/object.c:1362 qemu#21 0x55ccd82bb3c6 in object_property_del_child qom/object.c:422 qemu#22 0x55ccd82bb601 in object_unparent qom/object.c:441 qemu#23 0x55ccd7e221f1 in acpi_pcihp_eject_slot hw/acpi/pcihp.c:139 qemu#24 0x55ccd7e22301 in acpi_pcihp_update_hotplug_bus hw/acpi/pcihp.c:152 qemu#25 0x55ccd7e22600 in acpi_pcihp_update hw/acpi/pcihp.c:176 qemu#26 0x55ccd7e22628 in acpi_pcihp_reset hw/acpi/pcihp.c:182 qemu#27 0x55ccd7e1fc10 in piix4_reset hw/acpi/piix4.c:363 qemu#28 0x55ccd7da4ebd in qemu_devices_reset /home/gohkim/work/tools/qemu/vl.c:1713 qemu#29 0x55ccd7c28354 in pc_machine_reset /home/gohkim/work/tools/qemu/hw/i386/pc.c:2178 previously allocated by thread T0 here: #0 0x7fa573f87b49 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98b49) qemu#1 0x7fa5718575d0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4f5d0) qemu#2 0x55ccd7b6bfa5 in virtio_blk_device_realize /home/gohkim/work/tools/qemu/hw/block/virtio-blk.c:910 qemu#3 0x55ccd7c09500 in virtio_device_realize /home/gohkim/work/tools/qemu/hw/virtio/virtio.c:1927 qemu#4 0x55ccd7eb9690 in device_set_realized hw/core/qdev.c:918 qemu#5 0x55ccd82c1bef in property_set_bool qom/object.c:1853 qemu#6 0x55ccd82be535 in object_property_set qom/object.c:1087 qemu#7 0x55ccd82c454f in object_property_set_qobject qom/qom-qobject.c:27 qemu#8 0x55ccd82be810 in object_property_set_bool qom/object.c:1156 qemu#9 0x55ccd81b1b5c in virtio_blk_pci_realize hw/virtio/virtio-pci.c:1897 qemu#10 0x55ccd81b14df in virtio_pci_realize hw/virtio/virtio-pci.c:1799 qemu#11 0x55ccd805e2c5 in pci_qdev_realize hw/pci/pci.c:1966 qemu#12 0x55ccd81b17db in virtio_pci_dc_realize hw/virtio/virtio-pci.c:1852 qemu#13 0x55ccd7eb9690 in device_set_realized hw/core/qdev.c:918 qemu#14 0x55ccd82c1bef in property_set_bool qom/object.c:1853 qemu#15 0x55ccd82be535 in object_property_set qom/object.c:1087 qemu#16 0x55ccd82c454f in object_property_set_qobject qom/qom-qobject.c:27 qemu#17 0x55ccd82be810 in object_property_set_bool qom/object.c:1156 qemu#18 0x55ccd7d8141e in qdev_device_add /home/gohkim/work/tools/qemu/qdev-monitor.c:618 qemu#19 0x55ccd7dab462 in device_init_func /home/gohkim/work/tools/qemu/vl.c:2316 qemu#20 0x55ccd84bd18c in qemu_opts_foreach util/qemu-option.c:1116 qemu#21 0x55ccd7db2bdb in main /home/gohkim/work/tools/qemu/vl.c:4507 qemu#22 0x7fa5702ababf in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x20abf) SUMMARY: AddressSanitizer: heap-use-after-free /home/gohkim/work/tools/qemu/hw/virtio/virtio.c:284 virtqueue_fill Shadow bytes around the buggy address: 0x0ff52480d6b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 
0x0ff52480d6c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0ff52480d6d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0ff52480d6e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0ff52480d6f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa =>0x0ff52480d700: fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd 0x0ff52480d710: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0ff52480d720: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0ff52480d730: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0ff52480d740: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0ff52480d750: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap right redzone: fb Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe ==8801==ABORTING [v1]: add braces to fix style error Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
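The reporter's proposed fix is to quiesce the BlockBackend before the virtqueues are freed, so no in-flight AIO completion can call back into a destroyed VirtQueue. A minimal sketch of an unrealize path with that ordering, assuming 2.7-era names and eliding the unrelated teardown:

```c
/* hw/block/virtio-blk.c -- sketch: drain in-flight requests before
 * virtio_cleanup() frees the VirtQueue array the callbacks dereference. */
static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
    VirtIOBlock *s = VIRTIO_BLK(dev);

    /* ... dataplane and change-state-handler teardown elided ... */

    blk_drain(s->blk);      /* wait for outstanding AIO while the vqs still exist */
    virtio_cleanup(vdev);
}
```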
With a Spice port chardev, it is possible to reenter monitor_qapi_event_queue() (when the client disconnects for example). This will dead-lock on monitor_lock. Instead, use some TLS variables to check for recursion and queue the events. Fixes: (gdb) bt #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 #22 
0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 Note that error report is now moved to the first caller, which may receive an error for a recursed event. This is probably fine (95% of callers use &error_abort, the rest have NULL error and ignore it) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180731150144.14022-1-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [*_no_recurse renamed to *_no_reenter, local variables reordered] Signed-off-by: Markus Armbruster <armbru@redhat.com>
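The fix described here boils down to a per-thread re-entrancy guard: if monitor_qapi_event_queue() is entered again while an emission is in progress on the same thread, the event is queued instead of re-taking monitor_lock, and the queue is drained once the outer call finishes. A minimal sketch of that pattern; the helper name monitor_qapi_event_queue_no_reenter comes from the message, while the queue structure and the elided error propagation are simplifications of mine.

```c
/* monitor.c -- sketch of the thread-local re-entrancy guard (names approximate). */
typedef struct QueuedQAPIEvent {
    QAPIEvent event;
    QDict *qdict;
} QueuedQAPIEvent;

static __thread bool monitor_event_emitting;    /* already inside an emission? */
static __thread GQueue *monitor_event_pending;  /* events queued meanwhile */

void monitor_qapi_event_queue(QAPIEvent event, QDict *qdict)
{
    if (monitor_event_emitting) {
        /* Re-entered (e.g. the Spice chardev disconnects while we flush):
         * remember the event instead of deadlocking on monitor_lock. */
        QueuedQAPIEvent *q = g_new0(QueuedQAPIEvent, 1);
        q->event = event;
        q->qdict = qobject_ref(qdict);
        if (!monitor_event_pending) {
            monitor_event_pending = g_queue_new();
        }
        g_queue_push_tail(monitor_event_pending, q);
        return;
    }

    monitor_event_emitting = true;
    monitor_qapi_event_queue_no_reenter(event, qdict);   /* takes monitor_lock */

    /* Drain anything queued by re-entrant calls made during the emission. */
    while (monitor_event_pending && !g_queue_is_empty(monitor_event_pending)) {
        QueuedQAPIEvent *q = g_queue_pop_head(monitor_event_pending);
        monitor_qapi_event_queue_no_reenter(q->event, q->qdict);
        qobject_unref(q->qdict);
        g_free(q);
    }
    monitor_event_emitting = false;
}
```

Draining inside the same call keeps event ordering close to the original while guaranteeing monitor_lock is only taken from the outermost frame.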
In qemu_laio_process_completions_and_submit, the AioContext is acquired before the ioq_submit iteration and after qemu_laio_process_completions, but the latter is not thread safe either. This change avoids a number of random crashes when the Main Thread and an IO Thread collide processing completions for the same AioContext. This is an example of such crash: - The IO Thread is trying to acquire the AioContext at aio_co_enter, which evidences that it didn't lock it before: Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)): #0 0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0 #2 0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0) at ../nptl/pthread_mutex_lock.c:114 #3 0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66 #4 0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493 #5 0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478 #6 0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 #8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 #9 0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406 #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true) at util/aio-posix.c:693 #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64 #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308 #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 - The Main Thread is also processing completions from the same AioContext, and crashes due to failed assertion at util/iov.c:78: Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)): #0 0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90 #2 0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:92 #3 0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101 #4 0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78 #5 0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410 #6 0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 #8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 #9 0x00005631fbfc54ed in qemu_laio_poll_cb 
(opaque=<optimized out>) at block/linux-aio.c:272 #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497 #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574 #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false) at util/aio-posix.c:604 #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273 #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390 #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590 #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774 #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401 #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449 #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080) at block/commit.c:92 #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973 #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90 #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118 #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436 #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261 #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201 #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854 #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215 #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238 #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497 #30 0x00005631fbd81412 in main_loop () at vl.c:1866 #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4647 - A closer examination shows that s->io_q.in_flight appears to have gone backwards: (gdb) frame 7 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 222 qemu_laio_process_completion(laiocb); (gdb) p s $2 = (LinuxAioState *) 0x7fdfc0297670 (gdb) p *s $3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0, in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0, sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241} (gdb) p/x s->io_q.in_flight $4 = 0xfffffff0 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
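In code, the fix is simply to widen the critical section so that qemu_laio_process_completions() also runs with the AioContext held, not only the submission that follows it. A sketch, with the pending-queue check paraphrased from the surrounding code:

```c
/* block/linux-aio.c -- sketch: acquire before processing completions, so the
 * main loop and an iothread cannot both walk the completion ring. */
static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
{
    aio_context_acquire(s->aio_context);
    qemu_laio_process_completions(s);       /* now also under the lock */

    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
        ioq_submit(s);
    }
    aio_context_release(s->aio_context);
}
```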
Both virtio-blk and virtio-scsi use virtio_queue_empty() as the loop condition in VQ handlers (virtio_blk_handle_vq, virtio_scsi_handle_cmd_vq). When a device is marked broken in virtqueue_pop, for example if a vIOMMU address translation failed, we want to break out of the loop. This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64 kernel with ATS enabled: $ qemu-system-x86_64 \ ... \ -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \ -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0 The dead loop happens immediately when the kernel boots and initializes the device, where virtio_scsi_data_plane_handle_cmd will not return: > ... > #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq > #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd > #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq > #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll > #17 0x00005586607885da in run_poll_handlers_once > #18 0x000055866078880e in try_poll_mode > #19 0x00005586607888eb in aio_poll > #20 0x0000558660784561 in aio_wait_bh_oneshot > #21 0x00005586602b9582 in virtio_scsi_dataplane_stop > #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd > #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd > #24 0x00005586605ab808 in virtio_pci_common_write > #25 0x0000558660242396 in memory_region_write_accessor > #26 0x00005586602425ab in access_with_adjusted_size > #27 0x0000558660245281 in memory_region_dispatch_write > #28 0x00005586601e008e in flatview_write_continue > #29 0x00005586601e01d8 in flatview_write > #30 0x00005586601e04de in address_space_write > #31 0x00005586601e052f in address_space_rw > #32 0x00005586602607f2 in kvm_cpu_exec > #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn > #34 0x000055866078bde7 in qemu_thread_start > #35 0x00007f5784906594 in start_thread > #36 0x00007f5784639e6f in clone With this patch, virtio_queue_empty will now return 1 as soon as the vdev is marked as broken, after a "virtio: zero sized buffers are not allowed" error. To be consistent, update virtio_queue_empty_rcu as well. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180910145616.8598-2-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
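In code, the change amounts to an early "empty" answer for a broken device at the top of the emptiness checks, so virtio_blk_handle_vq() and virtio_scsi_handle_cmd_vq() fall out of their loops. A sketch; the shadow_avail_idx logic below is paraphrased from the surrounding function and should be read as approximate.

```c
/* hw/virtio/virtio.c -- sketch of the guard added to the emptiness checks. */
int virtio_queue_empty(VirtQueue *vq)
{
    bool empty;

    if (unlikely(vq->vdev->broken)) {
        return 1;   /* broken device: report "empty" so VQ handlers stop looping */
    }

    if (vq->shadow_avail_idx != vq->last_avail_idx) {
        return 0;
    }

    rcu_read_lock();
    empty = vring_avail_idx(vq) == vq->last_avail_idx;
    rcu_read_unlock();
    return empty;
}

/* virtio_queue_empty_rcu() gets the same vq->vdev->broken check; its caller
 * already holds the RCU read lock. */
```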
All paths that lead to bdrv_backup_top_drop(), except for the call from backup_clean(), imply that the BDS AioContext has already been acquired, so doing it there too can potentially lead to QEMU hanging on AIO_WAIT_WHILE().

An easy way to trigger this situation is by issuing a two-action transaction, with a proper and a bogus blockdev-backup, so the second one will trigger a rollback. This will trigger a hang with a stack trace like this one:

#0  0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055e743386e09 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
#3  0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true) at util/aio-posix.c:669
#4  0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878
#5  0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017
#6  0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262
#7  0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644
#8  0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273
#9  0x000055e74331461f in backup_job_create (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478
#10 0x000055e74315bc52 in do_backup_common (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0) at blockdev.c:3580
#11 0x000055e74315c37c in do_blockdev_backup (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0) at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492
#12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018) at blockdev.c:1885
#13 0x000055e743160152 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340
#14 0x000055e743287ff5 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8) at qapi/qapi-commands-transaction.c:44
#15 0x000055e74333de6c in do_qmp_dispatch (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
#16 0x000055e74333de6c in qmp_dispatch (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
#17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>) at monitor/qmp.c:145
#18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
#19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117
#20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117
#21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459
#22 0x000055e743385742 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
#23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176
#24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829
#25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219
#26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#28 0x000055e74316a3c1 in main_loop () at vl.c:1828
#29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

Fix this by not acquiring the AioContext there, and ensuring all paths leading to it have it already acquired (backup_clean()).

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111

Signed-off-by: Sergio Lopez <slp@redhat.com>
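For readers following the reasoning, the shape of the change the commit message describes can be sketched roughly as below. This is a simplified illustration, not the verbatim patch: the locking moves out of bdrv_backup_top_drop() and into backup_clean(), the one caller that did not already hold the BDS AioContext. Field and helper names are paraphrased from block/backup-top.c and block/backup.c of that era and may not match the tree exactly.

/*
 * Simplified sketch of the fix (not the actual diff).
 * bdrv_backup_top_drop() no longer takes the AioContext itself, because
 * every caller except backup_clean() already holds it; acquiring it a
 * second time is what could hang on AIO_WAIT_WHILE().
 */
void bdrv_backup_top_drop(BlockDriverState *bs)
{
    /*
     * Previously this function did roughly:
     *
     *     AioContext *aio_context = bdrv_get_aio_context(bs);
     *     aio_context_acquire(aio_context);
     *     ...detach and unref the backup-top filter node...
     *     aio_context_release(aio_context);
     *
     * It now assumes the caller already holds the context and only
     * performs the detach/unref steps.
     */
}

/* backup_clean() was the one path that did not hold the context, so it
 * now acquires it around the call. */
static void backup_clean(Job *job)
{
    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);

    aio_context_acquire(aio_context);
    bdrv_backup_top_drop(s->backup_top);
    s->backup_top = NULL;
    aio_context_release(aio_context);
}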
Thank you for your interest in the QEMU project. This repository is a read-only mirror of the project's master repositories. QEMU welcomes contributions of code (either fixing bugs or adding new functionality).
Compiling with Cocoa support on Mac OS X Leopard resulted in an undefined-variable compile error.
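The pull request text does not say which identifier is undefined, so the snippet below is purely a hypothetical illustration of the common pattern for this class of breakage: when a name exists only in the 10.6 SDK, ui/cocoa.m can supply the definition itself for older SDKs. kPlaceholderConstant and its value are invented for the example.

#include <AvailabilityMacros.h>

/* A pre-10.6 SDK may not even define the 10.6 version macro itself. */
#ifndef MAC_OS_X_VERSION_10_6
#define MAC_OS_X_VERSION_10_6 1060
#endif

#if MAC_OS_X_VERSION_MAX_ALLOWED < MAC_OS_X_VERSION_10_6
/* Hypothetical stand-in for whatever identifier the 10.5 SDK lacks. */
#define kPlaceholderConstant 0
#endif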