Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: runtime/cgo: misuse of an invalid Handle (Code-Hex/vz/v3.changeStateOnObserver) #1381

Closed
AkihiroSuda opened this issue Feb 24, 2023 · 20 comments · Fixed by #1494, #1497 or #1525
Closed
Labels
bug Something isn't working component/vz

Comments

@AkihiroSuda
Copy link
Member

$ limactl start template://experimental/vz

$ limactl stop vz
INFO[0000] Sending SIGINT to hostagent process 47742    
INFO[0000] Waiting for the host agent and the driver processes to shut down 
INFO[0000] [hostagent] 2023/02/24 17:59:43 tcpproxy: for incoming conn 127.0.0.1:51498, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host 
INFO[0000] [hostagent] 2023/02/24 17:59:53 tcpproxy: for incoming conn 127.0.0.1:51499, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused 
INFO[0000] [hostagent] Received SIGINT, shutting down the host agent 
INFO[0000] [hostagent] Shutting down the host agent     
INFO[0000] [hostagent] Stopping forwarding "/run/lima-guestagent.sock" (guest) to "/Users/suda/.lima/vz/ga.sock" (host) 
INFO[0000] [hostagent] Shutting down VZ                 
ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE 
INFO[0001] [hostagent] panic: runtime/cgo: misuse of an invalid Handle 
INFO[0001] [hostagent] goroutine 17 [running, locked to thread]: 
INFO[0001] [hostagent] runtime/cgo.Handle.Value(...)    
INFO[0001] [hostagent]  /usr/local/Cellar/go/1.19.6/libexec/src/runtime/cgo/handle.go:124 
INFO[0001] [hostagent] github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0xc000006801?) 
INFO[0001] [hostagent]  /Users/suda/gopath/pkg/mod/github.com/!code-!hex/vz/v3@v3.0.4/virtualization.go:160 +0xec 

version: e939454

@AkihiroSuda AkihiroSuda added bug Something isn't working component/vz labels Feb 24, 2023
@AkihiroSuda
Copy link
Member Author

Looks like this happens only on stopping a new instance

@balajiv113
Copy link
Member

Will check this out

@balajiv113
Copy link
Member

Debugged this, this is from upstream only. Will raise a ticket in vz. I tried to do some workaround but none worked.

@jandubois
Copy link
Member

Looks like this happens only on stopping a new instance

Yes, looks like that for me too. It also hangs for almost another 3 minutes after the panic() before exiting the process. This seems to be inside the vz package, so I don't think we could capture it and exit right away (maybe not worth doing anyways).

After restarting the instance, I can shut it down normally without a panic, but still get an error from DHCP:

ERRO[0001] [hostagent] dhcp: unhandled message type: RELEASE

@balajiv113
Copy link
Member

but still get an error from DHCP:
DHCP error is from gvisor-tap-vsock.

@dmigwi
Copy link

dmigwi commented Mar 4, 2023

I experienced the same error working on a separate product on Mac Os. The crashes also happened severally when I was shutting down the application. After updating the o.s to Monterey the crash seems to have disappeared by I want to understand what exactly happened.
image

@jandubois
Copy link
Member

@dmigwi This issue is about macOS 13 (Ventura) because there is no Virtualization framework support in Lima for earlier macOS version (we only support EFI boot, which was added in Ventura). So if your problem with the vz library was fixed by Monterey, then this is a separate issue.

@dmigwi
Copy link

dmigwi commented Mar 4, 2023

@jandubois Thanks for letting me know that is issue is not yet resolved since Ventura is the latest Mac Os release. A weird thing that happened when I experienced that crash on my product (desktop app) is my mac was weirdly hot and the fan was humming way louder than usual. This led me to believe something in the background or o.s. level was pushing my CPU to its extreme limit.

I tend to believe the golang GC kicked in to clean up memory space it thought wasn't needed anymore since space was kinda being stretched thin then. The communication between golang and cgo is via memory locations inform of integers, so when cgo attempted to resolve the memory location that turned out to reused by something else that crash happened then. Its all documented here: https://golang.design/research/cgo-handle/

My initial hypothesis is that the crash happens when the garbage collector runs and cleans up memory locations its not expected to mess with. I could be wrong on this!

@jandubois
Copy link
Member

@dmigwi Please report your issue on the https://github.com/Code-Hex/vz repo if you have additional information to reproduce the problem.

This repo is purely about Lima, so we will not be able to help you with your project here, since you seem to be experiencing a separate issue.

pendo324 added a commit to runfinch/finch that referenced this issue Apr 12, 2023
Issue #, if available:

*Description of changes:*
- always force stop when using Virtualization.framework, potential
workaround for lima-vm/lima#1381 (needs more
testing)

*Testing done:*
- local testing


- [x] I've reviewed the guidance in CONTRIBUTING.md


#### License Acceptance

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: Justin Alvarez <alvajus@amazon.com>
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Apr 15, 2023

Still doesn't seem fixed (macOS 13.3.1, Intel)

$ limactl stop vz
INFO[0000] Sending SIGINT to hostagent process 33283    
INFO[0000] Waiting for the host agent and the driver processes to shut down 
INFO[0000] [hostagent] 2023/04/15 22:39:32 tcpproxy: for incoming conn 127.0.0.1:60602, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host 
INFO[0000] [hostagent] 2023/04/15 22:39:42 tcpproxy: for incoming conn 127.0.0.1:60612, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused 
INFO[0000] [hostagent] Received SIGINT, shutting down the host agent 
INFO[0000] [hostagent] Shutting down the host agent     
INFO[0000] [hostagent] Stopping forwarding "/run/lima-guestagent.sock" (guest) to "/Users/suda/.lima/vz/ga.sock" (host) 
INFO[0000] [hostagent] Shutting down VZ                 
ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE 
INFO[0001] [hostagent] panic: runtime/cgo: misuse of an invalid Handle 
INFO[0001] [hostagent] goroutine 17 [running, locked to thread]: 
INFO[0001] [hostagent] runtime/cgo.Handle.Value(...)    
INFO[0001] [hostagent]  /usr/local/Cellar/go/1.20.3/libexec/src/runtime/cgo/handle.go:124 
INFO[0001] [hostagent] github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0xc000006801?) 
INFO[0001] [hostagent]  /Users/suda/gopath/pkg/mod/github.com/!code-!hex/vz/v3@v3.0.5/virtualization.go:160 +0xb4 

@AkihiroSuda AkihiroSuda reopened this Apr 15, 2023
@balajiv113
Copy link
Member

@AkihiroSuda
Is the steps to reproduce are the same ??

I tried with multiple create stop and start stop use cases :( Didn't face the issue.

@AkihiroSuda
Copy link
Member Author

@AkihiroSuda Is the steps to reproduce are the same ??

Yes

I tried with multiple create stop and start stop use cases :( Didn't face the issue.

Probably because this is a race condition

@abiosoft
Copy link
Contributor

This is still happening on v0.15.1.

A user reported it abiosoft/colima#691 (comment) and I am also able to reproduce it.

@AkihiroSuda AkihiroSuda reopened this Apr 16, 2023
@joellarsonibotta
Copy link

joellarsonibotta commented Apr 27, 2023

I can confirm that it is still happening for me:

$ colima stop
INFO[0000] stopping colima
INFO[0000] stopping ...                                  context=docker
INFO[0002] stopping ...                                  context=vm
> [hostagent] panic: runtime/cgo: misuse of an invalid Handle
> [hostagent] goroutine 17 [running, locked to thread]:
> [hostagent] runtime/cgo.Handle.Value(...)
> [hostagent] 	/opt/homebrew/Cellar/go/1.20.3/libexec/src/runtime/cgo/handle.go:124
> [hostagent] github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0x14000002680?)
> [hostagent] 	/Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/!code-!hex/vz/v3@v3.0.5/virtualization.go:160 +0xb0

Version Info:

colima version 0.5.4
limactl version 0.15.1

INFO[0000] colima is running using macOS Virtualization.Framework
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: virtiofs
INFO[0000] socket: unix:///$HOME/.colima/default/docker.sock

I do see that there's a problem identified in vz with a solution coming soon: Code-Hex/vz#124

@chenzhekl
Copy link

chenzhekl commented Jan 15, 2025

This issue still happened to me. To reproduce it,

limactl start template://apptainer
limactl stop apptainer

I am on 1.0.3

@AkihiroSuda
Copy link
Member Author

This issue still happened to me. To reproduce it,

limactl start template://apptainer limactl stop apptainer

I am on 1.0.3

Could you paste the full logs and provide the host OS info? With VZ or qemu?

@AkihiroSuda AkihiroSuda reopened this Jan 15, 2025
@chenzhekl
Copy link

Sure, the following is the log when starting and stopping the VM

xxx@FFFFFFFF ~ % limactl start apptainer
INFO[0000] Using the existing instance "apptainer"
INFO[0000] Starting the instance "apptainer" with VM driver "vz"
INFO[0000] [hostagent] hostagent socket created at /Users/xxx/.lima/apptainer/ha.sock
INFO[0000] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/xxx/.lima/apptainer/serial*.log")
INFO[0000] SSH Local Port: 53833
INFO[0000] [hostagent] [VZ] - vm state change: running
INFO[0000] [hostagent] Waiting for the essential requirement 1 of 2: "ssh"
INFO[0010] [hostagent] Waiting for the essential requirement 1 of 2: "ssh"
INFO[0010] [hostagent] The essential requirement 1 of 2 is satisfied
INFO[0010] [hostagent] Waiting for the essential requirement 2 of 2: "user session is ready for ssh"
INFO[0010] [hostagent] The essential requirement 2 of 2 is satisfied
INFO[0010] [hostagent] Waiting for the optional requirement 1 of 1: "user probe 1/1"
INFO[0010] [hostagent] Guest agent is running
INFO[0010] [hostagent] Not forwarding TCP 127.0.0.53:53
INFO[0010] [hostagent] Not forwarding TCP 127.0.0.54:53
INFO[0010] [hostagent] Not forwarding TCP [::]:22
INFO[0010] [hostagent] The optional requirement 1 of 1 is satisfied
INFO[0010] [hostagent] Waiting for the guest agent to be running
INFO[0010] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished"
INFO[0010] [hostagent] The final requirement 1 of 1 is satisfied
INFO[0011] READY. Run `limactl shell apptainer` to open the shell.

xxx@FFFFFFFF ~ % limactl stop apptainer
INFO[0000] Sending SIGINT to hostagent process 46099
INFO[0000] Waiting for the host agent and the driver processes to shut down
INFO[0000] [hostagent] Received SIGINT, shutting down the host agent
INFO[0000] [hostagent] Shutting down the host agent
INFO[0000] [hostagent] Shutting down VZ
ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE
INFO[0000] [hostagent] [VZ] - vm state change: stopped
ERRO[0000] [hostagent] accept tcp 127.0.0.1:53833: use of closed network connection

I am running lima 1.0.3 on macOS Sonoma 14.7.2 with VZ.

@AkihiroSuda
Copy link
Member Author

What is the issue? The log seems quite usual and doesn't seem panicking.

"use of closed network connection" error is usual on shutting down and is negligible.

@chenzhekl
Copy link

Seeing error messages like this

ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE

is unusual to me, as it indicates something went wrong. Perhaps it's better to make such message clearer if it's expected and negligible.

@AkihiroSuda
Copy link
Member Author

Right, but completely irrelevant to the OP.
Should be discussed in another issue ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/vz
Projects
None yet
7 participants