Latest version of stembuild construct hangs forever #27
Comments
We found that if the Windows host and the stembuild utility are in the same subnet, it can pass with "no route to host" after waiting almost 5 minutes.
Yeah, I don't see a change in our nightly builds. Both the builds from before the change (~November) and the current ones have a 4-5 minute hang in there, but then continue. Our CI worker and the VM we're running stembuild against do both run in the same subnet though. @sneal, is the subnet thing possibly why you're experiencing this and we're not?
Hi @jpalermo, one customer encountered this issue because they always deploy the stembuild utility and the Windows VM in different subnets. They could also construct the stemcell successfully before this PR was merged.
@jpalermo You will see different results depending on whether you're on the same subnet or not. The winrm library we're using now retries on a connection timeout forever and only returns once the command has executed to completion. In our case, since we execute a command that shuts the VM down, it'll never complete the command (from the winrm library's perspective). Golang will return a different error to the winrm library (timeout vs no route to host) depending on whether stembuild is run in the same subnet or a different one. While the winrm library should have some sort of timeout and give clients more control over retry behavior, we really just need to execute the final command (sysprep & shutdown) while simultaneously polling the VM for shutdown via some mechanism. Once shutdown is detected we can continue.
Opened a PR to provide a config flag to disable the new behavior.
…g forever. Issue: #27 This fork adds an option to roll back the "automatic retry on timeout" behavior that was added to winrm
Add rollback of "automatic retry on timeout" in winrm when executing the post reboot script. This should fix the hang reported during sysprep.
@sneal, is this all fixed now?
Sounds like we believe it's fixed for now, but anybody should feel free to reopen the issue if they still see the problem.
The recent dependency updates pulled in this winrm client library update, which changes winrm command "timeout error" handling.
Old working stembuild logs:
Latest hanging stembuild logs:
What's happening is that "script 2 of 2" calls sysprep at the end and shuts down the VM. That causes a winrm connection timeout, which the winrm client library now retries forever (new behavior), where previously it would get an
unknown error Post "http://10.220.41.242:5985/wsman": dial tcp 10.220.41.242:5985: i/o timeout
and "finish the command", allowing stembuild to proceed.