[Bug] [v21.11.0] Disable Flag gateway-failover not working correctly #173

Closed
vvignaux opened this issue Mar 29, 2022 · 10 comments
Labels: bug (Something isn't working)

vvignaux commented Mar 29, 2022

version: v21.11.0

When trying to create a cluster with the following command:

safescale cluster create callisto --complexity=Normal --flavor=K8S --gw-sizing="template=d2-2" --master-sizing="template=b2-7" --node-sizing="template=b2-30" --disable gateway-failover

The creation fails with the following message:

ERROR [2022-03-29 11:36:47.170][error][ 4006] We have a primary gateway with name gw-callisto and id 19bac927-a53c-4391-8844-c5af6a09629b and a secondary gateway with name gw-callisto and id 19bac927-a53c-4391-8844-c5af6a09629b

The relevant log section is attached:
safescale.log

@vvignaux vvignaux changed the title Disable Flag gateway-failover not working correctly [Bug] [v21.11.0] Disable Flag gateway-failover not working correctly Mar 29, 2022
@oscarpicas oscarpicas self-assigned this Mar 29, 2022
oscarpicas (Collaborator):

It's not a bug.
It's a "feature" of urfave/cli v2 (the library SafeScale uses to process user input).

The cluster name MUST be the LAST parameter on the command line.
You are writing this command:

safescale cluster create callisto --complexity=Normal --flavor=K8S --gw-sizing="template=d2-2" --master-sizing="template=b2-7" --node-sizing="template=b2-30" --disable gateway-failover

But safescaled SEES this command (parameters after the cluster name are not read):

safescale cluster create callisto

If you want all parameters to be taken into account, write this instead (notice the position of the cluster name):

safescale cluster create --complexity=Normal --flavor=K8S --gw-sizing="template=d2-2" --master-sizing="template=b2-7" --node-sizing="template=b2-30" --disable gateway-failover callisto

Future releases of SafeScale will issue a warning about this (stating that, if the command is run as is, such and such parameters are IGNORED).
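A minimal sketch of why the order matters, assuming (to our understanding) that urfave/cli v2 parses each command's flags with Go's standard flag package, which stops parsing at the first non-flag argument. The flag names mirror the SafeScale command line, but parseCreate and the ignoredFlags warning helper are hypothetical illustrations, not SafeScale code:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseCreate is a hypothetical, simplified stand-in for the
// "cluster create" argument parsing. It uses Go's standard flag package,
// which stops at the first non-flag argument: everything from there on,
// including later --flags, is returned as a positional argument.
func parseCreate(args []string) (complexity, disabled string, positional []string, err error) {
	fs := flag.NewFlagSet("create", flag.ContinueOnError)
	fs.StringVar(&complexity, "complexity", "", "cluster complexity")
	fs.StringVar(&disabled, "disable", "", "feature to disable")
	if err = fs.Parse(args); err != nil {
		return "", "", nil, err
	}
	return complexity, disabled, fs.Args(), nil
}

// ignoredFlags is a hypothetical helper that lists leftover arguments
// that look like flags, i.e. what a "parameters are IGNORED" warning
// could report.
func ignoredFlags(positional []string) []string {
	var out []string
	for _, a := range positional {
		if strings.HasPrefix(a, "-") {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	orders := [][]string{
		// Cluster name first: the flags after it are never parsed.
		{"callisto", "--complexity=Normal", "--disable", "gateway-failover"},
		// Cluster name last: every flag is parsed.
		{"--complexity=Normal", "--disable", "gateway-failover", "callisto"},
	}
	for _, args := range orders {
		c, d, rest, err := parseCreate(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("complexity=%q disable=%q args=%q\n", c, d, rest)
		if ign := ignoredFlags(rest); len(ign) > 0 {
			fmt.Printf("warning: ignored parameters %q\n", ign)
		}
	}
}
```

With the cluster name first, both flags keep their empty defaults and end up among the leftover arguments, which is exactly what a warning like the one mentioned above could report; with the name last, both flags are parsed and only callisto remains as a positional argument.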

@oscarpicas oscarpicas added the invalid This doesn't seem right label Mar 29, 2022
@oscarpicas oscarpicas added this to the develop milestone Mar 29, 2022
vvignaux (Author) commented Mar 29, 2022

I was not aware that the order is significant, and I reproduced the command incorrectly when I opened the ticket. However, the suggested command is the one I had already executed, and it still suffers from the issue described.

ubuntu@test-safescale:~$ ./safescale cluster create --complexity=Normal --flavor=K8S --gw-sizing="template=d2-2" --master-sizing="template=b2-7" --node-sizing="template=b2-30" --disable gateway-failover callisto

INFO [2022-03-29 14:44:30.227][info][ 9724] Checking configuration
INFO [2022-03-29 14:44:30.229][info][ 9724] Using SAFESCALE_EXECUTION_TIMEOUT=600
INFO [2022-03-29 14:44:30.229][info][ 9724] Starting server, listening on 'localhost:50051', using metadata suffix ''
INFO [2022-03-29 14:44:30.231][info][ 9724] Registering services
Safescaled version: 21.11.0, build 786276b (2022/03/29-08:10)
Ready to serve on 'localhost:50051' :-)
INFO [2022-03-29 14:44:33.408][info][ 9724] No tenant set yet, but found only one tenant in configuration; setting it as current.
INFO [2022-03-29 14:44:42.068][info][ 9724] Setting default Tenant to 'callisto-onda@GRA5'; storing metadata in bucket '0.safescale-bb37faa99826a9a672ec9ede2149995d'
INFO [2022-03-29 14:44:42.587][info][ 9724] Starting creation of infrastructure of Cluster 'callisto'...
INFO [2022-03-29 14:44:45.646][info][ 9724] Selected image: 'Ubuntu 20.04' (ID='a6bbf13f-8272-49ca-aede-6ec743884e7f')
WARNING [2022-03-29 14:44:45.804][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:44:45.952][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:44:46.117][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
INFO [2022-03-29 14:44:58.234][info][ 9724] Security Group 'safescale-sg_subnet_gateways.callisto.callisto' created successfully
INFO [2022-03-29 14:45:06.505][info][ 9724] Security Group 'safescale-sg_subnet_publicip.callisto.callisto' created successfully
INFO [2022-03-29 14:45:15.354][info][ 9724] Security Group 'safescale-sg_subnet_internals.callisto.callisto' created successfully
WARNING [2022-03-29 14:45:40.109][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:45:40.258][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
INFO [2022-03-29 14:45:40.258][info][ 9724] Requesting the creation of gateway 'gw-callisto' using template ID '6b10e040-ddb9-4738-8088-d38ec542cc2c', template name '', with image ID 'a6bbf13f-8272-49ca-aede-6ec743884e7f'
WARNING [2022-03-29 14:45:41.175][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:45:41.323][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
INFO [2022-03-29 14:46:11.983][info][ 9724] Host resource 'gw-callisto' created successfully
INFO [2022-03-29 14:46:25.471][info][ 9724] Compute resource 'gw-callisto' created
INFO [2022-03-29 14:46:25.472][info][ 9724] Waiting SSH availability on Host 'gw-callisto' ...
INFO [2022-03-29 14:46:25.472][info][ 9724] Waiting install phase init on 'gw-callisto'...
INFO [2022-03-29 14:46:37.819][info][ 9724] Finish Waiting install phase init on 'gw-callisto'... (elapsed: 00h00m12.346s)
INFO [2022-03-29 14:46:38.983][info][ 9724] Starting install phase netsec on 'gw-callisto'...
INFO [2022-03-29 14:48:34.147][info][ 9724] Ending phase netsec on 'gw-callisto'... (elapsed: 00h01m55.163s)
INFO [2022-03-29 14:48:35.874][info][ 9724] Waiting install phase netsec on 'gw-callisto'...
INFO [2022-03-29 14:48:37.077][info][ 9724] Finish Waiting install phase netsec on 'gw-callisto'... (elapsed: 00h00m01.203s)
INFO [2022-03-29 14:48:37.080][info][ 9724] finalizing Host provisioning of 'gw-callisto': rebooting
INFO [2022-03-29 14:48:37.085][info][ 9724] Host 'gw-callisto': rebooting
INFO [2022-03-29 14:49:12.800][info][ 9724] Waiting install phase netsec on 'gw-callisto'...
INFO [2022-03-29 14:49:32.452][info][ 9724] Finish Waiting install phase netsec on 'gw-callisto'... (elapsed: 00h00m19.651s)
INFO [2022-03-29 14:49:36.513][info][ 9724] Host 'gw-callisto' created successfully
INFO [2022-03-29 14:49:45.124][info][ 9724] Starting final configuration phases on the gateway 'gw-callisto'...
INFO [2022-03-29 14:49:45.129][info][ 9724] Starting install phase final on 'gw-callisto'...
INFO [2022-03-29 14:51:08.319][info][ 9724] Ending phase final on 'gw-callisto'... (elapsed: 00h01m23.189s)
INFO [2022-03-29 14:51:08.320][info][ 9724] Waiting install phase final on 'gw-callisto'...
INFO [2022-03-29 14:51:09.582][info][ 9724] Finish Waiting install phase final on 'gw-callisto'... (elapsed: 00h00m01.261s)
INFO [2022-03-29 14:51:09.582][info][ 9724] Ending final configuration phases on the gateway 'gw-callisto' (elapsed: 00h01m24.457s)
ERROR [2022-03-29 14:51:20.821][error][ 9724] We have a primary gateway with name gw-callisto and id 3723d25f-ea51-458b-ac17-b33975d19a71 and a secondary gateway with name gw-callisto and id 3723d25f-ea51-458b-ac17-b33975d19a71
INFO [2022-03-29 14:51:20.821][info][ 9724] Waiting install phase final on 'gw-callisto'...
INFO [2022-03-29 14:51:21.729][info][ 9724] Finish Waiting install phase final on 'gw-callisto'... (elapsed: 00h00m00.907s)
INFO [2022-03-29 14:51:21.730][info][ 9724] Waiting install phase final on 'gw-callisto'...
INFO [2022-03-29 14:51:22.437][info][ 9724] Finish Waiting install phase final on 'gw-callisto'... (elapsed: 00h00m00.706s)
INFO [2022-03-29 14:51:23.408][info][ 9724] Starting addition of Feature 'docker' on Host 'gw-callisto'...
WARNING [2022-03-29 14:51:30.341][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:51:30.500][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:51:35.480][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:51:35.646][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:51:38.779][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
WARNING [2022-03-29 14:51:38.927][warning][ 9724] Unable to request OVH API, flavors availability will not be checked: AlternateApiApplicationKey is not set (mandatory to access native OVH API)
INFO [2022-03-29 14:51:59.146][info][ 9724] Host resource 'callisto-node-2' created successfully
INFO [2022-03-29 14:52:03.525][info][ 9724] Host resource 'callisto-node-1' created successfully
(stuck here)

oscarpicas (Collaborator):

If the problem is the log message "We have a primary gateway etc. etc.", it could just as well have been a warning; we only wanted to make sure this kind of situation was displayed. It does not stop the cluster creation process.

But if the problem is the "stuck here", and by that you mean "stuck there forever", then yes, that must NOT happen; it is some kind of deadlock. If you confirm it's the latter, we will retag the issue as a "bug" and look into it.

@oscarpicas oscarpicas added question Further information is requested and removed invalid This doesn't seem right labels Mar 29, 2022
vvignaux (Author):

Yes, it's the latter, most likely some kind of deadlock: it stays like this forever, and once it reaches this state even basic commands such as cluster list no longer work.

@oscarpicas oscarpicas added bug Something isn't working and removed question Further information is requested labels Mar 29, 2022
oscarpicas (Collaborator):

Please try again, this time with the following command (forcing a CIDR):

./safescale cluster create --complexity=Normal --cidr "192.168.27.0/24" --flavor=K8S --gw-sizing="template=d2-2" --master-sizing="template=b2-7" --node-sizing="template=b2-30" --disable gateway-failover callisto

and let us know if it works.

@oscarpicas oscarpicas added the question Further information is requested label Mar 29, 2022
oscarpicas (Collaborator) commented Mar 29, 2022

It seems commit 384f482 introduced a severe regression: the develop branch currently has several deadlocks.

This will be fixed next week.
In the meantime, use the release/v21.11 branch; with this branch, the previous cluster creation command (callisto, forcing a CIDR) works.

To build the release/v21.11 binaries:
git clone https://github.com/CS-SI/SafeScale --depth 1 -b release/v21.11
cd SafeScale
./create-local-docker.sh

The binaries should then be in the ./exported directory.

@oscarpicas oscarpicas removed the question Further information is requested label Mar 29, 2022
vvignaux (Author) commented Mar 30, 2022

Unfortunately, forcing the CIDR didn't change the outcome; the deadlock is still there. Thanks for the recommendation, but I also need the fixes from #149. It works without this flag, but I still run into timeout issues during cluster creation. For the time being, I will try without it and with increased timeouts. Thanks a lot for your help.

oscarpicas (Collaborator):

The fixes from #149 are already in release/v21.11. Binaries rebuilt from the release/v21.11 branch work with the following command (tested yesterday), and use the right templates too:
safescale cluster create --complexity=Normal --cidr "192.168.27.0/24" --flavor=K8S --gw-sizing="template=d2-2" --master-sizing="template=b2-7" --node-sizing="template=b2-30" --disable gateway-failover callisto

vvignaux (Author):

In the end, after trying various timeout adjustments, I managed to create the cluster successfully with both the release branch and the latest develop branch; both seem to work. Thanks again.

vplanche-pro (Collaborator):

Fixed in develop. Will be included in upcoming release v21.11.1.
