No instance found with the current cluster host id #7100
Comments
@chinochao do you have any logs you can share here?
I wonder if you're encountering some version of this:
I have the same issue. Not the one from the web container; the web UI is fine. The log shows: 2020-08-03 15:16:54,656 WARNING awx.main.dispatch.periodic periodic beat started. Looking at the table "main_instance", I see a new instance each time I stop and start the pod. How can I fix this so that AWX works again?
@Seb0042 I still haven't found a solution for this. I have been forced to stay on version 9.3 for the moment. Hopefully the AWX developers can see what the issue is. My logs are the same as the ones you provided.
I found a workaround. The original name of the deployment is awx. I had renamed it to awx-online for my needs. I tried again with the original name and it works.
Where did you make this change exactly? settings.py or another file?
No, I'm talking about the deployment's name in Kubernetes. My pods were named awx-online-xxxx-yyyy, but it seems that for the moment they have to be named awx-xxxx-yyyy. So I reverted the changes I had made in the deployment.yml file (I deleted the deployment and then recreated it).
Cool, I will see if that works for me. My deployments are named awx-web-xxxx-yyyy and awx-task-xxxx-yyyy.
For anyone else wanting to do this: the hostname needs to match CLUSTER_HOST_ID in /etc/tower/awx_settings.py (default is 'awx'). If you have already started everything and it's now broken, connect to the Postgres DB and change hostname in public.main_instance to the hostname you have configured, which must match the setting above.
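The database fix described above can be sketched roughly as follows. This is only an illustration, not an AWX tool: it uses Python's built-in sqlite3 with an in-memory stand-in for the main_instance table (the real database is Postgres, and the real table has more columns than the single hostname column assumed here). The function names mismatched_hostnames and rename_instance are hypothetical.

```python
import sqlite3


def mismatched_hostnames(conn, cluster_host_id):
    """Return registered hostnames that differ from CLUSTER_HOST_ID."""
    rows = conn.execute(
        "SELECT hostname FROM main_instance WHERE hostname <> ? ORDER BY hostname",
        (cluster_host_id,),
    ).fetchall()
    return [row[0] for row in rows]


def rename_instance(conn, old_hostname, new_hostname):
    """Rename one registered instance; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE main_instance SET hostname = ? WHERE hostname = ?",
        (new_hostname, old_hostname),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table (assumption: only the hostname column matters here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_instance (id INTEGER PRIMARY KEY, hostname TEXT UNIQUE)"
)
conn.execute("INSERT INTO main_instance (hostname) VALUES ('awx-online-xxxx-yyyy')")

# 'awx' is the default CLUSTER_HOST_ID mentioned above.
stale = mismatched_hostnames(conn, "awx")
updated = rename_instance(conn, stale[0], "awx")
```

If several stale rows have accumulated (one per pod restart, as reported earlier in the thread), renaming all of them to the same value would likely collide, so this sketch deliberately renames a single row.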
I was running into similar issues with this today. Looking at the code, by default the Instance object is supposed to get settings.CLUSTER_HOST_ID if no hostname argument is passed in. But for some reason, somewhere along the line when starting up a fresh container, the register process registers the task container's hostname instead. The reason I needed to change the hostname is DNS issues. The task container has the same hostname as the machine. In a clustered environment the hostnames need to be valid so that websocket communication can happen. However, I have a playbook that needs to connect to each cluster server to do some work, and the hostname inside the task engine resolves to the container's IP when it comes time to connect to its own machine. So I tried to change the hostname, but the registration process still registers the task engine's hostname in the instances table (and awx_web tries to connect to that hostname) instead of the cluster_host_id from settings.py. For clustered instances the hostname should come from settings.py; that way you can set the proper hostname for websocket communication to work, and the hostname of the container doesn't, and shouldn't, matter.
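To make the expected versus observed behaviour concrete, here is a minimal sketch of the fallback described above. It is not the AWX source; registration_hostname is a hypothetical helper, and the pod name is the placeholder used earlier in this thread.

```python
def registration_hostname(cluster_host_id, hostname=None):
    """Use the explicit hostname if one is passed, else fall back to CLUSTER_HOST_ID."""
    return hostname if hostname else cluster_host_id


# Expected: no hostname argument, so the setting is used.
expected = registration_hostname("awx")

# Observed: the container's own hostname ends up being passed in,
# so the setting is never consulted.
observed = registration_hostname("awx", hostname="awx-task-xxxx-yyyy")
```

With this fallback, any caller that passes the container hostname explicitly bypasses the setting, which would produce the mismatch reported here.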
ISSUE TYPE
SUMMARY
I am currently running AWX 9.3.0 without issues. I have tried to install 10.0.0, 11.0.0, 11.1.0, and 11.2.0, but they all fail with the same error message: "No instance found with the current cluster host id".
There are many reports of this error without a solid solution. Is there a solution to this error message in AWX versions 10.0.0, 11.0.0, 11.1.0, and 11.2.0?
Does the Redis version matter at this point for the broker deployment?
ENVIRONMENT