No instance found with the current cluster host id #7100

rchaud · 2020-05-20T15:03:08Z

ISSUE TYPE

Bug Report

SUMMARY

I am currently running AWX 9.3.0 without issues. I have tried to install 10.0.0, 11.0.0, 11.1.0, and 11.2.0, but they all fail with the same error message: "No instance found with the current cluster host id".

There are many reports of this error without a solid solution. Is there a fix for this error message in AWX versions 10.0.0, 11.0.0, 11.1.0, and 11.2.0?

Does the Redis version matter at this point for the broker deployment?

ENVIRONMENT

AWX version: 10.0.0, 11.0.0, 11.1.0, 11.2.0
AWX install method: Kubernetes YAML, using Docker hub images.

ryanpetrello · 2020-05-29T17:28:21Z

@chinochao do you have any logs you can share here?

ryanpetrello · 2020-05-29T17:28:55Z

I wonder if you're encountering some version of this:

#7000

Seb0042 · 2020-08-03T15:27:03Z

I have the same issue, but not in the web container: the web UI is fine.
The awx-task container, however, doesn't work anymore.
I have the same environment: Kubernetes YAML, using Docker Hub images.
Here is the log from the awx-task container:

2020-08-03 15:16:54,656 WARNING awx.main.dispatch.periodic periodic beat started
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 158, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id

If I look at the table "main_instance", I see that a new instance is added each time I stop and start the pod.
I migrated from 9.3 to 10, then to 11.0, then 11.2, then 12. The issue started with version 12.0.
I tried to delete the old instances with "awx-manage deprovision_instance --hostname=[name of the pods]", but it is still not working.

How can I fix this so that AWX works again?

rchaud · 2020-08-03T16:06:23Z

@Seb0042 I still haven't found a solution for this. I was forced to stay on version 9.3 for the moment. Hopefully the AWX developers can see what the issue is. My logs are the same as the ones you provided.

Seb0042 · 2020-08-04T07:07:23Z

I found a workaround. The original name of the deployment is awx. I had renamed it to awx-online for my needs. I tried again with the original name and it works.
I will keep it like this until we have a solution.

rchaud · 2020-08-04T15:36:51Z

> I found a workaround. The original name of the deployment is awx. I was renaming it awx-online for my needs. I tried with the original name and it works again.
> I will stay like this until we have a solution.

Where did you make this change exactly? settings.py or another file?

Seb0042 · 2020-08-04T15:48:23Z

No, I'm talking about the deployment's name in Kubernetes. My pods were named awx-online-xxxx-yyyy, but it seems that for the moment they have to be named awx-xxxx-yyyy. So I reverted the changes I had made in the deployment.yml file (I deleted the deployment and then recreated it).

rchaud · 2020-08-04T16:24:48Z

> No, I'm talking about the deployment's name in Kubernetes. My pods were named awx-online-xxxx-yyyy but it seems that for the moment they have to be named awx-xxxx-yyyy. So I reversed the changes I made in the deployment.yml file (I deleted the deployment and then recreated it).

Cool, I will see if that works for me. My deployment is named awx-web-xxxx-yyyy and awx-task-xxxx-yyyy.

timhaak · 2020-10-27T10:46:00Z

For anyone else wanting to do this:

The hostname needs to match CLUSTER_HOST_ID in /etc/tower/awx_settings.py (the default is 'awx').

If you have already started everything and it is now broken, connect to the Postgres DB and change hostname in public.main_instance to the main hostname you have chosen, which must match the setting above.
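The database change described above can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the real Postgres instance, the table is reduced to the single hostname column mentioned above, and the pod name and the `rename_instance` helper are hypothetical. Against Postgres the same UPDATE would target public.main_instance (with psycopg2 the placeholders are `%s` instead of `?`), and a database backup beforehand is advisable.

```python
import sqlite3

# Assumed to mirror CLUSTER_HOST_ID in /etc/tower/awx_settings.py (default 'awx').
CLUSTER_HOST_ID = "awx"


def rename_instance(conn, old_hostname, new_hostname):
    """Hypothetical helper: point an existing instance row at the expected hostname."""
    cur = conn.execute(
        "UPDATE main_instance SET hostname = ? WHERE hostname = ?",
        (new_hostname, old_hostname),
    )
    conn.commit()
    return cur.rowcount  # number of rows that were changed


def demo():
    # SQLite stand-in; the real main_instance table has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE main_instance (id INTEGER PRIMARY KEY, hostname TEXT UNIQUE)"
    )
    # Hypothetical pod name registered by a renamed deployment.
    conn.execute(
        "INSERT INTO main_instance (hostname) VALUES (?)",
        ("awx-online-7d9f-abcde",),
    )
    updated = rename_instance(conn, "awx-online-7d9f-abcde", CLUSTER_HOST_ID)
    rows = [row[0] for row in conn.execute("SELECT hostname FROM main_instance")]
    conn.close()
    return updated, rows
```

Calling `demo()` performs the rename on the toy table and returns the number of updated rows together with the hostnames left in it.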

minsis · 2021-02-01T16:59:48Z

I ran into similar issues with this today. Looking at the code, the Instance object is supposed to use settings.CLUSTER_HOST_ID by default if no hostname argument is passed in. But for some reason, somewhere along the line, the register process registers the task container's hostname instead when a fresh container starts up.

The reason I needed to change the hostname is DNS issues. The task container has the same hostname as the host machine. In a clustered environment the hostnames need to be valid so that websocket communication can happen. However, I have a playbook that needs to connect to each cluster server to do some work, and inside the task engine the hostname resolves to the container's IP when it needs to connect to its own machine.

In any case, because of the issue above I needed to change the hostname, but could not, because the registration process registers the task engine's hostname in the instances table (and awx_web tries to connect to this hostname) instead of the CLUSTER_HOST_ID from settings.py. With clustered instances the hostname should come from settings.py, so that you can set the proper hostname for websocket communication, and the hostname of the container doesn't, and shouldn't, matter.
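The mismatch described above can be reproduced with a toy model. This is a simplified sketch, not AWX's actual manager code: `InstanceRegistry`, its methods, and the pod name are hypothetical, and only the error message is taken from the traceback earlier in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceRegistry:
    """Toy stand-in for the main_instance table; not AWX's real implementation."""

    cluster_host_id: str
    hostnames: set = field(default_factory=set)

    def register(self, hostname=None):
        # Expected behaviour: fall back to the configured id when no hostname is given.
        name = hostname if hostname is not None else self.cluster_host_id
        changed = name not in self.hostnames
        self.hostnames.add(name)
        return changed, name

    def me(self):
        # The lookup uses the configured id, not the container's hostname.
        if self.cluster_host_id not in self.hostnames:
            raise RuntimeError("No instance found with the current cluster host id")
        return self.cluster_host_id


def demo():
    registry = InstanceRegistry(cluster_host_id="awx")
    # Registering under the container hostname (hypothetical pod name) ...
    registry.register(hostname="awx-online-7d9f-abcde")
    try:
        registry.me()
        mismatch_error = None
    except RuntimeError as exc:
        # ... makes the lookup by cluster host id fail.
        mismatch_error = str(exc)
    # Registering without a hostname uses the configured id, so the lookup succeeds.
    registry.register()
    return mismatch_error, registry.me()
```

In this model the lookup fails as long as only the container hostname is registered, and succeeds once a row matching the configured id exists, which is consistent with the workarounds reported earlier in the thread.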

awxbot added the type:bug label May 21, 2020

blomquisg added the state:needs_info label May 29, 2020

nicoherbigde mentioned this issue Jul 20, 2020

Task container unreachable from the web container #7404

Closed

jamesnhan mentioned this issue Oct 15, 2020

Update hostnames for awx_task and awx_web in the inventory and settings.py #8406

Closed

shanemcd removed the state:needs_info label Jan 26, 2022

mabashian added the component:api label Apr 5, 2022

rchaud closed this as completed Dec 23, 2023
