Register system again if deleted by another pod #12494

Merged Aug 17, 2022 (1 commit)

Conversation

@AlanCoding AlanCoding commented Jul 8, 2022

SUMMARY

Solution for #12471

As of opening this, I can confirm it works, but it causes churn in some other services.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

    self.instance_name = Instance.objects.me().hostname
except Exception as e:
    self.instance_name = settings.CLUSTER_HOST_ID
    logger.info(f'Instance {self.instance_name} seems to be unregistered, error: {e}')

@fosterseth help me out here please. This subsystem metrics code is front-running a million other scenarios by throwing an exception before we get any deeper into the service details. For example, we have this log, which is beautifully constructed:

tools_awx_1 | 2022-07-08 17:40:57,095 INFO     [-] awx.main.wsbroadcast Unable to return currently active instance: No instance found with the current cluster host id, retry in 5s...
tools_awx_1 | make[1]: Leaving directory '/awx_devel'
tools_awx_1 | 2022-07-08 17:41:02,473 INFO exited: awx-wsbroadcast (exit status 0; expected)
tools_awx_1 | 2022-07-08 17:41:03,477 INFO spawned: 'awx-wsbroadcast' with pid 27242
tools_awx_1 | make[1]: Entering directory '/awx_devel'
tools_awx_1 | 2022-07-08 17:41:04,498 INFO success: awx-wsbroadcast entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
tools_awx_1 | 2022-07-08 17:41:05,510 INFO     [-] awx.main.wsbroadcast Unable to return currently active instance: No instance found with the current cluster host id, retry in 5s...
tools_awx_1 | make[1]: Leaving directory '/awx_devel'
tools_awx_1 | 2022-07-08 17:41:10,892 INFO exited: awx-wsbroadcast (exit status 0; expected)
tools_awx_1 | 2022-07-08 17:41:10,894 INFO spawned: 'awx-wsbroadcast' with pid 27275
tools_awx_1 | make[1]: Entering directory '/awx_devel'
tools_awx_1 | 2022-07-08 17:41:11,903 INFO success: awx-wsbroadcast entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
tools_awx_1 | 2022-07-08 17:41:12,996 INFO     [-] awx.main.wsbroadcast Unable to return currently active instance: No instance found with the current cluster host id, retry in 5s...

I can get to this log if I suppress the errors raised from the code here, which is what this change tries to do.

But this raises the obvious question: why do this at all? Why did we not reference settings.CLUSTER_HOST_ID from the start? We don't need the model, just the name. Did we get ourselves into this situation by cargo-culting the Instance.objects.me() call?
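
To make the two options concrete, here is a minimal, self-contained sketch. It is not the AWX code: `FakeSettings`, `InstanceNotRegistered`, and `lookup_registered_hostname` are hypothetical stand-ins for `django.conf.settings`, whatever error the real lookup raises, and `Instance.objects.me().hostname`. It also assumes, as the comment implies, that a registered instance's hostname equals `CLUSTER_HOST_ID`.

```python
import logging

logger = logging.getLogger("subsystem_metrics_sketch")


class FakeSettings:
    """Hypothetical stand-in for django.conf.settings."""
    CLUSTER_HOST_ID = "awx_1"


class InstanceNotRegistered(RuntimeError):
    """Hypothetical error for a missing instance row."""


def lookup_registered_hostname(registered, host_id):
    """Stand-in for Instance.objects.me().hostname; raises if the row is gone."""
    if host_id not in registered:
        raise InstanceNotRegistered("No instance found with the current cluster host id")
    return host_id


def instance_name_with_fallback(registered, settings=FakeSettings):
    """Option 1 (the diff): try the database, fall back to the setting."""
    try:
        return lookup_registered_hostname(registered, settings.CLUSTER_HOST_ID)
    except Exception as e:
        logger.info(
            "Instance %s seems to be unregistered, error: %s",
            settings.CLUSTER_HOST_ID,
            e,
        )
        return settings.CLUSTER_HOST_ID


def instance_name_from_settings(settings=FakeSettings):
    """Option 2 (the question above): skip the model and read the setting."""
    return settings.CLUSTER_HOST_ID
```

Under that assumption both functions return the same name whether or not the instance is registered, so the only difference is that the first one touches the database and can log the miss.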

Avoid cases where missing instance
  would throw error on startup
  this gives time for heartbeat to register it
@AlanCoding AlanCoding merged commit 9e8ba6c into ansible:devel Aug 17, 2022