Bug: The startup is too insensitive to network availability #10816

Closed
3 tasks done
id-jim-bach opened this issue Aug 4, 2021 · 2 comments

Comments

id-jim-bach commented Aug 4, 2021

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I am not entitled to status updates or other assurances.

Summary

If something like a service mesh makes the network take a moment longer to become ready (for example, while it sets up a proxy before PostgreSQL is reachable), launch_awx.sh fails to run `awx-manage collectstatic --noinput --clear`. Instead of reporting something like `186 static files copied to '/var/lib/awx/public/static'.`, awx-manage fails every time, and startup continues anyway. The bash script itself does not fail either, because it does not use something like `set -e`, so it goes on to start the services and gives a false sense that everything is working. I've made a downstream workaround for myself until a fix goes in, but it would be awesome if awx-manage could be more forgiving upstream, or if the bash script could handle this in a more clever way. Thanks!

```bash
# my hack: please don't do this unless it's temporary
# telnet was added to the Dockerfile with yum
# the service mesh makes the network take slightly longer to come up, so the awx-manage command below fails
until echo 'exit' | telnet awx-postgres.ansible.svc.cluster.local 5432 2>&1 | grep -q "Connected to"; do
    echo "waiting on postgres"
    sleep 5
done

# without the wait above, this fails with a non-zero exit status
awx-manage collectstatic --noinput --clear

# since there is no set -e, a failure above does not stop the script:
# supervisord still starts and awx-web ends up in a degraded state
supervisord -c /etc/supervisord.conf
```
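As a hedged alternative to the telnet loop above, the sketch below waits on the same host and port using bash's built-in `/dev/tcp` redirection, so telnet does not need to be added to the image, and it gives up after a deadline instead of looping forever. It assumes the container runs bash and ships coreutils `timeout`; `wait_for_tcp` is a hypothetical helper, not something that exists in launch_awx.sh or AWX.

```shell
#!/usr/bin/env bash
# Sketch only: a telnet-free wait for PostgreSQL. Assumes bash (for /dev/tcp)
# and coreutils `timeout`; wait_for_tcp is a hypothetical helper, not part of AWX.

# wait_for_tcp HOST PORT MAX_SECONDS
# Returns 0 as soon as a TCP connection to HOST:PORT succeeds, or 1 once
# MAX_SECONDS have elapsed without a successful connection.
wait_for_tcp() {
    local host="$1" port="$2" max_seconds="$3"
    local start=$SECONDS
    # Each attempt opens a connection on fd 3 via bash's /dev/tcp; the child
    # shell exits right away, which closes it. `timeout 2` bounds a hung attempt.
    until timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; do
        if (( SECONDS - start >= max_seconds )); then
            echo "gave up waiting for ${host}:${port} after ${max_seconds}s" >&2
            return 1
        fi
        echo "waiting on ${host}:${port}"
        sleep 5
    done
}

# Hypothetical use in launch_awx.sh (hostname and port taken from the hack above):
#   set -e    # abort startup if any command fails, instead of running degraded
#   wait_for_tcp awx-postgres.ansible.svc.cluster.local 5432 300
#   awx-manage collectstatic --noinput --clear
#   supervisord -c /etc/supervisord.conf
```

Pairing the wait with `set -e` addresses both halves of the report: the script waits for the database, and if `collectstatic` still fails, the container exits instead of starting supervisord in a degraded state, so Kubernetes can restart it.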

AWX version

19.2.2 (and most other versions)

Installation method

Kubernetes

Modifications

yes

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

Delay network access to PostgreSQL, or use a service mesh such as Linkerd or Istio.

Expected results

Startup retries until it succeeds.
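One way to get this retry-until-success behavior without probing the database port at all is to retry the failing command itself. This is a sketch under assumptions, not what AWX does: `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: retry a command until it succeeds.
# `retry` is a hypothetical helper, not part of AWX or launch_awx.sh.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0. If all MAX_ATTEMPTS attempts fail,
# returns the exit status of the last attempt.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1 status
    while true; do
        # A failing command inside an && list does not trigger set -e.
        "$@" && return 0
        status=$?
        if (( attempt >= max_attempts )); then
            echo "giving up after ${attempt} attempts: $*" >&2
            return "$status"
        fi
        echo "attempt ${attempt} failed (exit ${status}), retrying in ${delay}s: $*" >&2
        attempt=$(( attempt + 1 ))
        sleep "$delay"
    done
}

# Hypothetical use in launch_awx.sh: up to 30 attempts, 5 seconds apart.
#   retry 30 5 awx-manage collectstatic --noinput --clear
```

Because the last exit status is propagated, this still composes with `set -e`: if the database never becomes reachable, startup stops instead of continuing with missing static files.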

Actual results

It fails, and startup continues anyway.

Additional information

My modification fixes the problem

wenottingham (Contributor) commented

This may be fixed by #10583

id-jim-bach commented Aug 11, 2021

@wenottingham I patched it in downstream with this (until a new release), and it does indeed fix this and then some. Thank you!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants