wait for database connection #10583

Merged
merged 4 commits into from
Jul 22, 2021


coolbry95
Contributor

@coolbry95 coolbry95 commented Jul 3, 2021

SUMMARY

Wait for the database connection because in some environments the connection cannot be made right away.

#10527

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
  • API
  • UI
AWX VERSION
awx: 19.2.2
ADDITIONAL INFORMATION

We could also stop after a set number of tries, though I'm not sure there is any point in continuing if the database can't be used.

I also tried skipping this check when the database was not available, but startup then failed later while running collectstatic, so the database is needed even for collectstatic.

Before

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
	Is the server running on host "awx-postgres" (10.140.177.141) and accepting
	TCP/IP connections on port 5432?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/__init__.py", line 155, in manage
    if (connection.pg_version // 10000) < 12:
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/__init__.py", line 28, in __getattr__
    return getattr(connections[DEFAULT_DB_ALIAS], item)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/utils/functional.py", line 80, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 282, in pg_version
    with self.temporary_connection():
  File "/usr/lib64/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 593, in temporary_connection
    with self.cursor() as cursor:
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 233, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: Connection refused
	Is the server running on host "awx-postgres" (10.140.177.141) and accepting
	TCP/IP connections on port 5432?
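
For context on the failing line `if (connection.pg_version // 10000) < 12:` in the traceback above: PostgreSQL 10 and later report their version as a single integer, major * 10000 + minor, so integer division by 10000 yields the major version. The check itself needs a live connection, which is why it fails when the database is not up yet. A minimal sketch of the arithmetic only; the function names are hypothetical and this is not the AWX code itself:

```python
def pg_major_version(server_version_num):
    """Return the major version from PostgreSQL's integer version number."""
    return server_version_num // 10000


def meets_minimum(server_version_num, minimum_major=12):
    """True if the server is at least `minimum_major` (hypothetical helper)."""
    return pg_major_version(server_version_num) >= minimum_major


# 120007 is how PostgreSQL 12.7 reports itself; 90622 is 9.6.22.
print(pg_major_version(120007), meets_minimum(120007))
print(pg_major_version(90622), meets_minimum(90622))
```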

After

Wainting for connection to database
No connection available
No connection available
No connection available
No connection available
No connection available
No connection available
Connection established

183 static files copied to '/var/lib/awx/public/static'.
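
The behavior in the "After" log can be sketched roughly as a retry loop around the first connection attempt. This is an illustration under assumptions, not the patch from this PR: `wait_for_connection`, `connect`, `retry_on`, `max_attempts`, and `sleep` are hypothetical names, and in AWX the exception to retry on would presumably be `django.db.utils.OperationalError` rather than the built-in `ConnectionError` used here to keep the example self-contained.

```python
import time


def wait_for_connection(connect, retry_on=(ConnectionError,), delay=1.0,
                        max_attempts=None, sleep=time.sleep):
    """Call connect() until it stops raising; return the number of failures.

    With max_attempts=None the loop retries forever; otherwise the last
    error is re-raised once that many attempts have failed.
    """
    print("Waiting for connection to database")
    failures = 0
    while True:
        try:
            connect()
        except retry_on:
            failures += 1
            print("No connection available")
            if max_attempts is not None and failures >= max_attempts:
                raise  # give up and surface the last connection error
            sleep(delay)
        else:
            print("Connection established")
            return failures


# Demo: a fake connect() that is refused six times, then succeeds.
_calls = {"n": 0}


def _flaky_connect():
    _calls["n"] += 1
    if _calls["n"] <= 6:
        raise ConnectionRefusedError("could not connect to server")


wait_for_connection(_flaky_connect, delay=0)
```

Passing `max_attempts` covers the "stop after so many tries" option mentioned above; leaving it as `None` retries indefinitely.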

@softwarefactory-project-zuul
Contributor

Build succeeded.

@softwarefactory-project-zuul
Contributor

Build succeeded.

@softwarefactory-project-zuul
Contributor

Build succeeded.

Member

@shanemcd shanemcd left a comment


Hi @coolbry95 - thank you very much for tracking this down and proposing a patch. This isn't quite the way I'd like to handle this, but it helps a lot.

Could you please try adding this line: https://github.com/ansible/awx/blob/devel/tools/ansible/roles/dockerfile/files/launch_awx_task.sh#L16

To somewhere around here: https://github.com/ansible/awx/blob/devel/tools/ansible/roles/dockerfile/files/launch_awx.sh#L15

I believe this should also resolve your issue.

@coolbry95
Contributor Author

That did not work.

[wait-for-migrations] Waiting for database migrations...
[wait-for-migrations] Attempt 1 of 30
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
	Is the server running on host "awx-postgres" (10.140.179.62) and accepting
	TCP/IP connections on port 5432?

@coolbry95
Contributor Author

It looks like this command maybe isn't waiting until it can connect to the database?

There are more exceptions that get thrown besides the first one I have in my previous comment.

@coolbry95
Contributor Author

Oh maybe the wait for migrations is failing because of the same init code. I am trying to debug what is happening here.

[wait-for-migrations] Waiting for database migrations...
[wait-for-migrations] Attempt 1 of 30
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
	Is the server running on host "awx-postgres" (10.140.179.62) and accepting
	TCP/IP connections on port 5432?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/__init__.py", line 170, in manage
    if (connection.pg_version // 10000) < 12:
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/__init__.py", line 28, in __getattr__
    return getattr(connections[DEFAULT_DB_ALIAS], item)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/utils/functional.py", line 80, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 282, in pg_version
    with self.temporary_connection():
  File "/usr/lib64/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 593, in temporary_connection
    with self.cursor() as cursor:
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 233, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: Connection refused
	Is the server running on host "awx-postgres" (10.140.179.62) and accepting
	TCP/IP connections on port 5432?

@coolbry95 coolbry95 requested a review from shanemcd July 6, 2021 15:21
@coolbry95
Contributor Author

coolbry95 commented Jul 8, 2021

Ok the wait for migrations doesn't work because the logic in it is wrong.

timeout "${TIMEOUT}" \
                 /bin/bash -c "! awx-manage showmigrations | grep '\[ \]'"  \
               && return || rc=$?

The exit code of this pipeline is always 0, so it always returns immediately. If you change the return to an echo, it works properly.

Any thoughts on how you want to fix this? We could catch the exception still and then exit with a different code. I will update the PR to reflect that.

@coolbry95
Contributor Author

After more testing I know what is happening.

wait-for-migrations returns because grep does not find any "[ ]": awx-manage showmigrations throws an exception when it can't reach the database, so there is no migration output to match. Either we need a different check that all migrations have run, or we need to wait and retry the database connection, which is what my PR currently does.

I am not sure how to change the logic for showmigrations. Any thoughts @shanemcd?

I apologize for the spam; I was just trying to test an unmanaged database with the awx-operator and ended up testing some more.
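
The failure mode described in this comment can be illustrated with a small sketch. It assumes the usual `showmigrations` list format, where applied migrations are marked `[X]` and unapplied ones `[ ]`; `naive_check` and `migrations_applied` are hypothetical helpers, not code from AWX. The naive version only looks for `[ ]`, so a traceback (which contains no `[ ]`) is mistaken for "all migrations applied"; the stricter version also requires a zero exit code and at least one migration row.

```python
def naive_check(output):
    """Mimics `! ... | grep '\\[ \\]'`: passes whenever no '[ ]' is present."""
    return not any("[ ]" in line for line in output.splitlines())


def migrations_applied(returncode, output):
    """Pass only if the command succeeded, printed rows, and none are '[ ]'."""
    if returncode != 0:
        return False  # the command crashed, e.g. the database was unreachable
    lines = output.splitlines()
    has_rows = any("[X]" in line or "[ ]" in line for line in lines)
    return has_rows and not any("[ ]" in line for line in lines)


crash = "Traceback (most recent call last):\nOperationalError: could not connect"
print(naive_check(crash))            # the naive check is fooled by the crash
print(migrations_applied(1, crash))  # the stricter check is not
```

In a real script the return code and output would come from actually running `awx-manage showmigrations`; they are passed in directly here so the logic can be checked in isolation.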

@softwarefactory-project-zuul
Contributor

Build succeeded.

@coolbry95
Contributor Author

With the latest commit both awx-task and awx-web work correctly when deployed with a fresh unmanaged database in the awx-operator.

@shanemcd
Member

Hi @coolbry95 - I'd like to fix up the wait-for-migrations script instead. If the database connection does not become available for that step (after however many attempts), the container startup should crash instead of trying forever.

@coolbry95
Contributor Author

Ok how do you propose fixing wait-for-migrations? If it can't connect then it throws an exception and there is not any migration output that it is currently grepping for.

We could save the output to a variable then check for the word exception or the like and try again if we see that? Would we also add a check in the launch script to check for a proper exit code from wait-for-migrations then fail if the exit code is bad?

If you are on irc we can chat there if possible.
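
The approach discussed in the last two comments, retrying a bounded number of times and then failing startup through the exit code, could look roughly like the sketch below. It is not the script that was merged: `wait_for_migrations`, `check`, `attempts`, `delay`, and `sleep` are hypothetical names, and `check` stands in for whatever test decides that the database is reachable and migrated. The default of 30 attempts follows the "Attempt 1 of 30" log lines above.

```python
import time


def wait_for_migrations(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll check() up to `attempts` times.

    Returns 0 as soon as check() passes and 1 if it never does, so the
    result can be used directly as a process exit code.
    """
    for attempt in range(1, attempts + 1):
        print(f"[wait-for-migrations] Attempt {attempt} of {attempts}")
        if check():
            return 0
        if attempt < attempts:
            sleep(delay)  # no need to sleep after the final attempt
    return 1


# Demo: a check that only passes on the third call.
_state = {"n": 0}


def _ready_on_third_call():
    _state["n"] += 1
    return _state["n"] >= 3


exit_code = wait_for_migrations(_ready_on_third_call, delay=0)
print("exit code:", exit_code)
```

A launch script could then pass the result to `sys.exit()` so that a non-zero code stops the container instead of letting startup continue.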

@coolbry95
Contributor Author

I took a stab at this. Not sure how else to do it.

@softwarefactory-project-zuul
Contributor

Build succeeded.

@coolbry95 coolbry95 force-pushed the waitfordb branch 2 times, most recently from b24576c to 38d2e93 Compare July 20, 2021 19:56
@coolbry95
Contributor Author

@shanemcd much smaller change. The logic is correct now. I have tested this in the same cluster the other issues arose in. I am going to do a few more tests.

Hopefully you will have time to review and merge before the next release.

coolbry95 and others added 4 commits July 22, 2021 13:56
Signed-off-by: coolbry95 <coolbry95@gmail.com>
…check-postgres-version-4940

introduced a pre-flight check for postgres 12
4 participants