Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BCDA-3710 - Resolve issues with stale db connections #600

Merged
merged 15 commits into from
Dec 16, 2020

Conversation

mtrang1263
Copy link
Contributor

Fixes BCDA-3710

Previously, our driver dependencies (pg [v1.0.0] and pgx [v3.2.0]) had issues with re-using stale connections. For example, if our database instances were restarted, we would receive invalid/dead connection errors.

Issues:

  1. pg issue: db operation fails with "broken pipe" instead of reconnecting transparently after server restart lib/pq#870
  2. pgx issue: reduce unecessary memory allocation jackc/pgx#49

We cannot upgrade our pgx driver to v4 because of an incompatibility with our que-go dependency. que-go relies on pgx v3 with no plans to move up to pgx v4.

The pg driver issue was resolved in this PR.

Change Details

  1. Upgrade pg driver from v1.0.0 to v1.9.0
  2. Add connection pooling goroutine that checks and verifies the connections in our pools. Implementation was based off of the pgxv4 fix.

No changes are needed to the pgx driver used in bcdaworker. Since we instantiate a quego worker, there's goroutine(s) that run that verify and refresh the connection pool. See.

Security Implications

  • new software dependencies
    Upgraded our pg client from v1.0.0 to v1.9.0

  • security controls or supporting software altered

  • new data stored or transmitted

  • security checklist is completed for this change

  • requires more information or team discussion to evaluate security implications

  • no PHI/PII is affected by this change

Acceptance Validation

  1. Verified the stale connection issue was resolved locally by running the following process.
make load-fixtures
docker-compose restart db queue
make smoke-test

When running this, we would encounter failures because of the invalid connections:

{"file":"/go/src/github.com/CMSgov/bcda-app/bcda/api/requests.go:293","func":"github.com/CMSgov/bcda-app/bcda/api.(*Handler).bulkRequest","level":"error","msg":"conn is dead","time":"2020-12-11T13:27:49-05:00"}
  1. Ran multiple smoke tests jobs after restarting the dev database to verify that connections are automatically refreshed.

]
pruneopts = "UT"
revision = "90697d60dd844d5ef6ff15135d0203f65d2f53b8"
revision = "4604d39ddc9f62e2ca152114c73aec99b77a3468"
version = "v1.9.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes issue related to invalid connections when using the pg driver.

# Since we've rebuilt the databases, we need to restart the worker and api
# to ensure it picks up connections to the new db instance.
# TODO (BCDA-3710) we should be able to remove this line once we have connection pooling on the worker
# Ensure components are started as expected
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needed to keep this restart here because of issues uncovered in our CI process. It's caused by the database being dropped while the app is initializing.

Removing the TODO since the restart is not related to the restarting of the db instance.

SMLuthi
SMLuthi previously approved these changes Dec 16, 2020
Copy link
Contributor

@SMLuthi SMLuthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Not super comfortable with go routines but this all makes sense to me.

PatKelsh
PatKelsh previously approved these changes Dec 16, 2020
Copy link
Contributor

@PatKelsh PatKelsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

Comment on lines +68 to +72
// With que-go locked to pgx v3, we need a mechanism that will allow us to
// discard bad connections in the pgxpool (see: https://github.com/jackc/pgx/issues/494)
// This implementation is based off of the "fix" that is present in v4
// (see: https://github.com/jackc/pgx/blob/v4.10.0/pgxpool/pool.go#L333)
// Use the same approach to validate the connection associated with the gorm client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to link to resources explaining the implementation here

@mtrang1263 mtrang1263 dismissed stale reviews from PatKelsh and SMLuthi via 53c0e9c December 16, 2020 20:29
Copy link
Contributor

@SMLuthi SMLuthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reapproving!

@mtrang1263 mtrang1263 merged commit adc4a2f into master Dec 16, 2020
@mtrang1263 mtrang1263 deleted the martin/BCDA-3710_pgx branch December 16, 2020 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants