feat(cdc): Prepare the self hosted environment for the Change Data Capture pipeline #938
Looks good! Left a few comments to verify certain things. Otherwise I think we can merge this.
docker-compose.yml
Outdated
- type: bind
  read_only: true
  source: ./postgres/init_hba.sh
  target: /docker-entrypoint-initdb.d/init_hba.sh
With this script in place, do we still need line 83 above?
POSTGRES_HOST_AUTH_METHOD: "trust"
Also, since we now control the entrypoint, does the script have to live under this specific directory? If not, why not just mount ./postgres under /opt/sentry and use everything from there? This would reduce the number of bind mounts to one and would avoid file mounts, which have sometimes caused us issues.
We still need POSTGRES_HOST_AUTH_METHOD: "trust". That setting is not about replication, whereas the changes to pg_hba are only about replication connections.
POSTGRES_HOST_AUTH_METHOD governs the standard connections we use to run queries.
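To illustrate the distinction, here is a minimal sketch of what an init script such as init_hba.sh could do. The function name and the exact rule text are assumptions for illustration, not the contents of the script in this PR. A pg_hba.conf entry whose database field is "all" does not match replication connections, so they need their own "replication" entry.

```shell
#!/usr/bin/env bash
# Sketch only: add_replication_rule and the rule text below are hypothetical.

# Append a replication access rule to the given pg_hba.conf,
# unless an identical line is already present.
add_replication_rule() {
  local hba_file="$1"
  # Assumed rule: allow replication connections from any host without a password.
  local rule="host replication all all trust"
  if ! grep -qxF -- "$rule" "$hba_file" 2>/dev/null; then
    echo "$rule" >> "$hba_file"
  fi
}
```

Inside the container this would be called as add_replication_rule "$PGDATA/pg_hba.conf" from a script placed in /docker-entrypoint-initdb.d.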
install/install-wal2json.sh
Outdated
if [[ $WAL2JSON_VERSION == "latest" ]]; then
    VERSION=$(
        wget "https://api.github.com/repos/getsentry/wal2json/releases/latest" -O - |
We haven't relied on the availability of wget so far (with the exception of our experimental --minimize-downtime flag). We haven't received any complaints about that, but I wanted to double-check whether this is a safe assumption to make.
What about curl?
The same. The only assumptions we make are bash, docker, docker-compose, sed, awk, and grep. I think we can find/use a curl or wget docker image to be safe :)
Turns out I didn't have wget. 😊
▶ Downloading and installing wal2json ...
install-wal2json.sh: line 13: wget: command not found
An error occurred, caught SIGERR on line 20
Cleaning up...
$
Looks like macOS does ship with curl. If we can't find a Docker image easily I think curl would be a safe choice. Do you know of any Linux that doesn't have it?
Now relying on curlimages/curl
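A rough sketch of how the version lookup could work with the curlimages/curl image while staying within the tools we already assume (bash, docker, sed, awk, grep). The function names are hypothetical, and this is not necessarily how install-wal2json.sh ends up doing it.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical.

# Fetch the latest release metadata through a throwaway curl container,
# so neither curl nor wget has to exist on the host.
fetch_latest_release_json() {
  docker run --rm curlimages/curl -fsSL \
    "https://api.github.com/repos/getsentry/wal2json/releases/latest"
}

# Read release JSON on stdin and print the value of its first "tag_name" field.
parse_release_tag() {
  grep -o '"tag_name"[[:space:]]*:[[:space:]]*"[^"]*"' |
    awk 'NR == 1' |
    sed -e 's/^"tag_name"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}
```

Usage would be along the lines of VERSION=$(fetch_latest_release_json | parse_release_tag). Keeping the parsing separate from the download makes it possible to check the parsing without network access.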
Ok, we have a problem. On postgres-alpine
On postgres
So questions @BYK :
I doubt linking libc statically would be a good idea. I don't even think it is possible, as the build process is orchestrated by postgres.
Probably someone chose "the lighter" version on dev and we stuck with it. For self-hosted, we tend to use the most common distributions, which is not alpine :)
That sounds like a step backwards and would introduce a lot of complexity that should live inside the This means now
Thoughts?
Yes, using the standard postgres image is supposed to work. I did not test it, but there should be no reason for it not to. Sure, I can deliver two binaries in the release. I was not convinced that distributing multiple binaries, one per required implementation of libc, would be common practice, but that may be just because I never noticed it.
Building two binaries is possible, and it works correctly when importing the right one here.
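For illustration, picking the right binary could look roughly like the sketch below. The function name and both asset file names are assumptions, not the actual names in the wal2json release. The idea is only that Alpine-based images use musl and need a matching build, while the standard image uses glibc.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and asset file names are hypothetical.

# Print the release asset that matches the libc of the given postgres image tag.
wal2json_asset_for_image() {
  case "$1" in
    *-alpine*) echo "wal2json-Linux-x86_64-musl.so" ;;   # Alpine images use musl
    *)         echo "wal2json-Linux-x86_64-glibc.so" ;;  # standard images use glibc
  esac
}
```

For example, wal2json_asset_for_image "postgres:9.6" would select the glibc build under these assumed names.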
This is an RFC on how to set up Change Data Capture.
We will use Change Data Capture to stream WAL updates from postgres into clickhouse so that features like issue search will be able to join event data and metadata (from postgres) through Snuba.
This requires the following:
This PR is preparing postgres to stream updates via the replication log.
The idea is to
There is a difference between how this is set up and how we do the same in the development environment.
In the development environment we download the library from the entrypoint itself and store it in a persistent volume, so we do not have to download it every time.
Unfortunately this does not work here, as the postgres image here is postgres:9.6 while in development it is postgres:9.6-alpine. The image used here does not come with either wget or curl. I don't think installing one of them in the entrypoint would be a good idea, so the download happens in install.sh. I actually think this way is safer, since postgres never depends on connectivity to start properly.
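A minimal sketch of the "download at install time, reuse afterwards" idea. The function name and its calling convention (target path followed by the download command) are hypothetical and only meant to show the shape of the logic.

```shell
#!/usr/bin/env bash
# Sketch only: ensure_wal2json and its arguments are hypothetical.

# Usage: ensure_wal2json TARGET_PATH DOWNLOAD_COMMAND [ARGS...]
# Runs the download command only when TARGET_PATH is missing or empty,
# and writes through a temporary file so a failed download leaves nothing behind.
ensure_wal2json() {
  local target="$1"
  shift
  if [ -s "$target" ]; then
    return 0  # already downloaded by a previous install run
  fi
  mkdir -p "$(dirname "$target")"
  if "$@" > "$target.tmp"; then
    mv "$target.tmp" "$target"
  else
    rm -f "$target.tmp"
    return 1
  fi
}
```

With this shape, postgres only ever reads a file that install.sh has already put in place, so starting the container needs no network access.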