synq-dbt is a command line tool that executes dbt and uploads dbt artifacts to SYNQ.
Note: synq-dbt is intended to be used for dbt running on Airflow or similar. If you're a dbt Cloud customer, you can integrate your account within SYNQ by going to Settings -> Integrations -> Add Integration -> dbt Cloud.
`synq-dbt` wraps the `dbt` command. After dbt finishes, it collects the dbt artifacts that allow SYNQ to understand the structure and status of your dbt project. We collect the following:
- `manifest.json` — to understand the structure of the dbt project
- `run_results.json` — to understand execution status
- `catalog.json` — to infer the complete schema of underlying data warehouse tables
- `sources.json` — to capture dbt source freshness
To control the location of artifacts, you can use the `SYNQ_TARGET_DIR` environment variable. By default, `synq-dbt` looks for the `target/` directory in the current working directory. All the data is presented in SYNQ.
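For instance, if your setup writes dbt artifacts somewhere other than `target/`, you can point `synq-dbt` at that directory (the path below is a hypothetical example):

```shell
# Hypothetical layout: dbt writes its artifacts to build/dbt_target instead of target/
export SYNQ_TARGET_DIR=build/dbt_target

# synq-dbt run   # would now pick up manifest.json etc. from $SYNQ_TARGET_DIR
echo "$SYNQ_TARGET_DIR"
```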
`synq-dbt` is dbt version agnostic and works with the version of dbt you have installed on your system. It runs in the following steps:
- Executes your locally installed `dbt`. Arguments you supply to `synq-dbt` are passed to `dbt`. For example, your current command `dbt run --select finance --threads 5` becomes `synq-dbt run --select finance --threads 5`, and `dbt test --select reports` becomes `synq-dbt test --select reports`.
- Stores the exit code of the dbt command.
- Reads the `SYNQ_TOKEN` environment variable.
- Uploads `manifest.json`, `run_results.json`, `catalog.json`, and `schema.json` from the `./target` directory to SYNQ.
- Returns the stored dbt exit code. `synq-dbt` ignores its own errors and always exits with the exit code of the dbt subcommand.
It is possible to upload artifacts that have already been generated. In that case, you can use the `synq-dbt synq_upload_artifacts` command to upload them to SYNQ:
export SYNQ_VERSION=v1.7.0
wget -O ./synq-dbt https://github.com/getsynq/synq-dbt/releases/download/${SYNQ_VERSION}/synq-dbt-amd64-linux
chmod +x ./synq-dbt
export SYNQ_TOKEN=<your-token>
./synq-dbt synq_upload_artifacts
It is also possible to include the logs of the dbt execution in the upload. To do that, write the dbt logs to a file and point `synq-dbt` at that file:
dbt build | tee dbt.log
./synq-dbt synq_upload_artifacts --dbt-log-file dbt.log
To successfully install and launch `synq-dbt`, you will need a `SYNQ_TOKEN`, which you generate in your SYNQ account when integrating with dbt Core. Reach out to the team if you have any questions. The token should be treated as a secret, as it allows SYNQ to identify you as the customer and associate the uploaded data with your workspace.
We will cover the two most common setups of dbt and Airflow:
In case none of these works for you, don't hesitate to get in touch with us.
- In the Airflow UI, go to Environment variables. Create a new ENV variable called `SYNQ_TOKEN` with your SYNQ token as the value.
- Install `synq-dbt` into the Docker container that is executed by your DockerOperator. Add the following lines to your runner's Dockerfile to install `synq-dbt`:
ENV SYNQ_VERSION=v1.5.0
RUN wget -O /usr/bin/synq-dbt https://github.com/getsynq/synq-dbt/releases/download/${SYNQ_VERSION}/synq-dbt-amd64-linux
RUN chmod +x /usr/bin/synq-dbt
- Change the Docker container entrypoint (usually the last line in the Dockerfile) from `dbt` to `synq-dbt`, OR change the command in the DbtOperator itself in your Airflow DAG from `dbt` to `synq-dbt`.

In the case of a `KubernetesPodOperator`, the result should, for example, look as follows:
KubernetesPodOperator(
...
env_vars={
"SYNQ_TOKEN": Variable.get("SYNQ_TOKEN")
},
cmds=["synq-dbt"],
arguments=["test"],
)
You're all set! 🎉
- In the Airflow UI, go to Environment variables. Create a new ENV variable called `SYNQ_TOKEN` with your SYNQ token as the value.
- Execute the following shell commands to download the latest version of `synq-dbt`:
export SYNQ_VERSION=v1.5.0
wget -O ./synq-dbt https://github.com/getsynq/synq-dbt/releases/download/${SYNQ_VERSION}/synq-dbt-amd64-linux
chmod +x ./synq-dbt
- Move the `synq-dbt` binary into your `$PATH`:
mv synq-dbt /usr/local/bin/synq-dbt
- Unfortunately, the `Dbt*Operator`s haven't been released to pip for quite some time, and the pull request that added the `env` argument has not been released to pip yet.

If you build airflow-dbt locally, change your `Dbt*Operator`'s `dbt_bin` argument as follows:
dbt_run = DbtRunOperator(
env={
"SYNQ_TOKEN": Variable.get("SYNQ_TOKEN")
},
dbt_bin='synq-dbt',
...
)
Otherwise, you will need to set the `SYNQ_TOKEN` env variable on the system running Airflow with:
export SYNQ_TOKEN=your-synq-token
You're all set! 🎉
Add the following lines to your Dockerfile:
ENV SYNQ_VERSION=v1.5.0
RUN wget -O /usr/bin/synq-dbt https://github.com/getsynq/synq-dbt/releases/download/${SYNQ_VERSION}/synq-dbt-amd64-linux
RUN chmod +x /usr/bin/synq-dbt
The `synq-dbt` command is now available for execution. 🎉
- Execute the following shell commands to download the latest version of `synq-dbt`:
export SYNQ_VERSION=v1.5.0
wget -O ./synq-dbt https://github.com/getsynq/synq-dbt/releases/download/${SYNQ_VERSION}/synq-dbt-amd64-linux
chmod +x ./synq-dbt
- Move the `synq-dbt` binary into your `$PATH`:
mv synq-dbt /usr/local/bin/synq-dbt
The `synq-dbt` command is now available for execution. 🎉
The macOS version is primarily used for testing, by manually running `synq-dbt`.
- Execute the following shell commands to download the latest version of `synq-dbt`:
export SYNQ_VERSION=v1.5.0
wget -O ./synq-dbt https://github.com/getsynq/synq-dbt/releases/download/${SYNQ_VERSION}/synq-dbt-arm64-darwin
chmod +x ./synq-dbt
- Move the `synq-dbt` binary into your `$PATH`:
mv synq-dbt /usr/local/bin/synq-dbt
- Export your `SYNQ_TOKEN` to the current shell:
export SYNQ_TOKEN=<your-token>
- Check the current version of `dbt` via `synq-dbt` as follows:
synq-dbt --version
This will result in output like:
07:04:54 synq-dbt processing `dbt --version`
Core:
- installed: 1.2.0
- latest: 1.3.0 - Update available!
...
You're all set! 🎉
Note: when testing `synq-dbt` locally on your Mac, it is recommended that you delete the `target/` folder before executing `synq-dbt`, so it doesn't pick up old dbt artifacts.
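A minimal local test loop might look like this (assuming the default `target/` location):

```shell
# Remove stale artifacts so synq-dbt does not upload results from an old run.
rm -rf target/

# dbt build                        # regenerate fresh artifacts...
# synq-dbt synq_upload_artifacts   # ...then upload them (requires SYNQ_TOKEN)
[ -d target/ ] || echo "target/ is clean"
```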
- In the `.env` file in your root directory, create a variable called `SYNQ_TOKEN` with your SYNQ token as the value (i.e. `SYNQ_TOKEN=<TOKEN_VALUE>`).
- In your `definitions.py` file, update your dbt resources definition to use `synq-dbt`:
resources={
"dbt": DbtCliResource(dbt_executable='synq-dbt', project_dir=os.fspath(dbt_project_dir)),
}
- By default, Dagster creates a dynamic path for the dbt artifacts, but SYNQ always looks in the root target folder. In your `assets.py` file, update the `target_path` so that artifacts are stored in the root target folder:
@dbt_assets(manifest=dbt_manifest_path)
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
dbt_target_path = Path('target')
yield from dbt.cli(["build"], target_path=dbt_target_path, context=context).stream()
A: `synq-dbt` is dbt version agnostic and works with the version of dbt you have installed on your system.
A: Every time `synq-dbt` executes, one gRPC request is made to SYNQ servers. The payload of the request contains the dbt artifacts and an authentication token that the server uses to verify your data.
Note: Depending on your setup, you might have to allow egress traffic in your network firewall to `dbtapi.synq.io:443`.
A: Since most of the data is text, the total size of the payload is roughly equivalent to the sum of the sizes of the dbt artifacts. `manifest.json` is usually the largest, and the final size of the request depends on the size of your project, typically ranging from a few MB to the higher tens of MB.
Note: Depending on your setup, you might have to allow large payloads in your network firewall.
A: Unless our system experiences an unusual traffic spike, data should be available in the UI within a few minutes.