Add support to DBT_RUNNER execution mode #717

tatiana · 2023-11-28T11:24:45Z

Context

There was a great recommendation from @sanromeo in the #airflow-dbt Slack channel:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1701098801633179

The suggestion is to use dbtRunner (https://docs.getdbt.com/reference/programmatic-invocation) instead of a subprocess to execute dbt commands.
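As a rough illustration of the difference between the two approaches, here is a minimal sketch. The helper names (`build_dbt_args`, `run_with_subprocess`, `run_with_dbt_runner`) are hypothetical and are not part of Cosmos; the only dbt API used is `dbtRunner().invoke(...)` from `dbt.cli.main`, which needs dbt-core 1.5 or later installed in the same environment.

```python
import subprocess
from typing import List, Optional


def build_dbt_args(command: str, project_dir: str, select: Optional[str] = None) -> List[str]:
    """Build the argument list shared by both invocation styles (hypothetical helper)."""
    args = [command, "--project-dir", project_dir]
    if select:
        args += ["--select", select]
    return args


def run_with_subprocess(args: List[str]) -> bool:
    """Spawn a separate `dbt` process; success is inferred from the exit code."""
    completed = subprocess.run(["dbt", *args], capture_output=True, text=True)
    return completed.returncode == 0


def run_with_dbt_runner(args: List[str]) -> bool:
    """Invoke dbt inside the current Python process (requires dbt-core >= 1.5)."""
    # Imported lazily so this module loads even where dbt-core is not installed.
    from dbt.cli.main import dbtRunner

    result = dbtRunner().invoke(args)
    # The result object reports success directly, so no log parsing is needed.
    return result.success
```

For example, `run_with_dbt_runner(build_dbt_args("run", "/path/to/project", select="my_model"))` would run a single model in-process, assuming a valid dbt project and profile exist at that (placeholder) path.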

Historically, we decided not to adopt dbt-core as a dependency of Cosmos to avoid the conflicts between Airflow and dbt-core:
https://astronomer.github.io/astronomer-cosmos/getting_started/execution-modes-local-conflicts.html#execution-modes-local-conflicts

However, as pointed out by @sanromeo, there are no more conflicts between dbt 1.7.0+ and Airflow 2.7.0+. So we could offer this as an alternative Cosmos ExecutionMode to users who are confident their dbt-core and Airflow versions do not conflict. If this approach is successful, we can also look into allowing users to use the same strategy in the LoadMode.DBT_LS.

Acceptance criteria

Allow users to use `ExecutionConfig(execution_mode=ExecutionMode.DBT_RUNNER)`, which will not rely on a Python subprocess but will call dbt's dbtRunner directly.

dosubot · 2023-11-28T11:25:08Z

🤖

It seems like you've got this under control. If you want help or have specific questions, let me know what I can do for you!

…850)

## Description

This PR adds `dbtRunner` programmatic invocation for `ExecutionMode.LOCAL`. I decided to not make a new execution mode for each (e.g. `ExecutionMode.LOCAL_DBT_RUNNER`) and all of the child operators but instead added an additional config `ExecutionConfig.invocation_mode` where `InvocationMode.DBT_RUNNER` could be specified. This is so that users who are already using local execution mode could use dbt runner and see performance improvements.

With the `dbtRunnerResult` it makes it easy to know whether the dbt run was successful and logs do not need to be parsed but are still logged in the operator:

![image](https://github.com/astronomer/astronomer-cosmos/assets/79104794/76a4cf82-f0f2-4133-8d68-a0a6a145b1d8)

## Performance Testing

After #827 was added, I modified it slightly to use postgres adapter instead of sqlite because the latest dbt-core support for sqlite is 1.4 when programmatic invocation requires >=1.5.0. I got the following results comparing subprocess to dbt runner for 10 models:

1. `InvocationMode.SUBPROCESS`:

```shell
Ran 10 models in 23.77661895751953 seconds
NUM_MODELS=10
TIME=23.77661895751953
```

2. `InvocationMode.DBT_RUNNER`:

```shell
Ran 10 models in 8.390100002288818 seconds
NUM_MODELS=10
TIME=8.390100002288818
```

So using `InvocationMode.DBT_RUNNER` is almost 3x faster, and can speed up dag runs if there are a lot of models that execute relatively quickly since there seems to be a 1-2s speed up per task.

One thing I found while working on this is that a [manifest](https://docs.getdbt.com/reference/programmatic-invocations#reusing-objects) is stored in the result if you parse a project with the runner, and can be reused in subsequent commands to avoid reparsing. This could be a useful way for caching the manifest if we use dbt runner for dbt ls parsing and could speed up the initial render as well.
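The manifest reuse mentioned above could look roughly like the sketch below. `CachedManifestRunner` and `default_runner_factory` are hypothetical names introduced only for illustration, not part of Cosmos or dbt. The sketch assumes that the `result` of a `parse` invocation is the manifest and that `dbtRunner` accepts it through its `manifest` argument, as the linked dbt documentation describes.

```python
from typing import Any, Callable, List, Optional


def default_runner_factory(manifest: Optional[Any] = None) -> Any:
    """Create a real dbtRunner (requires dbt-core >= 1.5)."""
    # Imported lazily so this module loads even where dbt-core is not installed.
    from dbt.cli.main import dbtRunner

    return dbtRunner(manifest=manifest)


class CachedManifestRunner:
    """Parse the project once, then reuse the manifest for later commands."""

    def __init__(self, project_dir: str, runner_factory: Callable[..., Any] = default_runner_factory) -> None:
        self.project_dir = project_dir
        self.runner_factory = runner_factory
        self.manifest: Optional[Any] = None
        self.parse_count = 0  # how many times the project was actually parsed

    def invoke(self, args: List[str]) -> bool:
        if self.manifest is None:
            # First call: parse the project and keep the resulting manifest.
            parse_result = self.runner_factory().invoke(["parse", "--project-dir", self.project_dir])
            if not parse_result.success:
                raise RuntimeError(f"dbt parse failed: {parse_result.exception}")
            self.manifest = parse_result.result
            self.parse_count += 1
        # Later calls hand the cached manifest to the runner instead of reparsing.
        result = self.runner_factory(self.manifest).invoke([*args, "--project-dir", self.project_dir])
        return result.success
```

Injecting the runner factory is a design choice made here so that the caching logic can be exercised without a real dbt project; in practice the default factory would be used.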
At first I thought it would be easy to have this also work for virtualenv execution, because I assumed the entire `execute` method ran in the virtualenv. That is not the case: the virtualenv operator creates a virtualenv and then passes the executable path to a subprocess. It may be possible to make this work for virtualenv, but that would be better suited for a follow-up PR.

## Related Issue(s)

closes #717

## Breaking Change?

None

## Checklist

- [x] I have made corresponding changes to the documentation (if required)
- [x] I have added tests that prove my fix is effective or that my feature works - added unit tests and integration tests.
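For readers who want to try the merged approach, the configuration described in the PR might be written as in the sketch below. The function name is hypothetical, and the import paths (`cosmos` and `cosmos.constants`) are assumptions about the package layout; only `ExecutionConfig.invocation_mode`, `ExecutionMode.LOCAL`, and `InvocationMode.DBT_RUNNER` come from the PR description itself.

```python
def local_dbt_runner_execution_config():
    """Return an ExecutionConfig that runs dbt in-process in local execution mode."""
    # Imported lazily so this sketch loads even where astronomer-cosmos is not installed.
    # The import paths below are assumptions about the package layout.
    from cosmos import ExecutionConfig
    from cosmos.constants import ExecutionMode, InvocationMode

    return ExecutionConfig(
        execution_mode=ExecutionMode.LOCAL,
        invocation_mode=InvocationMode.DBT_RUNNER,
    )
```

The returned object would then be passed wherever an `ExecutionConfig` is already supplied for local execution, so existing users only need to add the `invocation_mode` argument.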

jbandoro self-assigned this Feb 3, 2024

jbandoro mentioned this issue Feb 6, 2024

Add support for InvocationMode.DBT_RUNNER for local execution mode #836

Closed

2 tasks

jbandoro mentioned this issue Feb 17, 2024

Add support for InvocationMode.DBT_RUNNER for local execution mode #850

Merged

2 tasks

jbandoro closed this as completed in #850 Feb 27, 2024
