Add support to TestBehavior.BUILD
#1377
The code makes sense to me, so giving it a LGTM. Thanks for taking this on @tatiana!
A thought as I read this, for another day though:
dbt 1.8+ has a new concept called unit tests, which it differentiates from data tests (formerly just "tests"). dbt's preferred way of running things is: `unit tests -> build model -> data tests`.
For `TestBehavior.BUILD`, we comport with how dbt wants to run things 👍 because the order of operations is resolved automatically by `dbt build`.
We don't currently handle dbt unit tests for `TestBehavior.AFTER_ALL` and `TestBehavior.AFTER_EACH` though. 🤔 A natural way to do this would be to have `AFTER_ALL` also mean "before all" when it comes to unit tests, and similarly `AFTER_EACH` means "before each" for unit tests.
Alternatively, these could be decoupled; i.e. "before each, unit test" and "after all, data test," similarly "before all, unit test" and "after each, data test." But for `TestBehavior.BUILD`, decoupling gets weird. 🤷
No actionable item here other than, potentially, to mention in the docs that `TestBehavior.BUILD` is currently the only way to run dbt unit tests using Cosmos's automatic graph parsing. Just thinking out loud about the future.
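To make the ordering idea in this comment concrete, here is a minimal sketch of the dbt commands each option would imply for a single model. It is not Cosmos's implementation: the `Behavior` enum, the `commands_for_model` function, and the `split_unit_tests` flag are hypothetical names invented for illustration, and the selector strings (`test_type:unit`, `test_type:data`, combined with the model name) are an assumption about how the split could be expressed with dbt 1.8+ node selection.

```python
from enum import Enum


class Behavior(Enum):
    """Hypothetical stand-in for the test behaviours discussed above."""

    AFTER_EACH = "after_each"
    BUILD = "build"


def commands_for_model(
    model: str, behavior: Behavior, split_unit_tests: bool = False
) -> list[list[str]]:
    """Return the dbt invocations, in order, for one model (illustrative only)."""
    if behavior is Behavior.BUILD:
        # A single command; dbt itself resolves the order
        # unit tests -> build model -> data tests.
        return [["dbt", "build", "--select", model]]
    if split_unit_tests:
        # The "before each" idea: unit tests first, data tests last.
        return [
            ["dbt", "test", "--select", f"{model},test_type:unit"],
            ["dbt", "run", "--select", model],
            ["dbt", "test", "--select", f"{model},test_type:data"],
        ]
    # Run the model, then run its tests as a separate command.
    return [
        ["dbt", "run", "--select", model],
        ["dbt", "test", "--select", model],
    ]


if __name__ == "__main__":
    for command in commands_for_model("orders", Behavior.AFTER_EACH, split_unit_tests=True):
        print(" ".join(command))
```

The sketch shows why decoupling is awkward for the build case: there is only one command, so there is nothing to place "before" or "after".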
LGTM. Amazing support added and nice optimization. Happy to merge once #1374 is merged & this branch is rebased, and the CI passes after.
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff, main vs #1377:

| Metric   | main   | #1377  | +/- |
|----------|--------|--------|-----|
| Coverage | 96.23% | 96.24% |     |
| Files    | 67     | 67     |     |
| Lines    | 4042   | 4051   | +9  |
| Hits     | 3890   | 3899   | +9  |
| Misses   | 152    | 152    |     |
Thanks a lot for the review and feedback, @dwreeves @pankajastro @pankajkoti! I added a follow-up action to review what @dwreeves mentioned (unit tests x data tests split): #1386. I tentatively added this to Cosmos 2.x, but we may want to handle this before.
**New Features**

* Support customizing Airflow operator arguments per dbt node by @wornjs in #1339. [More information](https://astronomer.github.io/astronomer-cosmos/getting_started/custom-airflow-properties.html).
* Support uploading dbt artifacts to remote cloud storages via callback by @pankajkoti in #1389. [Read more](https://astronomer.github.io/astronomer-cosmos/configuration/callbacks.html).
* Add support to ``TestBehavior.BUILD`` by @tatiana in #1377. [Documentation](https://astronomer.github.io/astronomer-cosmos/configuration/testing-behavior.html).
* Add support for the "at" operator when using ``LoadMode.DBT_MANIFEST`` or ``CUSTOM`` by @benjy44 in #1372
* Add dbt clone operator by @pankajastro in #1326, as documented in [here](https://astronomer.github.io/astronomer-cosmos/getting_started/operators.html).
* Support rendering tasks with non-ASCII characters by @t0momi219 in #1278 [Read more](https://astronomer.github.io/astronomer-cosmos/configuration/task-display-name.html)
* Add warning callback on source freshness by @pankajastro in #1400 [Read more](https://astronomer.github.io/astronomer-cosmos/configuration/source-nodes-rendering.html#on-warning-callback-callback)
* Add Oracle Profile mapping by @slords and @pankajkoti in #1190 and #1404
* Emit telemetry to Scarf during DAG run by @tatiana in #1397
* Save tasks map as ``DbtToAirflowConverter`` property by @internetcoffeephone and @hheemskerk in #1362

**Bug Fixes**

* Fix the mock value of port in ``TrinoBaseProfileMapping`` to be an integer by @dwolfeu #1322
* Fix access to the ``dbt docs`` menu item outside of Astro cloud by @tatiana in #1312
* Add missing ``DbtSourceGcpCloudRunJobOperator`` in module ``cosmos.operators.gcp_cloud_run_job`` by @anai-s in #1290
* Support building ``DbtDag`` without setting paths in ``ProjectConfig`` by @tatiana in #1307
* Fix parsing dbt ls outputs that contain JSONs that are not dbt nodes by @tatiana in #1296
* Fix Snowflake Profile mapping when using AWS default region by @tatiana in #1406
* Fix dag rendering for taskflow + DbtTaskGroup combo by @pankajastro in #1360

**Enhancements**

* Improve dbt command execution logs to troubleshoot ``None`` values by @tatiana in #1392
* Add logging of stdout to dbt graph run_command by @KarolGongola in #1390
* Save tasks map as DbtToAirflowConverter property by @internetcoffeephone and @hheemskerk in #1362
* Support rendering build operator task-id with non-ASCII characters by @pankajastro in #1415

**Docs**

* Remove extra ` char from docs by @pankajastro in #1345
* Add limitation about copying target dir files to remote by @pankajkoti in #1305
* Generalise example from README by @ReadytoRocc in #1311
* Add security policy by @tatiana, @chaosmaw and @lzdanski in #1385
* Mention in documentation that the callback functionality is supported in ``ExecutionMode.VIRTUALENV`` by @pankajkoti in #1401

**Others**

* Restore Jaffle Shop so that ``basic_cosmos_dag`` works as documented by @tatiana in #1374
* Remove Pytest durations from tests scripts by @tatiana in #1383
* Remove typing-extensions as dependency by @pankajastro in #1381
* Pin dbt-databricks version to < 1.9 by @pankajastro in #1376
* Refactor ``dbt-sqlite`` tests to use ``dbt-postgres`` by @pankajastro in #1366
* Remove 'dbt-core<1.8.9' pin by @tatiana in #1371
* Remove dependency ``eval_type_backport`` by @tatiana in #1370
* Enable kubernetes tests for dbt>=1.8 by @pankajastro #1364
* CI Workaround: Pin dbt-core, Disable SQLite Tests, and Correctly Ignore Clone Test to Pass CI by @pankajastro in #1337
* Enable Azure task in the remote store manifest example DAG by @pankajkoti in #1333
* Enable GCP remote manifest task by @pankajastro in #1332
* Add exempt label option in GH action stale job by @pankajastro in #1328
* Add integration test for source node rendering by @pankajastro in #1327
* Fix vulnerability issue on docs dependency by @tatiana in #1313
* Add postgres pod status check for k8s tests in CI by @pankajkoti in #1320
* [CI] Reduce the amount taking to run tests in the CI from 5h to 11min by @tatiana in #1297
* Enable secret detection precommit check by @pankajastro in #1302
* Fix security vulnerability, by not pinning Airflow 2.10.0 by @tatiana in #1298
* Fix Netlify build timeouts by @tatiana in #1294
* Add stalebot to label/close stale PRs and issues by @tatiana in #1288
* Unpin dbt-databricks version by @pankajastro in #1409
* Fix source resource type tests by @pankajastro in #1405
* Increase performance tests models by @tatiana in #1403
* Drop running 1000 models in the CI by @pankajkoti in #1411
* Fix releasing package to PyPI by @tatiana in #1396
* Pre-commit hook updates in #1394, #1373, #1358, #1340, #1331, #1314, #1301

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <pankaj.singh@astronomer.io>
Closes: #1193

---------

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
By default, Cosmos uses `TestBehavior.AFTER_EACH`, creating an Airflow TaskGroup that contains two tasks: one that runs the model and one that runs its tests.

While many users desire and expect this behaviour, it can also mean additional overhead, especially in dbt projects with more than 500 models. Each time the `dbt` command is executed, there is an overhead, even when using optimisations such as partial parsing and `dbtRunner`. There is also an overhead on splitting a task into multiple Airflow workers.

Illustrating some numbers with data shared by an Astronomer customer regarding the dbt command execution (between the logs "running dbt with arguments" and "Done."):

* `dbt build` for a particular model + its tests: 46s
* `dbt run` + `dbt test` individually: 2min15s

This PR introduces a new behaviour, `TestBehavior.BUILD`, where Cosmos can run both the model/seed/snapshot and the associated tests using a single command (`dbt build`). For documentation on `dbt build`, check https://docs.getdbt.com/reference/commands/build.

The PR also includes a screenshot of how the DAG renders when using this test behaviour, and an example of the task output showing both the model and its tests being run by the build command.
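For a sense of scale, the two timings quoted in this description can be compared with a few lines of arithmetic. This is only a worked calculation on the single data point shared in the PR (46s versus 2min15s for one model and its tests); the helper name `to_seconds` is made up for the example, and the result should not be read as a general benchmark.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair into whole seconds."""
    return minutes * 60 + seconds


# Figures quoted in the PR description for one model and its tests.
build_s = to_seconds(0, 46)      # single `dbt build`
separate_s = to_seconds(2, 15)   # `dbt run` followed by `dbt test`

saved_s = separate_s - build_s               # 89 seconds saved
speedup = separate_s / build_s               # how many times faster
reduction_pct = 100 * saved_s / separate_s   # share of time removed

print(f"saved: {saved_s}s, speed-up: {speedup:.2f}x, reduction: {reduction_pct:.1f}%")
```

For this one measurement, the single build command took roughly a third of the time of the two separate commands.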
Closes: #892