Support uploading more files from the target directory to remote_target_path #1293
Comments
@pankajkoti `partial_parse` would speed up the runtime. Currently, each task run still needs to parse the project, even when a `manifest.json` is provided.
@joppedh, on the partial parsing side, have you been able to leverage https://astronomer.github.io/astronomer-cosmos/configuration/partial-parsing.html#partial-parsing? Cosmos should be caching it. But you'd need to leverage …
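For context, here is a minimal, untested sketch of pointing Cosmos at a pre-built `manifest.json` while leaving partial parsing enabled. The parameter names (`manifest_path`, `partial_parse`, `LoadMode.DBT_MANIFEST`) and the file paths are assumptions based on the Cosmos configuration docs, not verified against any specific release:

```python
# Minimal sketch (untested): load the DAG from a pre-built manifest.json while
# leaving partial parsing enabled so task runs can reuse the cached
# partial_parse.msgpack. Paths and dag_id are hypothetical.
from cosmos import DbtDag, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/dbt/my_project",
        manifest_path="/usr/local/airflow/dags/dbt/my_project/target/manifest.json",
        partial_parse=True,  # let dbt reuse partial_parse.msgpack at runtime
    ),
    profile_config=ProfileConfig(
        profile_name="default",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dags/dbt/my_project/profiles.yml",
    ),
    render_config=RenderConfig(load_method=LoadMode.DBT_MANIFEST),
    dag_id="my_dbt_dag",
)
```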
This issue is stale because it has been open for 30 days with no activity.
…te cloud storages (#1389)

This PR introduces helper functions that can be passed as callable callbacks for Cosmos tasks to execute after task completion. These helpers enable uploading artifacts (from the project's `target` directory) to various cloud storage providers, including AWS S3, Google Cloud Storage (GCS), Azure WASB, and general remote object stores using Airflow's `ObjectStoragePath`.

## Key Changes

Adds a `cosmos/io.py` module that includes the following helper functions:

1. `upload_artifacts_to_aws_s3`
   - Uploads artifact files from a task's local target directory to an AWS S3 bucket.
   - Supports dynamically appending DAG metadata (e.g., `dag_id`, `task_id`, `run_id`, and try number) to the uploaded file paths.
   - Uses `S3Hook` from the `airflow.providers.amazon.aws` module.
2. `upload_artifacts_to_gcp_gs`
   - Uploads artifact files from a task's local target directory to a Google Cloud Storage (GCS) bucket.
   - Appends DAG-related context to the GCS object names for better traceability.
   - Uses `GCSHook` from `airflow.providers.google.cloud`.
3. `upload_artifacts_to_azure_wasb`
   - Uploads artifact files from a task's local target directory to an Azure Blob Storage container.
   - Automatically structures blob names with metadata, including `dag_id`, `task_id`, and execution details.
   - Uses `WasbHook` from the `airflow.providers.microsoft.azure` module.
4. `upload_artifacts_to_cloud_storage`
   - A generic helper that uploads artifacts from a task's local target directory to remote object stores configured via Airflow's `ObjectStoragePath` (an Airflow 2.8+ feature).
   - Supports custom remote storage configuration such as `remote_target_path` and `remote_target_path_conn_id`.
   - Dynamically constructs file paths that include DAG metadata for clear organization.

These helper functions can be passed as the `callback` argument to `DbtDag`, or to your DAG's operators, as demonstrated in the example DAGs `dev/dags/cosmos_callback_dag.py` and `dev/dags/example_operators.py` respectively. You can also pass `callback_args`, as shown in the example DAGs. These helpers are simply examples of how callback functions can be written and passed to your operators/DAGs to run after task completion; using them as a reference, you can write and pass your own callback functions.

## Limitations

1. This PR has been tested and is currently supported only in `ExecutionMode.LOCAL`. We encourage the community to contribute by adding callback support for other execution modes as needed, using the `ExecutionMode.LOCAL` implementation as a reference.

closes: #1350
closes: #976
closes: #867
closes: #801
closes: #1292
closes: #851
closes: #1351
related: #1293
related: #1349
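A minimal usage sketch (untested), modeled on the description above and `dev/dags/cosmos_callback_dag.py`; the `callback_args` keys (`aws_conn_id`, `bucket_name`) and all paths are illustrative assumptions, so check `cosmos/io.py` for the actual signatures:

```python
# Sketch (untested): attach one of the PR's helper callbacks to a DbtDag so
# each task uploads its target/ artifacts to S3 after it finishes.
# The bucket name, connection id, and paths below are hypothetical.
from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.io import upload_artifacts_to_aws_s3

cosmos_callback_dag = DbtDag(
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/my_project"),
    profile_config=ProfileConfig(
        profile_name="default",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dags/dbt/my_project/profiles.yml",
    ),
    # Runs after each task completes, uploading that task's artifacts.
    callback=upload_artifacts_to_aws_s3,
    callback_args={"aws_conn_id": "aws_default", "bucket_name": "my-artifact-bucket"},
    dag_id="cosmos_callback_dag",
)
```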
Currently, the `remote_target_path` configuration, added in PR #1224, only uploads files from the `compiled` directory within the `target` directory of the dbt project, and only when `ExecutionMode.AIRFLOW_ASYNC` is enabled. However, the `target` directory contains several other files and folders that could benefit users if they were also uploaded to `remote_target_path`.

Beyond the `compiled` directory, the `target` directory typically includes several other files and folders. A specific request was made in a Slack conversation to have `run_results.json` uploaded and accessible in `remote_target_path`, highlighting its value to users.

We should evaluate the potential benefits of supporting uploads for these additional files and folders and explore enabling this feature across all execution modes, not just `ExecutionMode.AIRFLOW_ASYNC`. Additionally, it may be worthwhile to consider uploading files from the `compiled` directory in other execution modes if it proves beneficial.

We could potentially create sub-tasks for each of these files and folders to evaluate the benefits of, and add support for, uploading them to the `remote_target_path`.
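For reference, a sketch of how the existing `remote_target_path` configuration from PR #1224 might be set. The assumption here is that Cosmos reads `remote_target_path` and `remote_target_path_conn_id` from the `[cosmos]` Airflow config section, so Airflow's standard environment-variable convention applies; the bucket path and connection id are hypothetical:

```python
import os

# Assumed configuration mechanism (see the Cosmos docs for PR #1224): the
# remote target path and its connection id live in the [cosmos] config section
# and can therefore be set via AIRFLOW__COSMOS__* environment variables.
os.environ["AIRFLOW__COSMOS__REMOTE_TARGET_PATH"] = "s3://my-bucket/cosmos/target"
os.environ["AIRFLOW__COSMOS__REMOTE_TARGET_PATH_CONN_ID"] = "aws_default"
```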