support dag_ids arguments in db cleanup utility #24987
* Add typing for airflow/configuration.py. configuration.py did not have typing information, which made it rather difficult to reason about, especially since it went through a few changes in the past that made it rather complex to understand. This PR adds typing information all over the configuration file. (cherry picked from commit 71e4deb)
(cherry picked from commit 028087b)
* Handle invalid date from query parameters in views. * Add tests. * Use common parsing helper. * Add type hint. * Remove unwanted error check. * Fix extra_links endpoint. (cherry picked from commit 9e25bc2)
* Fix `UnicodeDecodeError` when parsing subprocess output (apache#20966). In `airflow/hooks/subprocess.py`, line 89, `line = raw_line.decode(output_encoding).rstrip()` raised `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 4: invalid start byte` for raw lines such as `b'\x00\x00\x00\x11\xa9\x01\n'`. Another alternative is to try/except the decode, e.g.:
```
line = ''
for raw_line in iter(self.sub_process.stdout.readline, b''):
    try:
        line = raw_line.decode(output_encoding).rstrip()
    except UnicodeDecodeError as err:
        print(err, output_encoding, raw_line)
    self.log.info("%s", line)
```
* Create test_subprocess.sh * Update test_subprocess.py * Add shell directive and license to test_subprocess.sh * Distinguish between raw and decoded lines as suggested by @uranusjr * Simplify test. Co-authored-by: muhua <microhuang@live.com> (cherry picked from commit 863b257)
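The same problem can also be handled by decoding with a lenient error handler instead of catching the exception. A minimal sketch, not the actual hook code; the `decode_line` helper name is illustrative:

```python
def decode_line(raw_line: bytes, encoding: str = "utf-8") -> str:
    # Decode one line of subprocess output; errors="replace" substitutes
    # U+FFFD for undecodable bytes instead of raising UnicodeDecodeError.
    return raw_line.decode(encoding, errors="replace").rstrip()

print(decode_line(b"hello world\n"))           # hello world
print("\ufffd" in decode_line(b"\x00\xa9\n"))  # True
```

With `errors="replace"` the loop never has to special-case bad bytes, at the cost of losing the original byte values in the logged text.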
) (cherry picked from commit fcfaa83)
apache#23486) When a task is expanded from a mapped task that returned no value, it crashed the scheduler. This PR fixes it by first checking whether there is a return value from the mapped task; if there is none, the error is raised in the task itself instead of crashing the scheduler. (cherry picked from commit 7813f99)
…e#23521) * Prevent KubernetesJobWatcher getting stuck on "resource too old". If the watch fails because the resource is too old, the KubernetesJobWatcher should not retry with the same resource version, as that ends up in a loop where there is no progress. * Reset ResourceVersion().resource_version to 0. (cherry picked from commit dee05b2)
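The reset behaviour can be sketched as a standalone decision function. This is an illustration of the idea, not the actual KubernetesJobWatcher code; the function name and constant are assumptions:

```python
RESOURCE_TOO_OLD = 410  # HTTP "Gone": the stored resource_version expired server-side

def next_resource_version(current, error_code):
    # On "resource too old", restart the watch from "0" (forcing a re-list);
    # retrying with the same expired version would loop without progress.
    if error_code == RESOURCE_TOO_OLD:
        return "0"
    return current

print(next_resource_version("12345", 410))   # 0
print(next_resource_version("12345", None))  # 12345
```

Resetting to `"0"` trades one extra full list operation for escaping the retry loop.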
In certain databases there is a need to set the collation for ID fields like dag_id or task_id to something different than the database default. This is because in MySQL with utf8mb4 the index size becomes too big for the MySQL limits. In past pull requests this was handled [apache#7570](apache#7570), [apache#17729](apache#17729), but the root_dag_id field on the dag model was missed. Since this field is used to join with the dag_id in various other models ([and self-referentially](https://github.com/apache/airflow/blob/451c7cbc42a83a180c4362693508ed33dd1d1dab/airflow/models/dag.py#L2766)), it also needs to have the same collation as other ID fields. This can be seen by running `airflow db reset` before and after applying this change while also specifying `sql_engine_collation_for_ids` in the configuration. Other related PRs [apache#19408](apache#19408) (cherry picked from commit b7f8627)
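The idea can be shown with a small DDL-rendering sketch. The helper name and the `utf8mb3_bin` example collation are illustrative assumptions; 250 mirrors the ID length commonly used for Airflow ID columns:

```python
def id_column_ddl(name, collation=""):
    # Render DDL for an Airflow-style ID column (dag_id, root_dag_id, ...).
    # An explicit collation keeps utf8mb4 indexes within MySQL's
    # index-size limit and must match on both sides of a join.
    ddl = f"{name} VARCHAR(250)"
    if collation:
        ddl += f" COLLATE {collation}"
    return ddl

print(id_column_ddl("root_dag_id", "utf8mb3_bin"))
# root_dag_id VARCHAR(250) COLLATE utf8mb3_bin
```

Because `root_dag_id` joins against `dag_id` (including self-referentially), both columns need the same collation or MySQL cannot use the indexes consistently.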
* Fix `PythonVirtualenvOperator` templated_fields. The `PythonVirtualenvOperator` templated_fields overrode the `PythonOperator` templated_fields, which caused the inherited functionality not to work as expected. fixes: apache#23557 (cherry picked from commit 1657bd2)
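The fix pattern can be illustrated with plain classes; the names below are stand-ins, not the real operators. A subclass should extend, not replace, its parent's templated fields:

```python
class PythonLikeOperator:
    # The parent declares which attributes get Jinja-templated.
    template_fields = ("op_args", "op_kwargs")

class VirtualenvLikeOperator(PythonLikeOperator):
    # Extend the parent's tuple instead of assigning a fresh one,
    # so the inherited fields keep being templated in the subclass.
    template_fields = PythonLikeOperator.template_fields + ("requirements",)

print(VirtualenvLikeOperator.template_fields)
# ('op_args', 'op_kwargs', 'requirements')
```

Assigning a brand-new tuple in the subclass silently drops the parent's entries, which is exactly the bug described above.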
Update missing bracket (cherry picked from commit 827bfda)
(cherry picked from commit b0406f5)
…KubernetesExecutor (apache#23617) (cherry picked from commit c5b72bf)
(cherry picked from commit 5d8cda8)
(cherry picked from commit f313e14)
These checks only make sense for upgrades. Generally they exist to resolve referential integrity issues, etc., before adding constraints. In the downgrade context we generally only remove constraints, so it is a non-issue. (cherry picked from commit 9ab9cd4)
(cherry picked from commit 6f82fc7)
(cherry picked from commit 3fa5716)
Move top margin to each breadcrumb component to make sure that there is no overlap when the header wraps with long names. (cherry picked from commit f77a691)
(cherry picked from commit 239a9dc)
…pache#23674) fix apache#23411 (cherry picked from commit f9e2a30)
when StandardTaskRunner runs tasks with exec Issue: apache#23540 (cherry picked from commit e453e68)
If you tried to expand via xcom into a non-templated field without explicitly setting the upstream task dependency, the scheduler would crash because the upstream task dependency wasn't being set automatically. It was being set only for templated fields, but now we do it for both. (cherry picked from commit 3849ebb)
(cherry picked from commit 9837e6d)
The rename from apache#23562 missed a few `shell_parms` usages where it should also be replaced. (cherry picked from commit 4afa8e3)
…pache#23687) Several Breeze commands depend on docker and docker compose being available, as well as on the Breeze image. They will work fine if you "just" built the image, but they might benefit from the image being rebuilt (to make sure all the latest dependencies are installed in it). The common checks done in the "shell" command for that are now extracted to common utils and run as the first thing in those commands that need them. (cherry picked from commit 3f4ab6c)
The "wait for image" step lacked --tag-as-latest, which sometimes made the subsequent "fix-ownership" step run far longer than needed, because it rebuilt the image for the fix-ownership case. Also, the "fix-ownership" command has been changed to just pull the image if one is missing locally rather than build it. This command might be run in an environment where the image is missing, or where another image was built (for example in jobs where an image was built for a different Python version); in that case the command will simply use whatever Python version is available (it does not matter), or, if no image is available, it will pull the image as a last resort. (cherry picked from commit 5e3f652)
After apache#23775 I noticed yet another small improvement area in the CI build speed. Currently build-ci-image builds and pushes only "commit-tagged" images, but "fix-ownership" requires the "latest" image to run. This PR adds the --tag-as-latest option to the build-image and build-prod-image commands as well, similarly to pull-image and pull-prod-image. This retags the "commit" images as latest in the build-ci-images step and saves about 1m on pulling the latest image before fix-ownership (bringing it back to a 1s overhead). (cherry picked from commit 252ef66)
(cherry picked from commit 056b8cf)
…nns (apache#24735) (cherry picked from commit 54d5972)
(cherry picked from commit ca361ee)
(cherry picked from commit 46ac083)
There were errors with retrieving the constraints branch, caused by using different conventions for output names (sometimes dash-case, sometimes camelCase as suggested by most GitHub documents). The "dash-name" looks much better and is far more readable, so we should unify all internal outputs to follow it. During that rename some old, unused outputs were removed, and it also turned out that the new selective-check can replace the previous "dynamic outputs" written in Bash. Additionally, the "defaults" are now retrieved via a Python script, not a Bash script, which makes them much more readable. Both build_images and ci.yaml use it in the right place: before replacing the scripts and dev with the version coming in from the PR, in the case of build_images.yaml. (cherry picked from commit 017507be1e1dbf39abcc94a44fab8869037893ea)
closes: #24828
This PR adds an optional `dag_id_column` to each table configuration and an optional `dag_ids` list to the runtime logic, which appends a filter condition in the `_build_query` func for the tables that support this column. It also adds tests for the `dag_ids` argument and updates the CLI parser and commands to support the new argument.
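A rough sketch of how such a `dag_ids` filter could be appended for tables that declare a `dag_id_column`. This builds a raw SQL string only for illustration (the real utility composes a SQLAlchemy query with bound parameters), and all names here are hypothetical:

```python
def build_cleanup_query(table, clean_before, dag_id_column=None, dag_ids=None):
    # Base cleanup condition: delete rows older than the cutoff.
    query = f"DELETE FROM {table} WHERE timestamp < '{clean_before}'"
    # Only tables that expose a dag_id-like column get the extra filter.
    if dag_id_column and dag_ids:
        quoted = ", ".join(f"'{d}'" for d in dag_ids)
        query += f" AND {dag_id_column} IN ({quoted})"
    return query

print(build_cleanup_query("task_instance", "2022-01-01",
                          dag_id_column="dag_id",
                          dag_ids=["dag_a", "dag_b"]))
```

Tables without a `dag_id_column` simply skip the extra condition, so the `dag_ids` argument degrades gracefully to a plain time-based cleanup.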