DAG params ordering goes by key string length for some backends #40154
Comments
I do have a working code fix (no tests yet), but I also don't mind leaving this for someone else if anyone is keen.
@Usiel feel free to take this up if you already have a solution for this.
I checked the MySQL docs and found
This hints to me that Postgres probably does the same or something similar, and I can, in fact, reproduce the same issue there. I changed the issue title accordingly.
Commit 2149b4dbee8fb524bb235280aaef158afaec8d4a (later cherry-picked into other branches and forks):

* Ensures DAG params order regardless of backend

  Fixes #40154

  This change adds an extra attribute to the serialized DAG param objects, which helps us decide the order of the deserialized params dictionary later, even if the backend messes with us. I decided not to limit this to MySQL, since the operation is inexpensive and may turn out to be helpful. I made sure the new test fails with the old implementation + MySQL. I assume this test will be executed with MySQL somewhere in the build actions?

* Removes GitHub reference

  Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

  Alternative to the previous approach: we serialize the DAG params dict as a list of tuples, which _should_ keep their ordering regardless of backend. Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`), then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

  Based on suggestions by @uranusjr, with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
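The list-of-pairs approach from the final commit can be sketched in Python as below. This is a minimal illustration, not Airflow's actual serializer: `encode_params` and `decode_params` are hypothetical names, and the values are plain JSON scalars rather than Airflow `Param` objects. The point is that a JSON array keeps its element order even when a backend re-sorts object keys, while the explicit `isinstance` branch still accepts rows written in the old dict format.

```python
from __future__ import annotations

import json
from typing import Any


def encode_params(params: dict[str, Any]) -> list[list[Any]]:
    """Hypothetical encoder: store params as an ordered list of [key, value]
    pairs. JSON arrays keep their element order, unlike object keys."""
    return [[key, value] for key, value in params.items()]


def decode_params(encoded: list[list[Any]] | dict[str, Any]) -> dict[str, Any]:
    """Hypothetical decoder that also accepts the legacy dict format."""
    if isinstance(encoded, dict):
        # Rows written before the change: a plain JSON object.
        # The backend may already have reordered these keys.
        return dict(encoded)
    # New format: rebuild the dict in the stored pair order.
    return {key: value for key, value in encoded}


params = {"first_param": 1, "b": 2, "third": 3}
# Round-trip through JSON text; the pairs are stored as JSON arrays.
stored = json.loads(json.dumps(encode_params(params)))
assert list(decode_params(stored)) == ["first_param", "b", "third"]
assert decode_params({"b": 2}) == {"b": 2}  # legacy rows still load
```

Because only the container type changes (object to array), old and new rows can coexist in the table during an upgrade; the trade-off is that legacy rows keep whatever order the backend already gave them.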
Apache Airflow version
2.9.1
What happened?
Took me a while to find the underlying issue for this one 😓
When creating a DAG with params, the ordering does not follow the insertion order (as promised by https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html); instead, params are ordered by the length of their keys when MySQL is used.
This seems to be an issue with MySQL's JSON column, which sorts the keys on insertion. Neither Airflow nor SQLAlchemy causes any issues with the ordering.
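A minimal Python sketch of the reported behaviour, assuming the JSON column normalizes object keys by sorting them shortest-first and breaking ties byte-wise (consistent with the length-based ordering described above). `normalize_like_json_column` is a hypothetical model, not MySQL, Postgres, or Airflow code. Python's own `json` module preserves insertion order, so the serialization round trip on the Airflow side is not where the order is lost.

```python
from __future__ import annotations

import json
from typing import Any


def normalize_like_json_column(obj: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical model of a JSON column's key normalization:
    shorter keys first, ties broken by comparing the UTF-8 bytes."""
    def sort_key(key: str) -> tuple[int, bytes]:
        raw = key.encode("utf-8")
        return (len(raw), raw)

    return {key: obj[key] for key in sorted(obj, key=sort_key)}


# Params as declared in the DAG, in the intended (insertion) order.
params = {"first_param": 1, "b": 2, "third": 3}

# Serializing with Python's json module keeps insertion order...
payload = json.dumps(params)
assert list(json.loads(payload)) == ["first_param", "b", "third"]

# ...but a backend that re-sorts keys hands back a different order.
stored = normalize_like_json_column(json.loads(payload))
print(list(stored))  # ['b', 'third', 'first_param']
```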
What you think should happen instead?
Params should be shown in insertion order.
How to reproduce
Deployment details
MySQL 5.7 or 8.0 / Postgres
Anything else?
Workaround: set compress_serialized_dags = True to avoid using the MySQL JSON column (this comes with the drawback of a disabled DAG dependencies view).

Maybe related to #35944 (assuming some of the people who complained used MySQL).