Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAG params ordering goes by key string length for some backends #40154

Closed
1 of 2 tasks
Usiel opened this issue Jun 10, 2024 · 3 comments · Fixed by #40156
Closed
1 of 2 tasks

DAG params ordering goes by key string length for some backends #40154

Usiel opened this issue Jun 10, 2024 · 3 comments · Fixed by #40156
Assignees
Labels
area:core area:UI Related to UI/UX. For Frontend Developers. good first issue kind:bug This is a clearly a bug

Comments

@Usiel
Copy link
Contributor

Usiel commented Jun 10, 2024

Apache Airflow version

2.9.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

Took me a while to find the underlying issue for this one 😓

When creating a DAG with params the ordering does not follow the insertion order (as promised by https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html), but instead params are ordered by the length of their key when MySQL is used:

image

This seems to be an issue with MySQL's JSON column, which orders the keys on insertion. The Airflow side of things (and SQLAlchemy) does not cause any issues with the ordering.

What you think should happen instead?

Instead params should be shown in insertion order.

How to reproduce

  1. Run Airflow with MySQL (I tested 5.7 and 8.0) or Postgres and the following DAG.
import datetime
import json
from pathlib import Path

from airflow.decorators import dag, task
from airflow.models import Param


@dag(
    dag_id=Path(__file__).stem,
    dag_display_name="Params UI MySQL Bug",
    schedule=None,
    start_date=datetime.datetime(2022, 3, 4),
    default_args={},
    params={
        "____________first": Param(default="3."),
        "_______second": Param(default="2."),
        "________________third": Param(default="4."),
        "_______fourth": Param(default="1."),
    },
)
def dag():
    @task(task_display_name="Show used parameters")
    def show_params(**kwargs) -> None:
        params = kwargs["params"]
        print(f"This DAG was triggered with the following parameters:\n\n{json.dumps(params, indent=4)}\n")

    show_params()

dag()
  1. Trigger the DAG from the UI.

Deployment details

MySQL 5.7 or 8.0 / Postgres

Anything else?

Workaround: Set compress_serialized_dags = True to avoid usage of the MySQL JSON column (comes with the drawback of disabled DAG dependencies view).

Maybe related to #35944 (assuming some of the people who complained used MySQL).

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Usiel Usiel added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jun 10, 2024
@vatsrahul1001 vatsrahul1001 added area:UI Related to UI/UX. For Frontend Developers. good first issue and removed needs-triage label for new issues that we didn't triage yet labels Jun 10, 2024
@Usiel
Copy link
Contributor Author

Usiel commented Jun 10, 2024

I do have a working code fix (no tests yet), but I also don't mind leaving this for someone else if anyone is keen.

@vatsrahul1001
Copy link
Collaborator

@Usiel feel free to take this up if you already have solution for this

@Usiel Usiel changed the title DAG params ordering with MySQL goes by key string length DAG params ordering goes by key string length for some backends Jun 10, 2024
@Usiel
Copy link
Contributor Author

Usiel commented Jun 10, 2024

I checked the MySQL docs and found

To make lookups more efficient, MySQL also sorts the keys of a JSON object. You should be aware that the result of this ordering is subject to change and not guaranteed to be consistent across releases.
https://dev.mysql.com/doc/refman/8.4/en/json.html

Which hints to me that Postgres probably does the same or something similar, and I can, in fact, reproduce the same issue. I changed the issue title accordingly.

Usiel added a commit to Usiel/airflow that referenced this issue Jun 11, 2024
Fixes apache#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?
eladkal pushed a commit that referenced this issue Jun 13, 2024
* Ensures DAG params order regardless of backend

Fixes #40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
utkarsharma2 pushed a commit that referenced this issue Jul 2, 2024
* Ensures DAG params order regardless of backend

Fixes #40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
(cherry picked from commit 2149b4d)
romsharon98 pushed a commit to romsharon98/airflow that referenced this issue Jul 26, 2024
* Ensures DAG params order regardless of backend

Fixes apache#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Sep 20, 2024
* Ensures DAG params order regardless of backend

Fixes apache/airflow#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
(cherry picked from commit 2149b4dbee8fb524bb235280aaef158afaec8d4a)
GitOrigin-RevId: 92777adbb367d549c654d6ae9856d0f19d671a81
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Nov 9, 2024
* Ensures DAG params order regardless of backend

Fixes apache/airflow#40154

This change adds an extra attribute to the serialized DAG param objects which helps us decide
the order of the deserialized params dictionary later even if the backend messes with us.

I decided not to limit this just to MySQL since the operation is inexpensive and may turn
out to be helpful.

I made sure the new test fails with the old implementation + MySQL. I assume this test will be
executed with MySQL somewhere in the build actions?

* Removes GitHub reference

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Serialize DAG params as array of tuples to ensure ordering

Alternative to previous approach: We serialize the DAG params dict as a list of tuples which _should_ keep their ordering regardless of backend.

Backwards compatibility is ensured because if `encoded_params` is a `dict` (not the expected `list`) then `dict(encoded_params)` still works.

* Make backwards compatibility more explicit

Based on suggestions by @uranusjr with an additional fix to make mypy happy.

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
GitOrigin-RevId: 2149b4dbee8fb524bb235280aaef158afaec8d4a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core area:UI Related to UI/UX. For Frontend Developers. good first issue kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants