Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed of tests by not creating connections at parse time #45690

Merged
merged 1 commit into from
Jan 15, 2025

Conversation

ashb
Copy link
Member

@ashb ashb commented Jan 15, 2025

The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

  • The SystemTestContextBuilder was trying to fetch things from SSM. This was
    addressed by adding a functools.cache on the function
  • The Bedrock example dag was setting/caching the underlying conn object
    globally. This was addressed by making the Airflow connection a global,
    rather than the Bedrock conn. This fix is not great, but it does massively
    help

Before:

111 passed, 1 warning in 439.37s (0:07:19)

After:

111 passed, 1 warning in 71.76s (0:01:11)

@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jan 15, 2025
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NICE

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@ashb
Copy link
Member Author

ashb commented Jan 15, 2025

Prepare package failed due to a node tls timeout

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That Bedrock one was on me. I thought I was just saving a bunch of repetition and didn't realize the impact of that small change. Well spotted, and thanks for fixing it.

@ashb ashb force-pushed the speed-up-serdag-tests branch from dc3426e to 4dff904 Compare January 15, 2025 21:02
The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was
  addressed by adding a functools.cache on the function
- The Bedrock example dag was setting/caching the underlying conn object
  globally. This was addressed by making the Airflow connection a global,
  rather than the Bedrock conn. This fix is not _great_, but it does massively
  help

Before:

> 111 passed, 1 warning in 439.37s (0:07:19)

After:

> 111 passed, 1 warning in 71.76s (0:01:11)
@ashb ashb force-pushed the speed-up-serdag-tests branch from 4dff904 to 08748ba Compare January 15, 2025 21:24
@ashb ashb merged commit 102e853 into main Jan 15, 2025
60 checks passed
@ashb ashb deleted the speed-up-serdag-tests branch January 15, 2025 21:56
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 16, 2025
Follow up after apache#45690 and apache#45682

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

While testing it, I also discovered that after apache#45682 all kinds of
exceptions when DAGBag parsed the example dags were silently ignored -
they were just logged to the output and swallowed. This means that one
of the purpose of example_dags - to catch accidental import errors
and typos were not really fulfilled, because any exceptions during
parsing would not be surfaced.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing"..
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 16, 2025
Follow up after apache#45690 and apache#45682

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use "aws_default" connection to
actually ... be a good example. Also it allows to run the example dag as
system test for someone who would like to do it with "aws_default" as
a connection id to connect to their AWS account.
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 16, 2025
Follow up after apache#45690

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use "aws_default" connection to
actually ... be a good example. Also it allows to run the example dag as
system test for someone who would like to do it with "aws_default" as
a connection id to connect to their AWS account.
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 16, 2025
Follow up after apache#45690

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use "aws_default" connection to
actually ... be a good example. Also it allows to run the example dag as
system test for someone who would like to do it with "aws_default" as
a connection id to connect to their AWS account.
@potiuk
Copy link
Member

potiuk commented Jan 16, 2025

That Bedrock one was on me. I thought I was just saving a bunch of repetition and didn't realize the impact of that small change. Well spotted, and thanks for fixing it.

As part of "clearing the path" - here is #45704 as a follow up to prevent it in the future.

With one caveat .... the test would not have caught the error in this case because aws_conn_id was set to None, so actually ... get_connection() was not called. But I think that's something we cannot actually easily prevent, but it's a bit of bad practice to set aws_conn_id for "example_dags" - because it makes it ... bad example , and not good candidates to run as system tests.

The added test will make it quite obvious though: when someone adds example_dag with get_connection retrieval, the error will be:

  f"BaseHook.get_connection() should not be called during DAG parsing. "
  f"It was called {mock_get_connection.call_count} times. Please make sure that no "
  "connections are created during DAG parsing. NOTE! Do not set conn_id to None to avoid it, just make "
  "sure that you do not create connection object in the `__init__` method of your operator."

I hope this will be helpful to avoid such mistake in the future :)

HariGS-DB pushed a commit to HariGS-DB/airflow that referenced this pull request Jan 16, 2025
…che#45690)

The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was
  addressed by adding a functools.cache on the function
- The Bedrock example dag was setting/caching the underlying conn object
  globally. This was addressed by making the Airflow connection a global,
  rather than the Bedrock conn. This fix is not _great_, but it does massively
  help

Before:

> 111 passed, 1 warning in 439.37s (0:07:19)

After:

> 111 passed, 1 warning in 71.76s (0:01:11)
potiuk added a commit that referenced this pull request Jan 16, 2025
Follow up after #45690

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use "aws_default" connection to
actually ... be a good example. Also it allows to run the example dag as
system test for someone who would like to do it with "aws_default" as
a connection id to connect to their AWS account.
@ashb ashb added the backport-to-v2-10-test Mark PR with this label to backport to v2-10-test branch label Jan 21, 2025
github-actions bot pushed a commit that referenced this pull request Jan 21, 2025
…rse time (#45690)

The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was
  addressed by adding a functools.cache on the function
- The Bedrock example dag was setting/caching the underlying conn object
  globally. This was addressed by making the Airflow connection a global,
  rather than the Bedrock conn. This fix is not _great_, but it does massively
  help

Before:

> 111 passed, 1 warning in 439.37s (0:07:19)

After:

> 111 passed, 1 warning in 71.76s (0:01:11)
(cherry picked from commit 102e853)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
Copy link

Backport successfully created: v2-10-test

Status Branch Result
v2-10-test PR Link

kaxil pushed a commit that referenced this pull request Jan 21, 2025
…rse time (#45690) (#45826)

The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was
  addressed by adding a functools.cache on the function
- The Bedrock example dag was setting/caching the underlying conn object
  globally. This was addressed by making the Airflow connection a global,
  rather than the Bedrock conn. This fix is not _great_, but it does massively
  help

Before:

> 111 passed, 1 warning in 439.37s (0:07:19)

After:

> 111 passed, 1 warning in 71.76s (0:01:11)
(cherry picked from commit 102e853)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
@potiuk
Copy link
Member

potiuk commented Jan 21, 2025

BTW. There was no need to backport this one - provider tests are skipped on v2-10-test.

dauinh pushed a commit to dauinh/airflow that referenced this pull request Jan 24, 2025
…che#45690)

The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was
  addressed by adding a functools.cache on the function
- The Bedrock example dag was setting/caching the underlying conn object
  globally. This was addressed by making the Airflow connection a global,
  rather than the Bedrock conn. This fix is not _great_, but it does massively
  help

Before:

> 111 passed, 1 warning in 439.37s (0:07:19)

After:

> 111 passed, 1 warning in 71.76s (0:01:11)
dauinh pushed a commit to dauinh/airflow that referenced this pull request Jan 24, 2025
Follow up after apache#45690

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use "aws_default" connection to
actually ... be a good example. Also it allows to run the example dag as
system test for someone who would like to do it with "aws_default" as
a connection id to connect to their AWS account.
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
…che#45690)

The DAG serialization tests load all of the example and system test DAGs, and
there were two places that these tests opened connections at parse time
resulting in loads of extra of test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was
  addressed by adding a functools.cache on the function
- The Bedrock example dag was setting/caching the underlying conn object
  globally. This was addressed by making the Airflow connection a global,
  rather than the Bedrock conn. This fix is not _great_, but it does massively
  help

Before:

> 111 passed, 1 warning in 439.37s (0:07:19)

After:

> 111 passed, 1 warning in 71.76s (0:01:11)
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
Follow up after apache#45690

Wee already had protection against example dags not using database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeout
and various side effects.

This PR adds explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell as it allows to easily test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by specifing aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this chanege, the contributor
will see failing test with explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use "aws_default" connection to
actually ... be a good example. Also it allows to run the example dag as
system test for someone who would like to do it with "aws_default" as
a connection id to connect to their AWS account.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers backport-to-v2-10-test Mark PR with this label to backport to v2-10-test branch provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants