Improve speed of tests by not creating connections at parse time #45690
NICE
Awesome!
Prepare package failed due to a node tls timeout
That Bedrock one was on me. I thought I was just saving a bunch of repetition and didn't realize the impact of that small change. Well spotted, and thanks for fixing it.
dc3426e
to
4dff904
Compare
The DAG serialization tests load all of the example and system test DAGs, and there were two places where these tests opened connections at parse time, resulting in a lot of extra test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was addressed by adding a functools.cache on the function.
- The Bedrock example dag was setting/caching the underlying conn object globally. This was addressed by making the Airflow connection a global, rather than the Bedrock conn. This fix is not _great_, but it does massively help.

Before:
> 111 passed, 1 warning in 439.37s (0:07:19)

After:
> 111 passed, 1 warning in 71.76s (0:01:11)
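As a rough illustration of the first fix, the sketch below shows how `functools.cache` collapses repeated lookups into a single backend call. The names `fetch_variable` and `backend_calls` are hypothetical stand-ins, not the actual SystemTestContextBuilder code, and the "remote" lookup is simulated in memory rather than going to SSM.

```python
import functools

# Records every simulated remote lookup so the effect of caching is visible.
backend_calls = []


@functools.cache
def fetch_variable(key: str) -> str:
    """Hypothetical stand-in for a slow remote lookup (for example an SSM fetch)."""
    backend_calls.append(key)
    return f"value-for-{key}"


def simulate_parsing() -> list:
    """Look up the same key several times, as many DAG files might while being parsed."""
    for _ in range(3):
        fetch_variable("ROLE_ARN")
    fetch_variable("BUCKET")
    return list(backend_calls)
```

With the decorator in place, each distinct key reaches the simulated backend once per process, however many DAG files ask for it. Note that this only reduces repeated work: the first call for a key still happens wherever it is made.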
4dff904
to
08748ba
Compare
Follow up after apache#45690 and apache#45682

We already had protection against example dags using the database, but it turns out that just calling get_connection() of the BaseHook involves calling out to the secrets manager, which - depending on the configuration, providers and where it is called - might cause external calls, timeouts and various side effects.

While testing it, I also discovered that after apache#45682 all kinds of exceptions raised when DAGBag parsed the example dags were silently ignored - they were just logged to the output and swallowed. This means that one of the purposes of example_dags - to catch accidental import errors and typos - was not really fulfilled, because any exceptions during parsing would not be surfaced.

This PR adds an explicit test for that. As part of the change we also added `--load-example-dags` and `--load-default-connections` to breeze shell, as this makes it easy to test the case where default connections are loaded in the database.

Note that "example_bedrock_retrieve_and_generate" explicitly avoided attempting to load the connections by setting aws_conn_id to None, because it was likely causing problems with fetching SSM when get_connection was actually called during dag parsing. Setting aws_conn_id = None would also bypass this check, but we can't do much about it - at least after this change, the contributor will see a failing test with an explicit "get_connection() should not be called during DAG parsing".
Follow up after apache#45690 and apache#45682

We already had protection against example dags using the database, but it turns out that just calling get_connection() of the BaseHook involves calling out to the secrets manager, which - depending on the configuration, providers and where it is called - might cause external calls, timeouts and various side effects.

This PR adds an explicit test for that. As part of the change we also added `--load-example-dags` and `--load-default-connections` to breeze shell, as this makes it easy to test the case where default connections are loaded in the database.

Note that "example_bedrock_retrieve_and_generate" explicitly avoided attempting to load the connections by setting aws_conn_id to None, because it was likely causing problems with fetching SSM when get_connection was actually called during dag parsing. Setting aws_conn_id = None would also bypass this check, but we can't do much about it - at least after this change, the contributor will see a failing test with an explicit "get_connection() should not be called during DAG parsing".

That also makes the example dag more of a "real" example, as it does not nullify the connection id and it can use the "aws_default" connection to actually ... be a good example. It also allows someone to run the example dag as a system test with "aws_default" as the connection id to connect to their AWS account.
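The kind of guard described above can be sketched with `unittest.mock`. This is only an illustration under stated assumptions: `FakeBaseHook`, `parse_bad_dag`, `parse_good_dag` and `parses_without_connection_lookup` are hypothetical names, the hook class is a stand-in rather than Airflow's real BaseHook, and the actual test added in the follow-up PR may be structured differently.

```python
from unittest import mock


class FakeBaseHook:
    """Hypothetical stand-in for a hook base class; not Airflow's real BaseHook."""

    @classmethod
    def get_connection(cls, conn_id):
        return {"conn_id": conn_id}


def parse_bad_dag():
    # Simulates module-level code that resolves a connection while the file is parsed.
    return FakeBaseHook.get_connection("aws_default")


def parse_good_dag():
    # Defers the lookup into a callable that would only run at task execution time.
    def task():
        return FakeBaseHook.get_connection("aws_default")

    return task


def parses_without_connection_lookup(parse_fn) -> bool:
    """Return True if parse_fn completes without calling get_connection()."""
    error = RuntimeError("get_connection() should not be called during DAG parsing")
    # While the patch is active, any call to get_connection() raises the error above.
    with mock.patch.object(FakeBaseHook, "get_connection", side_effect=error):
        try:
            parse_fn()
        except RuntimeError:
            return False
    return True
```

Here `parses_without_connection_lookup(parse_good_dag)` passes because the lookup is deferred, while `parse_bad_dag` trips the guard. A check of this shape cannot see code that avoids get_connection() altogether, which matches the aws_conn_id = None caveat noted in the commit message.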
As part of "clearing the path" - here is #45704 as a follow up to prevent it in the future. With one caveat .... the test would not have caught the error in this case because The added test will make it quite obvious though: when someone adds example_dag with
I hope this will be helpful to avoid such a mistake in the future :)
…rse time (#45690) (#45826)

The DAG serialization tests load all of the example and system test DAGs, and there were two places where these tests opened connections at parse time, resulting in a lot of extra test time.

- The SystemTestContextBuilder was trying to fetch things from SSM. This was addressed by adding a functools.cache on the function.
- The Bedrock example dag was setting/caching the underlying conn object globally. This was addressed by making the Airflow connection a global, rather than the Bedrock conn. This fix is not _great_, but it does massively help.

Before:
> 111 passed, 1 warning in 439.37s (0:07:19)

After:
> 111 passed, 1 warning in 71.76s (0:01:11)

(cherry picked from commit 102e853)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
BTW, there was no need to backport this one - provider tests are skipped on v2-10-test.