Restore stability and unquarantine all test_scheduler_job tests #19860
Conversation
Force-pushed from 16f174e to a8fcdc0
Force-pushed from a8fcdc0 to 9cf5055
Force-pushed from 9cf5055 to 2438009
Running "full" tests to get a few runs and check flakiness (locally I ran those hundreds of times, but it seems some scenarios might only happen in parallel runs).
Force-pushed from 2438009 to f444f47
Force-pushed from f444f47 to 53a1c74
Yep. Seems it's still flaky... looking at it.
Force-pushed from ec3a4e8 to 9be6803
I could not reproduce it locally at all, but I added an extra flush(). I believe the problem was that the queries might not have been flushed yet after the ORM modifications (but I am guessing a bit here). I will try to run it more times.
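For context, a minimal standalone sketch (plain SQLAlchemy, not the actual test code) of why an explicit flush can matter: flush() emits the pending INSERT/UPDATE statements within the current transaction, so SQL issued afterwards on the same connection sees them even before commit. The model name here is a hypothetical stand-in.

```python
# Illustrative sketch only. With the default autoflush the ORM usually flushes
# before a query anyway, but an explicit flush() makes the ordering
# deterministic when other code reads the tables mid-test.
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DagRunStub(Base):  # hypothetical stand-in for an ORM model touched by the test
    __tablename__ = "dag_run_stub"
    id = Column(Integer, primary_key=True)
    dag_id = Column(String(250))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(DagRunStub(dag_id="d1"))
    # flush() sends the pending INSERT to the database inside the current
    # transaction; later SQL on this connection can see the row before commit.
    session.flush()
    assert session.execute(select(DagRunStub.dag_id)).scalars().all() == ["d1"]
```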
Force-pushed from 9be6803 to d65807d
Hmm. @ashb @uranusjr @ephraimbuddy maybe you could take a look at this one and help me figure out what happened. I have a theory and I am trying it in the latest commit, but I just debugged and learned more about how dag_maker works internally (pretty cool actually). Below you will see a coloured dump of the failing …

Here is my line of thinking: what I see there is that when the "create_dagrun" loop for the … When (normally) the test runs successfully, there are, as expected, no error messages, DagRuns are created and life is good. So clearly the reason for the test failing are those 'Couldn't find dag' errors. But from what I understand of how get_dag() works, the only way to get …

For whatever reason, when the tests get to the …

My theory is that for some reason the second run of the dag_maker used a different SQLAlchemy session. This is the only way I could explain this behaviour, but I am not sure how it could happen. Yes, the dag_makers did not have a session passed to them - which triggers own_session - so each of the dag_maker contexts would create a new Session(). But at least in theory they should be in the same scope. From what I understand, SQLAlchemy stores the actual session in a thread-local and it is reused if Session() is called twice. But maybe I do not understand SQLAlchemy session behaviour and there is a way the same thread gets two different sessions (with different ORM models) if it calls Session() twice.

I am trying out my theory by adding a change where the two dag_makers actually share a session created beforehand. I moved session creation up and passed it to the dag_makers. Previously those tests failed in 1/2 runs consistently, so I hope we will see the result soon. But if it is confirmed, then I wonder under what circumstances this could happen.
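For reference, a small standalone sketch (not Airflow code) of the SQLAlchemy behaviour the theory relies on: a scoped_session registry hands out one session per thread, so calling Session() twice from the same thread returns the same object unless the registry is cleared or rebuilt in between.

```python
# Standalone illustration of the assumption in the comment above: a
# scoped_session is registered per thread, so Session() called twice in the
# same thread returns the same Session instance (and the same identity map).
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))

s1 = Session()
s2 = Session()
assert s1 is s2  # same thread -> same session from the registry

Session.remove()  # after remove(), the next call creates a fresh session
s3 = Session()
assert s3 is not s1
```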
Hmm. Interesting one - it did not fix the problem (so my hypothesis was wrong): https://github.com/apache/airflow/runs/4357877756?check_suite_focus=true#step:6:4065

Also we have another flaky test here: test_scheduler_task_start_date https://github.com/apache/airflow/runs/4357877606?check_suite_focus=true#step:6:4013
Force-pushed from f965207 to c5220b7
OK. I have very interesting findings and I might need some brainstorming @ashb @ephraimbuddy @uranusjr. I think I know what is causing the problem, but I have not figured out the mechanism (yet). I am running some experiments to get more data, but the problem with … seems to be caused by a deferred re-importing of …

Some observations: this problem is always accompanied by this log entry (which is generated by the dag processor reloading settings, I believe):

This entry appears asynchronously at various places - sometimes during the test, sometimes in …
I see three interesting cases:

1. Reload happening after or around the second DAG syncing - here only d2 was missing.
2. Reload happening before DAG syncing - here both dags are missing. This is VERY interesting because it seems that the reload affects creation of the SerializedDagModels created after the reload, and not retrieval of them later.
3. Reload happening between the two DAG syncs - here again d2 was missing.
It looks like it is not connected with exactly the moment of the log, but I believe it is caused by what happens next, after the "Timezone" log is logged. When I look at the code, right after re-importing the settings, DagProcessor calls … It looks literally as if SerializedDAGModels created by …

However, I failed to come up with a theory for why such a reload in a spawned process might really cause this. And why is the spawned DagProcessor not killed by the pytest fixture?

Any help and brainstorming appreciated.
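To make the suspicion concrete, here is a toy reproduction in plain SQLAlchemy (not the Airflow code path, and only a guess at the mechanism): if a settings reload rebuilds the Session registry, rows added through the old registry but not yet committed are invisible to queries issued through the new registry, because the two registries hand out sessions on different connections and transactions. The model name is a hypothetical stand-in for SerializedDagModel; SQLAlchemy 1.4+ is assumed.

```python
# Toy sketch of one way a rebuilt Session registry could make freshly
# created rows invisible to later lookups. Not the confirmed cause.
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker

Base = declarative_base()


class SerializedDagStub(Base):  # hypothetical stand-in for SerializedDagModel
    __tablename__ = "serialized_dag_stub"
    id = Column(Integer, primary_key=True)
    dag_id = Column(String(250))


fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
engine = create_engine(f"sqlite:///{db_path}")
Base.metadata.create_all(engine)

OldSession = scoped_session(sessionmaker(bind=engine))  # registry before the "reload"
NewSession = scoped_session(sessionmaker(bind=engine))  # registry after the "reload"

old = OldSession()
old.add(SerializedDagStub(dag_id="d2"))
old.flush()  # flushed to the DB, but not committed

new = NewSession()
found = new.execute(
    select(SerializedDagStub).where(SerializedDagStub.dag_id == "d2")
).first()
assert found is None  # the "new" session cannot see the uncommitted row

old.rollback()
OldSession.remove()
NewSession.remove()
os.unlink(db_path)
```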
All green with test_scheduler_job :). Closing/reopening with …
This one looks really good - both public runners and self-hosted do not seem to be flaky (at least the few times I ran the full test suite). I think it is ready to merge.
Seems like for our tests the time of the flaky pandemic is close to being over and our quarantined test count will go down ;). Wish that were true in real life. After this one is merged, those are the only quarantined tests left (I am looking at reviewing/fixing them after this one gets merged).
Running once again - all tests were OK except the MSSQL one, which is an unrelated problem we have to solve that happens occasionally with the public runner.
Close/reopen to check again (and again).
The MSSQL failures should be handled by #20231 |
4 x integration test failures (those that #20231 should handle nicely). Running again, but it seems those tests are pretty stable now.
Yep. I've run it many times now. Only unrelated errors (a flaky k8s test for public runners this time). Shall we?
The fix looks right to me.
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Let's get it run one more time :)
* Restore stability and unquarantine all test_scheduler_job tests

  The scheduler job tests were pretty flaky and some of them were quarantined already (especially the query count). This PR improves the stability in the following ways:

  * clean the database between tests for TestSchedulerJob to avoid side effects
  * force UTC timezone in tests where dates missed timezone specs
  * update the number of queries expected in the query count tests
  * stabilize the sequence of retrieval of tasks in case tests depended on it
  * add more stack trace levels (5) to compare where extra methods were called
  * increase the number of scheduler runs where it was needed
  * add session.flush() where it was missing
  * add the requirement to have serialized dags ready when needed
  * increase the number of dagruns to process where we could have some "too slow" tests compared to fast processing of dag runs

  Hopefully:

  * Fixes: #18777
  * Fixes: #17291
  * Fixes: #17224
  * Fixes: #15255
  * Fixes: #15085

* Update tests/jobs/test_scheduler_job.py

  Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit 9b277db)
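As an illustration of the first stabilization step above (cleaning the database between tests), here is a hedged sketch of what such a pytest fixture can look like. The fixture name is hypothetical and the actual cleanup in the PR may differ; DagRun, TaskInstance, DagModel, SerializedDagModel, and create_session are existing Airflow objects.

```python
# Illustrative sketch only -- not the exact fixture used in the PR.
# Wiping scheduler-related tables prevents DagRuns or serialized DAGs left
# over from one test from changing query counts or scheduling decisions in
# the next one.
import pytest

from airflow.models import DagModel, DagRun, TaskInstance
from airflow.models.serialized_dag import SerializedDagModel
from airflow.utils.session import create_session


@pytest.fixture(autouse=True)
def clean_scheduler_tables():
    def _clean():
        with create_session() as session:
            # Delete children before parents to respect FK constraints.
            session.query(TaskInstance).delete()
            session.query(DagRun).delete()
            session.query(SerializedDagModel).delete()
            session.query(DagModel).delete()

    _clean()
    yield
    _clean()
```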
The scheduler job tests were pretty flaky and some of them were quarantined already (especially the query count). This PR improves the stability in the following ways:

* clean the database between tests for TestSchedulerJob to avoid side effects
* force UTC timezone in tests where dates missed timezone specs
* update the number of queries expected in the query count tests
* stabilize the sequence of retrieval of tasks in case tests depended on it
* add more stack trace levels (5) to compare where extra methods were called
* increase the number of scheduler runs where it was needed
* add session.flush() where it was missing
* add the requirement to have serialized dags ready when needed
* increase the number of dagruns to process where we could have some "too slow" tests compared to fast processing of dag runs

Hopefully:

* Fixes: #18777
* Fixes: #17291
* Fixes: #17224
* Fixes: #15255
* Fixes: #15085
(Based on #20054 to test if all green).
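For the "force UTC timezone" bullet, a minimal sketch of the difference; airflow.utils.timezone.datetime is Airflow's helper for tz-aware UTC datetimes, and the specific call sites changed in the PR are not shown here.

```python
# Naive datetimes cannot be compared with the tz-aware values Airflow stores,
# and their interpretation can depend on configuration, which makes
# date-based assertions flaky.
from datetime import datetime

from airflow.utils import timezone

naive = datetime(2021, 11, 26)           # no tzinfo attached
aware = timezone.datetime(2021, 11, 26)  # tz-aware, pinned to UTC

assert naive.tzinfo is None
assert aware.tzinfo is not None
assert aware.utcoffset().total_seconds() == 0
```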