-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Resolve OOM When Reading Large Logs in Webserver] Refactor to Use K-Way Merge for Log Streams Instead of Sorting Entire Log Records #45129
base: main
Are you sure you want to change the base?
Conversation
ed334f7
to
1d0e6ed
Compare
Rebased after we fixed main issue |
1d0e6ed
to
8617e5b
Compare
0f19a8b
to
ef3450b
Compare
CI is failing due to: Since the |
Applied and closed/reopened to trigger the build |
ef3450b
to
0aaf0ab
Compare
Fix the provider tests that explicitly use the |
3aac539
to
46e30e3
Compare
46e30e3
to
1802ed1
Compare
Finally fixed the tests! This is the first (and likely the largest) PR for resolving OOM issues when reading large logs in the webserver. Even though the providers haven't yet been refactored to support stream-based log reading, the compatibility utility will transform the old For the testing part: |
1802ed1
to
091407e
Compare
It only really waits for your review/approval and it solves real issue. |
2b5c620
to
dbecf19
Compare
- refactor _read and read methods in FileTaskHandler - refactor read_log_chunks method in TaskLogReader
- add check log_stream type utils - fix type checking for - test_file_task_handler_when_ti_value_is_invalid - test_file_task_handler - test_file_task_handler_running - test_file_task_handler_rotate_size_limit - test__read_when_local - test__read_served_logs_checked_when_done_and_no_local_or_remote_logs - also test compatible interface for test__read_served_logs_checked_when_done_and_no_local_or_remote_logs - which might call _read_remote_logs
- Since read_log_chunks is public method, refactor it as same return type in orignial implementation to avoid breaking change - The `host` should only show once in read_log_stream
- Fix mock_read to new stream-based reading in test_log_reader - Fix test for expecting stdout of callable should be in log lines
- Logs might not be add to heap in first round, should consider log_streams instead of heap
- Make it compatible for providers that implemented _read method.
- Should handle input list is empty.
- Copy old test case that use read or _read methods - Add mark_test_for_stream_based_read_log_method and mark_test_for_old_read_log_method to skip corresponding CI tests
dbecf19
to
c6b29a8
Compare
Hi @potiuk, may I ask whether this PR is considered a refactor for 3.0 or 2.10? I saw the 3.0 feature freeze mentioned in the dev list, so I’m not sure which version this PR will be counted for. Here is the related discussion on Slack: https://apache-airflow.slack.com/archives/CCZRF2U5A/p1736767159693839 |
related: #45079
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.