AIP-51 - Executor Coupling in Logging #28161
Conversation
Force-pushed from ac8b149 to d8a0658 (Compare)
Force-pushed from 74641a4 to 4d2ba5e (Compare)
Marking my review as a request for changes regarding unit testing (see here).
Anyone have some time to give this a second review/approval? It would be nice to get this merged for @snjypl.
…thub.com:snjypl/airflow into bugfix/27931-AIP-51-Executor-Coupling-in-Logging
@potiuk @eladkal @pierrejeambrun it would be great if you could review this PR whenever you get a chance!
if not pod_list:
    raise RuntimeError("Cannot find pod for ti %s", ti)
elif len(pod_list) > 1:
    raise RuntimeError("Found multiple pods for ti %s: %s", ti, pod_list)
Not really part of this PR but feels like the right place to ask.
Why do we raise these exceptions and not write the issue to the log and return it? (Like lines 285-287)
except Exception as f:
    log += f"*** Unable to fetch logs from worker pod {ti.hostname} ***\n{str(f)}\n\n"
    return log, {"end_of_log": True}
I wonder if this is why users sometimes don't see the task log, which makes it harder for them to find the root cause, as in #29025?
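For illustration, something like this is what I have in mind; the helper name, message wording, and return shape below are my assumptions, not code from this PR:

```python
# Hypothetical sketch only: report pod-lookup problems in the task log and
# return, instead of raising (names/messages/return shape are assumptions).
def pod_lookup_problem_to_log(pod_list, ti, log):
    if not pod_list:
        log += f"*** Cannot find pod for ti {ti} ***\n"
        return log, {"end_of_log": True}
    if len(pod_list) > 1:
        log += f"*** Found multiple pods for ti {ti}: {pod_list} ***\n"
        return log, {"end_of_log": True}
    # No problem found; let the caller continue fetching the pod log.
    return log, None
```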
@eladkal I think #29025 is more about the error that we log around this part:
airflow/airflow/executors/kubernetes_executor.py
Lines 690 to 714 in 1e385ac
    # These codes indicate something is wrong with pod definition; otherwise we assume pod
    # definition is ok, and that retrying may work
    if e.status in (400, 422):
        self.log.error("Pod creation failed with reason %r. Failing task", e.reason)
        key, _, _, _ = task
        self.change_state(key, State.FAILED, e)
    else:
        self.log.warning(
            "ApiException when attempting to run task, re-queueing. Reason: %r. Message: %s",
            e.reason,
            json.loads(e.body)["message"],
        )
        self.task_queue.put(task)
except PodMutationHookException as e:
    key, _, _, _ = task
    self.log.error(
        "Pod Mutation Hook failed for the task %s. Failing task. Details: %s",
        key,
        e.__cause__,
    )
    self.fail(key, e)
finally:
    self.task_queue.task_done()
except Empty:
These logs, I believe, are part of the scheduler logs and won't be visible in the task's log, since we only fetch logs from the task's k8s pod in kubernetes_executor.get_task_log.
Regarding the exceptions, I am not sure I understand you correctly, but I think those exceptions are caught by the enclosing try/except and returned to the user.
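Roughly, the flow I mean is this (an illustration only; the function and variable names are simplified assumptions, not the exact code in the handler):

```python
# Illustration: a RuntimeError raised while locating the pod is caught by the
# enclosing try/except and surfaced in the task log instead of reaching the
# user as an unhandled error.
def read_pod_log_safely(ti, fetch_pod_log):
    log = ""
    try:
        log += fetch_pod_log(ti)  # may raise RuntimeError("Cannot find pod for ti ...")
    except Exception as f:
        log += f"*** Unable to fetch logs from worker pod {ti.hostname} ***\n{str(f)}\n\n"
    return log, {"end_of_log": True}
```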
LGTM. @dstandish Maybe you can also take a look since you are working on this area (from different angle though - triggerers).
Small nit, otherwise LGTM
for line in res:
    log += line.decode()
if hasattr(executor, "get_task_log"):
Why do we need this check? I think it only helps for custom executors that are not BaseExecutor subclasses, but another PR removed such a check, I believe.
Thanks @pierrejeambrun, I went through the discussion in #28276 (comment). I have removed the hasattr check.
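Roughly, the simplification looks like this (a sketch with assumed names and return shape, not the exact diff):

```python
# Sketch of the change discussed above (names and return shape are assumptions).
def read_executor_log(executor, ti):
    # Before: only call the hook when the executor happens to define it.
    # if hasattr(executor, "get_task_log"):
    #     return executor.get_task_log(ti)
    # After: get_task_log is part of the BaseExecutor interface (AIP-51),
    # so it can be called unconditionally.
    return executor.get_task_log(ti)
```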
…P-51-Executor-Coupling-in-Logging
…thub.com:snjypl/airflow into bugfix/27931-AIP-51-Executor-Coupling-in-Logging
elif len(pod_list) > 1:
    raise RuntimeError("Found multiple pods for ti %s: %s", ti, pod_list)
res = client.read_namespaced_pod_log(
    name=pod_list[0].metadata.name,
Checking this part of the code: why do we need to do the work above to get the pod name? The ti.hostname is just the pod name, isn't it?
This is not code that is new to this PR; it was just moved to a different location. If you look at the airflow/utils/log/file_task_handler.py module, this code existed there before these changes.
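For context, the shortcut hinted at in the question above would look roughly like this (a sketch only, not part of this PR; it assumes ti.hostname really equals the pod name and that the namespace and container name are already known):

```python
from kubernetes import client as k8s

# Sketch: read the pod log directly by name, skipping the pod lookup.
def read_log_by_hostname(ti, namespace: str, container: str = "base") -> str:
    core_v1 = k8s.CoreV1Api()
    return core_v1.read_namespaced_pod_log(
        name=ti.hostname,      # assumption: hostname == pod name
        namespace=namespace,   # assumption: namespace is known to the caller
        container=container,   # assumption: the task container name
    )
```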
Fixes: #27931
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.