AIP-51 - Executor Coupling in Logging #28161

Merged

Conversation

snjypl
Contributor

@snjypl snjypl commented Dec 6, 2022

Fixes: #27931



@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes provider related issues area:logging area:Scheduler including HA (high availability) scheduler labels Dec 6, 2022
@snjypl snjypl changed the title AIP-51 - Executor Coupling in Logging WIP AIP-51 - Executor Coupling in Logging Dec 6, 2022
@snjypl snjypl marked this pull request as draft December 6, 2022 19:24
@snjypl snjypl closed this Dec 9, 2022
@snjypl snjypl force-pushed the bugfix/27931-AIP-51-Executor-Coupling-in-Logging branch from ac8b149 to d8a0658 Compare December 9, 2022 17:01
@snjypl snjypl reopened this Dec 9, 2022
@snjypl snjypl marked this pull request as ready for review December 9, 2022 22:53
@snjypl snjypl force-pushed the bugfix/27931-AIP-51-Executor-Coupling-in-Logging branch from 74641a4 to 4d2ba5e Compare December 10, 2022 13:07
@snjypl snjypl requested a review from eladkal as a code owner December 10, 2022 13:07
Contributor

@o-nikolas o-nikolas left a comment

Marking my review as request for changes regarding unit testing (see here)

@o-nikolas
Contributor

Anyone have some time to give this a second review/approval? Would be nice to get this merged for @snjypl
Maybe @potiuk, @eladkal or @pierrejeambrun?

@snjypl
Contributor Author

snjypl commented Jan 18, 2023

@potiuk @eladkal @pierrejeambrun It would be great if you could review this PR whenever you get a chance!

Comment on lines +807 to +810
    if not pod_list:
        raise RuntimeError("Cannot find pod for ti %s", ti)
    elif len(pod_list) > 1:
        raise RuntimeError("Found multiple pods for ti %s: %s", ti, pod_list)
Contributor

Not really part of this PR but feels like the right place to ask.

Why do we raise these exceptions and not write the issue to the log and return it? (Like lines 285-287)

        except Exception as f:
            log += f"*** Unable to fetch logs from worker pod {ti.hostname} ***\n{str(f)}\n\n"
            return log, {"end_of_log": True}

I wonder if this is why users sometimes don't see the task log, which makes it harder for them to find the root cause, as in #29025?

Contributor Author

@snjypl snjypl Jan 19, 2023

@eladkal I think #29025 is more about the errors that we log around this part:

        # These codes indicate something is wrong with pod definition; otherwise we assume pod
        # definition is ok, and that retrying may work
        if e.status in (400, 422):
            self.log.error("Pod creation failed with reason %r. Failing task", e.reason)
            key, _, _, _ = task
            self.change_state(key, State.FAILED, e)
        else:
            self.log.warning(
                "ApiException when attempting to run task, re-queueing. Reason: %r. Message: %s",
                e.reason,
                json.loads(e.body)["message"],
            )
            self.task_queue.put(task)
    except PodMutationHookException as e:
        key, _, _, _ = task
        self.log.error(
            "Pod Mutation Hook failed for the task %s. Failing task. Details: %s",
            key,
            e.__cause__,
        )
        self.fail(key, e)
    finally:
        self.task_queue.task_done()
except Empty:
    .

I believe these messages go to the scheduler logs and won't be visible in the task's log, since kubernetes_executor.get_task_log only fetches logs from the task's k8s pod.

Regarding the exceptions, I am not sure I understand you correctly, but I think they are caught by the enclosing try/except and returned to the user.
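
To illustrate the point about the enclosing try/except, here is a minimal sketch, not the actual Airflow code. The helper `find_single_pod`, the string-typed `ti` and `hostname` arguments, and the success-path return value are assumptions made for illustration; only the `except` branch mirrors the handler quoted earlier in this thread. It shows a `RuntimeError` raised inside the `try` block ending up as text in the returned log instead of propagating to the caller.

```python
from __future__ import annotations


def find_single_pod(pod_list: list[str], ti: str) -> str:
    """Hypothetical helper: return the only pod for a task instance, else raise."""
    if not pod_list:
        raise RuntimeError(f"Cannot find pod for ti {ti}")
    if len(pod_list) > 1:
        raise RuntimeError(f"Found multiple pods for ti {ti}: {pod_list}")
    return pod_list[0]


def get_task_log(
    pod_list: list[str], ti: str, hostname: str, log: str = ""
) -> tuple[str, dict[str, bool]]:
    """Hypothetical stand-in for the executor method discussed in this thread."""
    try:
        pod_name = find_single_pod(pod_list, ti)
        # Assumption: a real implementation would read the pod's log here.
        log += f"*** Reading logs from pod {pod_name} ***\n"
        return log, {"end_of_log": False}
    except Exception as f:
        # Same shape as the quoted handler: the error text is appended to the
        # log that is returned, so the caller sees it instead of a traceback.
        log += f"*** Unable to fetch logs from worker pod {hostname} ***\n{str(f)}\n\n"
        return log, {"end_of_log": True}
```

Calling `get_task_log([], "ti-1", "worker-0")` in this sketch returns a log containing "Cannot find pod for ti ti-1" together with `{"end_of_log": True}`, which is the behavior described above.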

Member

@potiuk potiuk left a comment

LGTM. @dstandish Maybe you can also take a look, since you are working on this area (from a different angle though: triggerers).

Member

@pierrejeambrun pierrejeambrun left a comment

Small nit, otherwise LGTM


    for line in res:
        log += line.decode()
    if hasattr(executor, "get_task_log"):
Member

Why do we need this check? I think it only helps for custom executors that do not subclass BaseExecutor, but I believe another PR removed such checks.

Contributor Author

Thanks @pierrejeambrun, I went through the discussion in #28276 (comment) and have removed the hasattr check.
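
For readers following along, a rough sketch of why the guard becomes unnecessary once the base class provides a default. The class names, the `read_executor_log` helper, and the string-based signature are hypothetical and are not Airflow's real API; the only idea taken from this thread is that every executor derived from the base class inherits `get_task_log`, so the caller does not need `hasattr`.

```python
from __future__ import annotations


class BaseExecutorSketch:
    """Hypothetical stand-in for a base executor; not Airflow's real class."""

    def get_task_log(self, ti: str, log: str = "") -> str:
        # Default behavior: this executor has nothing to add to the log.
        return log


class KubernetesExecutorSketch(BaseExecutorSketch):
    """Hypothetical subclass that contributes executor-specific log text."""

    def get_task_log(self, ti: str, log: str = "") -> str:
        return log + f"*** logs fetched from the pod for {ti} ***\n"


def read_executor_log(executor: BaseExecutorSketch, ti: str) -> str:
    # No hasattr(executor, "get_task_log") guard: every subclass of the
    # base class inherits the method, so the call is always valid.
    return executor.get_task_log(ti)
```

The trade-off raised in the review still applies: a custom executor that does not derive from the base class would fail on this call, which is the case the `hasattr` check used to cover.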

@o-nikolas o-nikolas merged commit 3b25168 into apache:main Jan 24, 2023
@snjypl snjypl deleted the bugfix/27931-AIP-51-Executor-Coupling-in-Logging branch January 24, 2023 21:00
    elif len(pod_list) > 1:
        raise RuntimeError("Found multiple pods for ti %s: %s", ti, pod_list)
    res = client.read_namespaced_pod_log(
        name=pod_list[0].metadata.name,
Member

Looking at this part of the code: why do we need to do the work above to get the pod name? ti.hostname is just the pod name, isn't it?

cc @o-nikolas @snjypl

Contributor

This code is not new in this PR; it was just moved to a different location. If you look at the airflow/utils/log/file_task_handler.py module, you will see that it existed there before these changes.

@pierrejeambrun pierrejeambrun added the AIP-51 AIP-51: Remove executor coupling from Core label Feb 27, 2023
@pierrejeambrun pierrejeambrun added this to the Airflow 2.6.0 milestone Feb 27, 2023
@pierrejeambrun pierrejeambrun added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 27, 2023
Labels
AIP-51 AIP-51: Remove executor coupling from Core area:logging area:Scheduler including HA (high availability) scheduler changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:cncf-kubernetes Kubernetes provider related issues
Development

Successfully merging this pull request may close these issues.

AIP-51 - Executor Coupling in Logging
7 participants