AIP-51 - Executor Coupling in Logging #28161

snjypl · 2022-12-06T14:11:58Z

o-nikolas

Marking my review as request for changes regarding unit testing (see here)

o-nikolas · 2023-01-16T20:32:43Z

Anyone have some time to give this a second review/approval? Would be nice to get this merged for @snjypl
Maybe @potiuk, @eladkal or @pierrejeambrun?

snjypl · 2023-01-18T20:50:37Z

@potiuk @eladkal @pierrejeambrun it would be great if you could review this PR whenever you get a chance!

eladkal · 2023-01-18T22:31:42Z

airflow/executors/kubernetes_executor.py

+            if not pod_list:
+                raise RuntimeError("Cannot find pod for ti %s", ti)
+            elif len(pod_list) > 1:
+                raise RuntimeError("Found multiple pods for ti %s: %s", ti, pod_list)


Not really part of this PR but feels like the right place to ask.

Why do we raise these exceptions and not write the issue to the log and return it? (Like lines 285-287)

    except Exception as f:
        log += f"*** Unable to fetch logs from worker pod {ti.hostname} ***\n{str(f)}\n\n"
        return log, {"end_of_log": True}

I wonder if this is the reason users sometimes don't see the task log, which makes it harder for them to find the root cause, as in #29025?

snjypl · 2023-01-19 (edited)

@eladkal I think #29025 is more about the errors that we log around this part:

airflow/airflow/executors/kubernetes_executor.py

Lines 690 to 714 in 1e385ac

            # These codes indicate something is wrong with pod definition; otherwise we assume pod
            # definition is ok, and that retrying may work
            if e.status in (400, 422):
                self.log.error("Pod creation failed with reason %r. Failing task", e.reason)
                key, _, _, _ = task
                self.change_state(key, State.FAILED, e)
            else:
                self.log.warning(
                    "ApiException when attempting to run task, re-queueing. Reason: %r. Message: %s",
                    e.reason,
                    json.loads(e.body)["message"],
                )
                self.task_queue.put(task)
        except PodMutationHookException as e:
            key, _, _, _ = task
            self.log.error(
                "Pod Mutation Hook failed for the task %s. Failing task. Details: %s",
                key,
                e.__cause__,
            )
            self.fail(key, e)
        finally:
            self.task_queue.task_done()
    except Empty:

I believe these logs are part of the scheduler logs and won't be visible in the task's log, since kubernetes_executor.get_task_log only fetches logs from the task's k8s pod.

Regarding the exceptions, I'm not sure I understand you correctly, but I think those exceptions are caught by the enclosing try/except and returned to the user.
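
To illustrate the point about the enclosing try/except, here is a minimal sketch rather than the actual Airflow code. The function name `fetch_pod_log_sketch`, its plain-string arguments, and the success-path return value are all assumptions made for illustration; only the two error messages and the `except` body follow the snippets quoted in this thread.

```python
from __future__ import annotations


def fetch_pod_log_sketch(
    pod_list: list[str], ti: str, hostname: str, log: str = ""
) -> tuple[str, dict[str, bool]]:
    """Hypothetical stand-in for an executor's log fetch; not Airflow's code."""
    try:
        if not pod_list:
            raise RuntimeError(f"Cannot find pod for ti {ti}")
        elif len(pod_list) > 1:
            raise RuntimeError(f"Found multiple pods for ti {ti}: {pod_list}")
        # Assumption: a real implementation would read the pod's log here.
        log += f"*** Would read logs from pod {pod_list[0]} ***\n"
        return log, {"end_of_log": False}
    except Exception as f:
        # The error raised above is caught here, so its message ends up in
        # the text that is returned to the user instead of propagating.
        log += f"*** Unable to fetch logs from worker pod {hostname} ***\n{str(f)}\n\n"
        return log, {"end_of_log": True}


log, metadata = fetch_pod_log_sketch([], ti="example_ti", hostname="worker-0")
print(log)
```

The sketch builds its messages with f-strings. As a side note on plain Python behaviour, an exception constructed logging-style, such as `RuntimeError("Cannot find pod for ti %s", ti)`, does not interpolate the `%s`, so `str(f)` would show the raw argument tuple.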

potiuk

LGTM. @dstandish Maybe you can also take a look since you are working on this area (from different angle though - triggerers).

pierrejeambrun

Small nit, otherwise LGTM

pierrejeambrun · 2023-01-23T00:44:53Z

airflow/utils/log/file_task_handler.py


-                for line in res:
-                    log += line.decode()
+            if hasattr(executor, "get_task_log"):


Why do we need this check? I think it only helps for custom executors that are not subclasses of BaseExecutor, but I believe another PR removed such a check.

snjypl · 2023-01-23

Thanks @pierrejeambrun, I went through the discussion in #28276 (comment) and have removed the hasattr check.
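
The reasoning behind dropping the check can be sketched as follows. This is an illustration under assumptions, not the code from this PR: every class and function name below is hypothetical, and the base class's default behaviour (returning the log unchanged) is chosen only to keep the example small.

```python
from __future__ import annotations


class BaseExecutorSketch:
    """Hypothetical base class; not Airflow's BaseExecutor."""

    def get_task_log(self, ti: str, log: str = "") -> str:
        # Default implementation: contribute nothing to the log.
        return log


class PodExecutorSketch(BaseExecutorSketch):
    """Hypothetical subclass that overrides the default."""

    def get_task_log(self, ti: str, log: str = "") -> str:
        return log + f"*** logs for {ti} from pod ***\n"


class DuckTypedExecutor:
    """A custom executor that does not inherit from the base class."""


def read_log_guarded(executor: object, ti: str) -> str:
    # The hasattr guard only matters for executors outside the hierarchy.
    if hasattr(executor, "get_task_log"):
        return executor.get_task_log(ti=ti, log="")
    return ""


def read_log_unguarded(executor: BaseExecutorSketch, ti: str) -> str:
    # Every subclass inherits get_task_log, so no guard is needed.
    return executor.get_task_log(ti=ti, log="")
```

With a base class that always defines the method, the guarded and unguarded versions behave the same for every subclass; they differ only for an object like `DuckTypedExecutor`, where the unguarded call raises `AttributeError`.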

XD-DENG · 2023-02-02T04:58:38Z

airflow/executors/kubernetes_executor.py

+            elif len(pod_list) > 1:
+                raise RuntimeError("Found multiple pods for ti %s: %s", ti, pod_list)
+            res = client.read_namespaced_pod_log(
+                name=pod_list[0].metadata.name,


Checking this part of the code: why do we need to do the work above to get the pod name? ti.hostname is just the pod name, isn't it?

cc @o-nikolas @snjypl

o-nikolas · 2023-02-02

This is not code that is new to this PR. It was just moved to a different location. If you look at the airflow/utils/log/file_task_handler.py module, this code existed there before these changes.

initial refactoring

6a67d03

snjypl requested review from dstandish, jedcunningham, kaxil, XD-DENG and ashb as code owners December 6, 2022 14:11

boring-cyborg bot added the provider:cncf-kubernetes ("Kubernetes provider related issues"), area:logging, and area:Scheduler ("including HA (high availability) scheduler") labels Dec 6, 2022

snjypl changed the title ~~AIP-51 - Executor Coupling in Logging~~ WIP AIP-51 - Executor Coupling in Logging Dec 6, 2022

snjypl marked this pull request as draft December 6, 2022 19:24

snjypl mentioned this pull request Dec 6, 2022

AIP-51 - Executor Coupling in Logging #27931

Closed

o-nikolas requested changes Dec 8, 2022

airflow/utils/log/file_task_handler.py (resolved)

airflow/executors/base_executor.py (outdated, resolved)

reverted refactoring

ac8b149

snjypl closed this Dec 9, 2022

snjypl force-pushed the bugfix/27931-AIP-51-Executor-Coupling-in-Logging branch from ac8b149 to d8a0658 Compare December 9, 2022 17:01

add get_task_log in local and celery kubernetes executor

d02287f

snjypl reopened this Dec 9, 2022

sanjay pillai added 2 commits December 9, 2022 23:03

add type hint

ec66e41

fixing unittest

baf3361

snjypl marked this pull request as ready for review December 9, 2022 22:53

o-nikolas requested changes Dec 10, 2022

airflow/utils/log/file_task_handler.py (outdated, resolved)

airflow/utils/log/file_task_handler.py (resolved)

airflow/utils/log/file_task_handler.py (resolved)

tests/utils/test_log_handlers.py (resolved)

Refactored file_task_handler

4d2ba5e

snjypl force-pushed the bugfix/27931-AIP-51-Executor-Coupling-in-Logging branch from 74641a4 to 4d2ba5e Compare December 10, 2022 13:07

snjypl requested a review from eladkal as a code owner December 10, 2022 13:07

snjypl added 2 commits December 10, 2022 07:08

Merge branch 'apache:main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

0eeb2a0

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

e1a6851

o-nikolas requested changes Dec 12, 2022

snjypl added 2 commits December 13, 2022 07:02

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

828ce16

Merge branch 'apache:main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

20ac872

o-nikolas requested a review from pierrejeambrun January 9, 2023 19:11

snjypl added 6 commits January 10, 2023 04:16

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

6366087

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

3937816

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

a8e2bef

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

1be69b1

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

d5b7196

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

07dd5e4

snjypl added 6 commits January 16, 2023 15:41

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

31ae849

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

9119b9d

fix merge conflict

71ea32f

Merge branch 'bugfix/27931-AIP-51-Executor-Coupling-in-Logging' of github.com:snjypl/airflow into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

89a514b

fix merge conflict

f31b2b3

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

1e385ac

eladkal reviewed Jan 18, 2023

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

5d669f5

potiuk approved these changes Jan 21, 2023

pierrejeambrun approved these changes Jan 23, 2023

snjypl added 4 commits January 23, 2023 12:36

Merge branch 'main' of github.com:apache/airflow into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

8c6f26f

Merge branch 'main' into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

0c8943e

Merge branch 'bugfix/27931-AIP-51-Executor-Coupling-in-Logging' of github.com:snjypl/airflow into bugfix/27931-AIP-51-Executor-Coupling-in-Logging

7d317e5

removed hasattr check

2ae5144

o-nikolas merged commit 3b25168 into apache:main Jan 24, 2023

snjypl deleted the bugfix/27931-AIP-51-Executor-Coupling-in-Logging branch January 24, 2023 21:00

XD-DENG reviewed Feb 2, 2023

pierrejeambrun added the AIP-51 ("AIP-51: Remove executor coupling from Core") label Feb 27, 2023

pierrejeambrun added this to the Airflow 2.6.0 milestone Feb 27, 2023

pierrejeambrun added the changelog:skip ("Changes that should be skipped from the changelog (CI, tests, etc..)") label Feb 27, 2023

pierrejeambrun mentioned this pull request Mar 23, 2023

Fix Unable to fetch logs from worker pod error in UI for k8s executor #28817

Merged


snjypl commented Dec 6, 2022

o-nikolas left a comment

o-nikolas commented Jan 16, 2023

snjypl commented Jan 18, 2023

eladkal Jan 18, 2023

snjypl Jan 19, 2023 (edited)

potiuk left a comment

pierrejeambrun left a comment

pierrejeambrun Jan 23, 2023

snjypl Jan 23, 2023

XD-DENG Feb 2, 2023

o-nikolas Feb 2, 2023

