log: noderesourcetopology: per-flow additional log #289
Conversation
Welcome @fromanirh!
Hi @fromanirh. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold
@fromanirh: GitHub didn't allow me to request PR reviews from the following users: Tal-or, AlexeyPerevalov. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(force-pushed from 31d9cf3 to d1a55fa)
@fromanirh the motivation sounds reasonable to me. Have you talked with sig-instrumentation about promoting it to be a generic klog primitive?
not yet, but I plan to.
(force-pushed from c3f7aee to a78f080)
(force-pushed from a78f080 to 387b10c)
/hold cancel
conversation with sig-instrumentation in progress. I'm reviewing
/ok-to-test
/approve Will leave /lgtm to @Tal-or @swatisehgal @AlexeyPerevalov.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: fromanirh, Huang-Wei. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
It would be nice to have a unit test for this new feature. The added value of the unit test is also that it is an easy way to demonstrate this feature's functionality. One more thing - IIUC, in order to activate this feature (besides having the annotation) we need to have [...]. Besides that, it looks ok to me, but I don't have lgtm permissions, so I leave it to the reviewers.
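For illustration, a minimal sketch of what such a unit test could look like; the helper name `podWantsVerboseLogs` and the annotation key are assumptions for this sketch, not taken from the PR:

```go
package logging_test

import (
	"testing"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podWantsVerboseLogs is a hypothetical stand-in for the PR's annotation
// check: it reports whether a pod opted in to verbose per-flow logging.
func podWantsVerboseLogs(pod *v1.Pod) bool {
	_, ok := pod.Annotations["experimental.scheduling.sigs.k8s.io/log-verbose"]
	return ok
}

func TestPodWantsVerboseLogs(t *testing.T) {
	annotated := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Annotations: map[string]string{
				"experimental.scheduling.sigs.k8s.io/log-verbose": "true",
			},
		},
	}
	if !podWantsVerboseLogs(annotated) {
		t.Errorf("annotated pod should opt in to verbose logging")
	}
	if podWantsVerboseLogs(&v1.Pod{}) {
		t.Errorf("un-annotated pod should keep the default log level")
	}
}
```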
I am working on a KEP for contextual logging that I intend to propose for Kubernetes 1.24. That KEP will also address kubernetes/kubernetes#91633 (comment). With contextual logging, what you are proposing here can be solved without adding more semantics to each log call. It's based on the idea that a logger is passed into a function, and then the function does all logging with that logger. The code that starts processing a pod then can:
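The code example itself was not preserved in this thread; a minimal sketch of the pattern being described, assuming the contextual-logging helpers (`klog.FromContext`, `klog.NewContext`) that later landed in klog/v2:

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// schedulePod attaches pod-identifying key/value pairs to the logger once;
// every function downstream logs through that logger, so the whole flow
// for this pod is tagged without extra semantics at each log call.
func schedulePod(ctx context.Context, namespace, name string) {
	logger := klog.FromContext(ctx).WithValues("pod", klog.KRef(namespace, name))
	ctx = klog.NewContext(ctx, logger)
	filterNodes(ctx)
}

func filterNodes(ctx context.Context) {
	logger := klog.FromContext(ctx)
	// The "pod" key/value pair set by the caller shows up automatically.
	logger.V(5).Info("filtering nodes")
}

func main() {
	schedulePod(context.Background(), "default", "example-pod")
}
```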
(force-pushed from 387b10c to d075b6f)
This was probably pointed out already, but it has security implications: it allows a normal user of a cluster to increase the logging of a system component. It probably won't enable a denial-of-service attack, but it's worth calling out.
awesome news! this sounds super interesting and seems to solve my use case brilliantly! I'll be following your KEP, thanks for pointing this out! Considering this information, maybe @Huang-Wei wants to revisit the approval.
Indeed. I mentioned this in the CAVEATs section of the commit message, which in retrospect I probably should have made more evident. I'm curious to learn how you will handle this factor in your KEP!
/hold
The KEP is just about the logging infrastructure that would enable this use case here. The scheduler would still need to look at pod attributes or some additional configuration to determine which pods are important and which are less important. So the security problem remains open.
Given @pohly is working on a generic solution that's endorsed by k/k, I think we can wait and adopt it once it's live. @fromanirh WDYT?
Well, not endorsed yet, and even once it is, it will take time to get it working completely. Help with both (KEP review/approval and the technical work) will be very welcome. But I agree, let's first discuss that KEP in the 1.24 time frame before introducing specialized solutions.
I surely agree that the general solution @pohly is working on is a better approach long term (and in general). I look forward to the KEP and to seeing the code, and I'd love to help in this effort however I can. For the specific need of this scheduler plugin, however, I still see value in this PR, because:
My take is that this PR can serve these purposes. The challenges are:
TL;DR: if we label this feature as experimental and limit it to this specific plugin, it still has enough value, because it allows the project to gather feedback about how to enable users to consume contextual logging (which UX is good and which is not) and allows us to actually have better logging in the short term until we transition to the long-term solution. I volunteer to remove this code at that point in time.
+1. There is definitely value in this feature in the short term, and explicitly stating that it is an experimental feature sounds like a good approach to me. How do we want to make it explicit that this is an experimental capability? Might be worth having that in the annotation name (
(force-pushed from d075b6f to 29f4db0)
(force-pushed from 29f4db0 to ef69c8e)
All the following applies first and foremost to the NodeTopologyMatch plugin, but can be easily generalized.

Troubleshooting the behaviour of the components almost always requires increasing the log level to peek into the flow. This is due to the fact that, by default, components want to run with concise logging to avoid log churn and spam. But increasing the log level very often requires a configuration change and a component restart, which is nontrivial to do. Moreover, restarting changes the cluster state, which can make troubleshooting harder, or just longer, while we reproduce the state.

Furthermore, the log level is a global, or in the best-case scenario a per-module, setting. On the other hand, we are often interested in the behaviour of a specific flow, possibly across components, rather than the behaviour of some components. Increasing the log level then creates unnecessary and uninteresting logs, which can actually make it harder to track the flow.

It would thus be nice to have the option to dynamically enable detailed logging per-flow across components, avoiding component restarts. I'm not aware of any logging package, including klog, that implements this feature, but we can get a pretty close approximation with this PR.

We add a new annotation: pods can opt in to the new behaviour by including this annotation. This allows extra verbosity only in selected cases (pods). We add a utility function to tune the verbosity of the logs with the aforementioned annotation, overriding the default verbosity level. IOW, log messages can be emitted if either
- the component (/module) verbosity is high enough, OR
- a pod opts in by adding the special annotation.

Finally, we consume this new feature in the NodeTopologyMatch plugin, both to help troubleshooting and to demonstrate its usage.

CAVEATs:
- the feature cannot be turned off (it probably deserves a global disable flag?)
- potentially malicious actors can trigger extra logging by adding unnecessary annotations.
  - but these actors need to be able to send pods to the scheduler, so they already have the means to overload the component and/or cause extra logging. Not sure this is a practical concern; highlighting it here for full transparency.

Signed-off-by: Francesco Romani <fromani@redhat.com>
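A minimal sketch of the OR semantics the commit message describes; the package name, annotation key, and helper are illustrative assumptions, not taken from the PR:

```go
package podlog

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

// VerboseLogAnnotation is a hypothetical key; the PR's actual annotation may differ.
const VerboseLogAnnotation = "experimental.scheduling.sigs.k8s.io/log-verbose"

// Enabled reports whether a message at the given verbosity level should be
// emitted: either the global (per-component) klog level is high enough, OR
// the pod opted in via the annotation, overriding the default level.
func Enabled(level klog.Level, pod *v1.Pod) bool {
	if klog.V(level).Enabled() {
		return true
	}
	if pod == nil {
		return false
	}
	_, ok := pod.Annotations[VerboseLogAnnotation]
	return ok
}
```

A call site in the plugin would then look like `if podlog.Enabled(5, pod) { klog.InfoS("computed NUMA affinity", "pod", klog.KObj(pod)) }`, keeping the default output concise while letting an annotated pod's flow log in detail.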
(force-pushed from ef69c8e to d3ab779)
I have published a PR with the KEP and intend to discuss it at this week's SIG Instrumentation meeting: kubernetes/enhancements#3078 The first step will be to decide whether the SIG wants to tackle this problem. I've used this issue as the basis for one of the use cases in the KEP.
@fromanirh: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
considering the enhancements planned for the next kube cycle and the concerns raised, I'm closing this PR. I'll be proposing more fine-grained logs (to address the same need this PR tackles, but from a different angle) in a later PR.