KEP 2258: Node service log viewer #2271
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Welcome @aravindhp! |
Hi @aravindhp. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @immuzz |
I signed it |
/sig node |
/check-cla |
I signed it |
This is a great enhancement! I realize it's WIP but had a few early comments.
- Windows worker nodes (all supported variants)

### Non-Goals
Providing support for non-systemd Linux distributions.
I guess this is obvious but maybe you can point out here that reporting logs for nodes/kubelets that have config/connection issues with the cluster is out-of-scope.
Done
systemd / journald should return "OS not supported".

### Windows
Reuse the kubelet API for querying the Linux journal for invoking the `Get-WinEvent`
Windows kubelets may log to just a regular file. Would there be some logic here to figure out which log type is configured and stream the logs out based on that configuration?
I don't think we've planned for any detection logic for this case.
One thing to note is that if the kubelet (or any other process) logs to C:\var\log\somepath on Windows, those log files can be streamed by the kubelet's existing log server, and one part of this proposal is implementing a client command for it.
@ddebroy I have sort of addressed this in my next commit. PTAL.
- "@LorbusChris" | ||
owning-sig: sig-windows | ||
participating-sigs: | ||
- sig-node |
Given the kubectl enhancements, I guess sig-cli needs to be involved as well and needs to approve?
We probably want someone from sig-auth to approve here as well.
At least from the windows side exposing the ability to query arbitrary event logs can expose a lot more system information than what is exposed in kubelet logs.
Added sig-auth to participating sigs.
/sig cli |
Does Kubelet know about the node? Yes <--- Does Kubelet own the machine? No For OpenShift all of the answers turned out to be yes, and so it was a win to add something like this (because we explicitly took choice about node configuration away). I don't think that's universal. I don't think this endpoint should be required for conformance. Would most Kube distros have this, find value in it, and lead to happier and healthy clusters, without requiring us to solve completely? Probably. I remember us having this same discussion about whether pod logs should be viewable, but we had the bar of Docker log to contend with, so we just went ahead and did it. Same for exec - exec was so useful we decided not to fallback on SSH and just put exec in. I can't imagine Kubernetes without pod logs and pod exec. |
Agree, and I think it would be a win for most installations. I am just trying to find the edges of it.
I think I agree with that. My only remaining concern is forcing users to know and spell out files vs journals - can we find some way to help that? |
@aravindhp Thank you for driving this KEP. This is one of the most requested features from Windows Containers customers, for scenarios like gMSA where they have to log into each node individually to view logs just to set it up. I have reviewed the KEP thoroughly and have asked Windows engineers to review it as well. Looks good to us. This will be very useful in making K8s more managed. Can't wait to try this, and happy to help along the way! Thanks again!
... FWIW we'd take advantage of this if it was available :) |
@thockin, I can't think of an easy way to do this without making assumptions about the service and how it would log. Moreover, given that this feature is restricted to cluster admins and is advanced, won't it be fine to expect them to know whether they should be asking for a file or a journal? They already need this knowledge today when they have to log into the node and inspect it.
Why can't this be an abstraction? If I ask for Why push this on users? |
This was the assumption I was hesitant to make i.e. a service named Also services like |
To me, the biggest value that this enhancement adds to K8s is that it makes it possible to grab logs off nodes without needing to remotely connect to said nodes. This is especially important for Windows, where configuring the nodes for remote access is more complicated than adding SSH keys to a file. Today, in order to debug Windows nodes, users need to first enable remote access (which is usually done by interacting with vendor-specific infrastructure to manipulate the node) and then start looking for the logs they want to inspect.
Something like this may help, but I have a small concern that this could increase complexity and cause some hindrances. I wouldn't be opposed to adding some heuristics or something to help in finding logs, but I do think it would be important to let users get at the exact logs they are looking for and/or handle cases like
Yeah, I am saying that, on average, I expect this sort of a heuristic to work pretty well. If we want to add more specific manual overrides, those are the exception, rather than the norm.
Yay glog. Those are additive - the INFO file is inclusive of the ERROR file. If we want to serve both through this API, maybe we have an extra (optional) arg that factors into the heuristic. And if I am wrong that the heuristic approach works, then I will waive further objections. So how can we prove whether it does or doesn't work? |
Yay glog indeed 😃 In fact there are three files, including the WARNING file, which is also included in the INFO file. So in the heuristic approach, by default do we only serve the INFO file or do we serve all files starting with
One way would be to call out the heuristic approach in the KEP. Then implement the feature without heuristics for alpha and get feedback from the users. Alternatively, implement the feature with the heuristic approach and without the journal vs file options for alpha and get feedback from the users. Then react to the feedback appropriately when moving from alpha to beta. I am open to other options too.
as a heuristic, if we see 3 such files, we can assume they are glog output and choose INFO. If there's really demand for only error messages, we can expand the heuristic. My fear is that if we implement with a lot of control we will have a hard time removing them, whereas if we start with relatively little control, we may not need them at all. I should caveat - this is armchair architecture. From my distance I can handwave about it, but I don't have to implement all the crazy suggestions. It may be that I am asking for dumb things, and I just need people to show me why. |
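For illustration only, the "three files means glog, so pick INFO" idea from the comment above could be sketched roughly like this. The function name `pick_glog_file`, the severity list, and the assumption that the severity appears as a dot-separated token in the file name are all hypothetical; none of this is taken from the KEP or from the kubelet.

```python
import re

# Assumed severity suffixes; the thread only mentions INFO, WARNING and ERROR.
GLOG_SEVERITIES = ("INFO", "WARNING", "ERROR")


def pick_glog_file(filenames):
    """Return the INFO file if the names look like a glog trio, else None.

    Hypothetical helper: it treats a directory listing as glog output only
    when a file is present for each of INFO, WARNING and ERROR.
    """
    by_severity = {}
    for name in filenames:
        for severity in GLOG_SEVERITIES:
            # Match the severity as a whole dot-separated token in the name.
            if re.search(rf"(^|\.){severity}(\.|$)", name):
                # Keep the first file seen for each severity.
                by_severity.setdefault(severity, name)
    if all(severity in by_severity for severity in GLOG_SEVERITIES):
        # INFO is inclusive of WARNING and ERROR, so it is the one to serve.
        return by_severity["INFO"]
    return None
```

Starting from this narrow rule leaves room to add an optional severity argument later if there turns out to be demand for error-only output.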
Co-authored-by: Christian Glombek <cglombek@redhat.com>
@thockin Thank you for the patient feedback. I have updated the KEP to reflect the heuristic approach you suggested. Please take a look. |
I'm pretty happy - small nits only from me
/lgtm
Options:
      --case-sensitive=true: Filters are case sensitive by default. Pass --case-sensitive=false to do a case insensitive filter.
  -g, --grep='': Filter log entries by the provided regex pattern. Only applies to node journal logs.
Why do some flags only work on some implementations? Can we at least clarify somewhere that "some capabilities are optional, and may not be supported by some sources of log information", rather than specifically "journal"?
I will add a note to call this out during the implementation. Would that suffice or should I open a follow up PR to amend the enhancement?
      --raw=false: Perform no transformation of the returned data.
      --role='': Set a label selector by node role.
  -l, --selector='': Selector (label query) to filter on.
      --since='': Return logs after a specific ISO timestamp or relative date. Only applies to node journal or Get-WinEvent logs.
same for these
to get logs from these, it will attempt to get logs from `/var/log/foobar.log`,
`/var/log/foobar/foobar.log`, `/var/log/foobar*INFO` or
`/var/log/foobar/foobar*INFO` in that order.
Here are some examples and explanation of the options that will be added.
I'd also say that the heuristic can grow and evolve as we get real feedback.
Agreed
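As a rough sketch of the lookup order quoted from the KEP text (`/var/log/foobar.log`, `/var/log/foobar/foobar.log`, `/var/log/foobar*INFO`, `/var/log/foobar/foobar*INFO`), the file-side fallback might look something like the following. The names `candidate_patterns` and `find_service_log`, the configurable `log_dir` parameter, and the choice to return the first sorted glob match are assumptions made for this example; they do not describe the actual kubelet implementation.

```python
import glob
import os


def candidate_patterns(service, log_dir="/var/log"):
    """Build the four candidate paths/patterns in the order given in the KEP text."""
    base = glob.escape(log_dir)   # treat the directory literally
    name = glob.escape(service)   # treat the service name literally
    return [
        os.path.join(base, f"{name}.log"),
        os.path.join(base, name, f"{name}.log"),
        os.path.join(base, f"{name}*INFO"),
        os.path.join(base, name, f"{name}*INFO"),
    ]


def find_service_log(service, log_dir="/var/log"):
    """Return the first existing match, trying candidates in order, or None."""
    for pattern in candidate_patterns(service, log_dir):
        matches = sorted(glob.glob(pattern))
        if matches:
            # Assumption: when a wildcard matches several files, take the first.
            return matches[0]
    return None
```

Keeping the candidate list in one place would make it straightforward to grow the heuristic as feedback comes in, as suggested above.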
Thank you for the review, @thockin
Hello @aravindhp 👋, 1.22 Docs Shadow here. Please follow the steps detailed in the documentation to open a PR against dev-1.22 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Fri July 9, 11:59 PM PDT. Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release. Thank you! 🙏 |
KEP to introduce kubectl option for viewing logs of system services on Windows and Linux nodes.
Co-authored-by: Christian Glombek cglombek@redhat.com