RFC: structured, contextual logging #639
a3d1606
to
6f3e4f8
Compare
go.mod
Outdated
@@ -61,3 +61,6 @@ replace k8s.io/component-base => k8s.io/component-base v0.21.0
 replace k8s.io/component-helpers => k8s.io/component-helpers v0.21.0
 
 replace k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.21.0
+
+// WIP
+replace k8s.io/klog/v2 => github.com/pohly/klog/v2 v2.4.1-0.20210527141230-ac596814502c
What's the plan for this?
I proposed to have the code in klog: kubernetes/klog#240
It's currently on hold because the logr API changes need to be dealt with first.
-klog.Info("Started Capacity Controller")
+logger.Info("started controller")
What's the reason for starting an info log message with lower case?
It's not a full sentence, so an initial capital letter looked odd. I don't know whether there is some guidance on this.
It's also consistent with error messages. For those the official guidance is to start with lower case because the error might get wrapped.
I'd like to see if the structured log KEP has any official guidance on this. The official guidance for error messages refers only to returned error messages, not to error messages in the logs.
The logr example uses lower case, incidentally also with a "starting" message:
https://github.com/go-logr/logr#typical-usage
But the Kubernetes documentation says "Start from a capital letter": https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#remove-string-formatting-from-log-message
It also has some other recommendations. Once we agree to go further with this and the klog PR is merged, I'll revisit the log messages and update them accordingly.
-klog.V(3).Infof("Capacity Controller: storage class %s was removed", sc.Name)
+logger.V(3).Info("removed")
Can you show an example of the output of this changed log msg vs the original log msg?
Looks like I have a coverage gap in capacity_test.go - onSCDelete is not called. Will fix.
In the meantime, here's the corresponding output from onSCAddOrUpdate, as printed by go test. With these changes:
capacity.go:373: INFO onSCAddOrUpdate: updated or added storageclass="triple-sc"
...
capacity.go:481: INFO onSCAddOrUpdate: enqueuing storageclass="triple-sc" workitem={segment:0x27ef430 storageClassName:triple-sc}
Without them:
I0608 17:48:40.721547 866697 capacity.go:361] Capacity Controller: storage class triple-sc was updated or added
...
I0608 17:48:40.721526 866697 capacity.go:468] Capacity Controller: enqueuing {segment:0x27e7350 storageClassName:triple-sc}
Note that the "enqueuing" messages without this PR lack context: it's not clear why addWorkItem was called. With contextual logging, the onSCAddOrUpdate function name and the storage class get passed down and are added to the log message.
This is important once things start to happen in parallel. When everything is sequential, one can read the log from top to bottom and remember which values were logged earlier. But when code runs in parallel, it's not clear whether a log line actually belongs to the operation logged directly above it.
This is a problem in our driver logs, where it is hard to associate a gRPC error response with the corresponding call.
Here's the log output for a complete testcase:
=== RUN TestRefresh
=== RUN TestRefresh/truncated_topology
=== PAUSE TestRefresh/truncated_topology
=== CONT TestRefresh/truncated_topology
capacity.go:338: INFO onTopologyChanges: topology changed added=[0x27ef430 = layer0: foo+ layer1: X+ layer2: A 0x27ef450 = layer0: foo+ layer1: X+ layer2: B] removed=[]
capacity.go:373: INFO onSCAddOrUpdate: updated or added storageclass="direct-sc"
capacity.go:481: INFO onSCAddOrUpdate: enqueuing storageclass="direct-sc" workitem={segment:0x27ef430 storageClassName:direct-sc}
capacity.go:481: INFO onSCAddOrUpdate: enqueuing storageclass="direct-sc" workitem={segment:0x27ef450 storageClassName:direct-sc}
capacity.go:373: INFO onSCAddOrUpdate: updated or added storageclass="triple-sc"
capacity.go:481: INFO onSCAddOrUpdate: enqueuing storageclass="triple-sc" workitem={segment:0x27ef430 storageClassName:triple-sc}
capacity.go:481: INFO onSCAddOrUpdate: enqueuing storageclass="triple-sc" workitem={segment:0x27ef450 storageClassName:triple-sc}
capacity.go:338: INFO onTopologyChanges: topology changed added=[0x27ef430 = layer0: foo+ layer1: X+ layer2: A 0x27ef450 = layer0: foo+ layer1: X+ layer2: B] removed=[]
capacity.go:481: INFO onTopologyChanges: enqueuing workitem={segment:0x27ef430 storageClassName:direct-sc}
capacity.go:481: INFO onTopologyChanges: enqueuing workitem={segment:0x27ef450 storageClassName:direct-sc}
capacity.go:481: INFO onTopologyChanges: enqueuing workitem={segment:0x27ef430 storageClassName:triple-sc}
capacity.go:481: INFO onTopologyChanges: enqueuing workitem={segment:0x27ef450 storageClassName:triple-sc}
capacity.go:276: INFO prepare: initial state topology segments=2 storage classes=2 potential CSIStorageCapacity objects=4
capacity.go:288: INFO prepare: checking for existing CSIStorageCapacity objects
--- PASS: TestRefresh (0.00s)
Note that this would be impossible to do without this PR because log output from different test cases would be mixed.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
The k8s.io/component-base/logs API is used to add several new command line flags and the corresponding implementation:
--feature-gates:
    ContextualLogging=true|false (ALPHA - default=true)
    LoggingAlphaOptions=true|false (ALPHA - default=false)
    LoggingBetaOptions=true|false (BETA - default=true)
--log-flush-frequency duration
    Maximum number of seconds between log flushes (default 5s)
--log-json-info-buffer-size quantity
    [Alpha] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the LoggingAlphaOptions feature gate to use this.
--log-json-split-stream
    [Alpha] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the LoggingAlphaOptions feature gate to use this.
--logging-format string
    Sets the log format. Permitted formats: "json" (gated by LoggingBetaOptions), "text". (default "text")
In contrast to the defaults in the (pretty conservative) Kubernetes components, contextual logging gets enabled by default. That has the advantage that code can be rewritten with the assumption that WithValue and WithName calls really have an effect. Users can still disable the feature, but logs will be less informative in that case.
a2eef0f
to
a2d6a4d
Compare
/remove-lifecycle rotten |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten
I was looking for a volunteer to continue with this, but so far without luck. I'll probably finish this myself.
Hi @pohly
Would it be acceptable if I were to take on this development task?
@bells17: help with this would be very welcome. Feel free to take my branch, rebase it and continue in a new PR. I think with this PR and kubernetes-csi/node-driver-registrar#259 it is technically clear how to use component-base/logs. The rest of the conversion can go as described in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md
@pohly: The following tests failed, say
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Embracing go-logr as logger has several advantages:
The latter was needed to debug #638
Does this PR introduce a user-facing change?: