-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track GCE/K8s server error #2097
Conversation
Hi @sawsa307. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
62bc4e8
to
98f175b
Compare
/assign @swetharepakula |
/ok-to-test |
98f175b
to
50ee81c
Compare
24d82fe
to
8acdbf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these errors are returned at a central point, should we just emit the metric then?
8acdbf3
to
7deab1f
Compare
55df15e
to
d3d2a02
Compare
d3d2a02
to
b25fe05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The errors in transaction.go that result in a failed sync should not be an ignored error
pkg/neg/syncers/transaction.go
Outdated
@@ -573,6 +573,7 @@ func (s *transactionSyncer) commitTransaction(err error, networkEndpointMap map[ | |||
// This is to prevent if the NEG object is deleted or misconfigured by user | |||
s.needInit = true | |||
needRetry = true | |||
metrics.PublishNegControllerErrorCountMetrics(err, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this error already being counted? This would be counted where commitTransaction occurs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should count the metrics at the end of the sync instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should count the metrics at the end of the sync instead?
We collect error from sync in syncer.go where we called s.core.sync(). Here we are collecting errors from each goroutine spawned by attach/detach calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking instead of operating on a error that is passed in, to do it where it occurs. However in this case, this is where the error is handled so I am okay with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be counted as ignored. An error here causes triggers a retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks!
Understood. So the definition for |
b25fe05
to
2a5ba46
Compare
pkg/neg/metrics/metrics.go
Outdated
ignoredControllerError = "ignored_controller_error" | ||
otherControllerError = "other_controller_error" | ||
totalNegControllerError = "total_neg_controller_error" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove controller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks!
@@ -186,6 +187,7 @@ func (d *Migrator) Continue(err error) { | |||
} | |||
|
|||
if err != nil { | |||
metrics.PublishNegControllerErrorCountMetrics(err, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error will be passed into commitTransaction as well so it will be tracked there. It does not need to be tracked here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks!
pkg/neg/syncers/transaction.go
Outdated
@@ -573,6 +573,7 @@ func (s *transactionSyncer) commitTransaction(err error, networkEndpointMap map[ | |||
// This is to prevent if the NEG object is deleted or misconfigured by user | |||
s.needInit = true | |||
needRetry = true | |||
metrics.PublishNegControllerErrorCountMetrics(err, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking instead of operating on a error that is passed in, to do it where it occurs. However in this case, this is where the error is handled so I am okay with this.
pkg/neg/syncers/transaction.go
Outdated
@@ -573,6 +573,7 @@ func (s *transactionSyncer) commitTransaction(err error, networkEndpointMap map[ | |||
// This is to prevent if the NEG object is deleted or misconfigured by user | |||
s.needInit = true | |||
needRetry = true | |||
metrics.PublishNegControllerErrorCountMetrics(err, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be counted as ignored. An error here causes triggers a retry.
Add NegControllerErrorCount metrics to track the count of all errors from NEG controller, and counts of server errors from GCE/K8s.
2a5ba46
to
24283c1
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sawsa307, swetharepakula The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
} | ||
for { | ||
if apiErr, ok := err.(*googleapi.Error); ok { | ||
return apiErr.Code >= http.StatusInternalServerError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we want to check in the range [500, 599] not >= 500
} | ||
for { | ||
if apiErr, ok := err.(*k8serrors.StatusError); ok { | ||
return apiErr.ErrStatus.Code >= http.StatusInternalServerError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we want to check in the range [500, 599] not >= 500
Define
NegControllerErrorCount
that tracks both internal sync error and API server errors(GCE/K8s).