Add metrics for sync result #1911
Hi @sawsa307. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @swetharepakula
2b4b322 to 16db295
/ok-to-test
d2af05f to b36169a
}

// UpdateSyncer updates the count of sync results based on the result/error of sync
func (im *SyncerMetrics) UpdateSyncer(key negtypes.NegSyncerKey, err error) {
nit: maybe sm or just m?
	if err == nil {
		im.countSinceLastExport[Success] += 1
	} else {
		syncErr := errors.Unwrap(err).(syncError)
should we verify that the error is of type syncError?
Or can we guarantee that this is the type of error always?
	im.mu.Lock()
	defer im.mu.Unlock()
	if im.countSinceLastExport == nil {
		klog.Fatalf("Syncer Metrics failed to initialize correctly, countSinceLastExport: %v", im.countSinceLastExport)
fatal will cause the binary to exit. should we just log the error and initialize here?
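A rough sketch of the log-and-initialize alternative, assuming a mutex-guarded map like the one in the diff. The type and method names here are hypothetical, the map is keyed by plain strings for brevity, and the standard `log` package stands in for `klog` so the example stays self-contained.

```go
package main

import (
	"log"
	"sync"
)

// syncerMetrics is a hypothetical, simplified version of the PR's SyncerMetrics.
type syncerMetrics struct {
	mu                   sync.Mutex
	countSinceLastExport map[string]int
}

// record increments the count for result, creating the map on first use
// instead of exiting the process when it is nil.
func (m *syncerMetrics) record(result string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.countSinceLastExport == nil {
		log.Printf("countSinceLastExport was not initialized; initializing now")
		m.countSinceLastExport = make(map[string]int)
	}
	m.countSinceLastExport[result]++
}

func main() {
	m := &syncerMetrics{} // map deliberately left nil
	m.record("Success")
	m.record("Success")
	log.Printf("Success count: %d", m.countSinceLastExport["Success"])
}
```

The trade-off is that a broken constructor shows up only as a log line, so the message should be specific enough to find.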
pkg/neg/metrics/sync_errors.go
Outdated
@@ -0,0 +1,86 @@
/*
These errors should live in the syncer code that returns them.
pkg/neg/metrics/sync_errors.go
Outdated
@@ -0,0 +1,86 @@
/*
Copyright 2020 The Kubernetes Authors.
nit: 2023
pkg/flags/flags.go
Outdated
@@ -256,7 +257,8 @@ L7 load balancing. CSV values accepted. Example: -node-port-ranges=80,8080,400-5
 	flag.BoolVar(&F.EnableL4ILBDualStack, "enable-l4ilb-dual-stack", false, "Enable Dual-Stack handling for L4 Internal Load Balancers")
 	flag.BoolVar(&F.EnableMultipleIGs, "enable-multiple-igs", false, "Enable using multiple unmanaged instance groups")
 	flag.IntVar(&F.MaxIGSize, "max-ig-size", 1000, "Max number of instances in Instance Group")
-	flag.DurationVar(&F.MetricsExportInterval, "metrics-export-interval", 10*time.Minute, `Period for calculating and exporting metrics related to state of managed objects.`)
+	flag.DurationVar(&F.UsageMetricsExportInterval, "usage-metrics-export-interval", 10*time.Minute, `Period for calculating and exporting metrics related to state of managed objects.`)
+	flag.DurationVar(&F.SyncMetricsExportInterval, "sync-metrics-export-interval", 5*time.Second, `Period for calculating and exporting metrics related to state of syncer.`)
maybe neg-metrics-export-interval
In the description, add that these are neg controller internal metrics, not usage.
cmd/glbc/main.go
Outdated
@@ -329,6 +329,7 @@ func runControllers(ctx *ingctx.ControllerContext) {
 		ctx.SvcNegInformer,
 		ctx.HasSynced,
 		ctx.ControllerMetrics,
+		ctx.SyncerMetrics,
Since this is moving to the controller, we don't need to add it to the context.
pkg/neg/syncers/transaction.go
Outdated
@@ -217,7 +217,7 @@ func (s *transactionSyncer) syncInternalImpl() error {

 	currentMap, err := retrieveExistingZoneNetworkEndpointMap(s.NegSyncerKey.NegName, s.zoneGetter, s.cloud, s.NegSyncerKey.GetAPIVersion(), s.endpointsCalculator.Mode())
 	if err != nil {
-		return err
+		return fmt.Errorf("%w, reason: %v", metrics.ErrCurrentEPNotFound, err)
This slightly obscures what is the error we are returning. Typically we wrap the error we received instead of wrapping the context we are returning.
Implement a result instead where you can provide the reason. Look at what was done in L4lb:
ingress-gce/pkg/loadbalancers/l4.go
Line 62 in 0f312a7
type L4ILBSyncResult struct {
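As an illustration of the result-struct approach, here is a small sketch, assuming a struct with `Error` and `Result` fields (the field names mirror the `result.Error` and `result.Result` accesses that appear later in this thread; the type name, labels, and helper functions are hypothetical). The received error is wrapped as usual, and the metric category travels separately as a label.

```go
package main

import (
	"errors"
	"fmt"
)

// syncResult pairs the error a sync returned with a short label for metrics.
// Hypothetical shape, not the PR's final type.
type syncResult struct {
	Error  error
	Result string
}

// Assumed result labels, for illustration only.
const (
	resultSuccess           = "Success"
	resultCurrentEPNotFound = "CurrentEPNotFound"
)

// lookupEndpoints is a hypothetical stand-in for retrieving the existing endpoint map.
func lookupEndpoints(fail bool) error {
	if fail {
		return errors.New("zone lookup timed out")
	}
	return nil
}

// syncOnce wraps the error it received and reports the category via Result.
func syncOnce(fail bool) *syncResult {
	if err := lookupEndpoints(fail); err != nil {
		return &syncResult{
			Error:  fmt.Errorf("retrieving existing endpoints: %w", err),
			Result: resultCurrentEPNotFound,
		}
	}
	return &syncResult{Result: resultSuccess}
}

func main() {
	for _, fail := range []bool{false, true} {
		r := syncOnce(fail)
		fmt.Printf("result=%s err=%v\n", r.Result, r.Error)
	}
}
```

Callers that only care about success can check `r.Error`, while the metrics code reads `r.Result` without inspecting error chains.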
Updated. Thanks!
// NewNEGMetricsCollector initializes SyncerMetrics and starts a go routine to compute and export metrics periodically.
func NewNegMetricsCollector(exportInterval time.Duration) *SyncerMetrics {
	return &SyncerMetrics{
		countSinceLastExport: map[syncError]int{
Since this is a counter, we don't need to store this state. This will be necessary only when we have gauge metrics.
Updated. Thanks!
b36169a to bb394ad
bb394ad to 3b2b1c7
b14be83 to 4d3b024
4d3b024 to f5b9857
pkg/neg/syncers/transaction.go
Outdated
 	}
 	s.logEndpoints(addEndpoints, "adding endpoint")
 	s.logEndpoints(removeEndpoints, "removing endpoint")

-	return s.syncNetworkEndpoints(addEndpoints, removeEndpoints)
+	err = s.syncNetworkEndpoints(addEndpoints, removeEndpoints)
should we just make syncNetworkEndpoints return the sync result?
48bc2e8 to b64514b
pkg/neg/syncers/transaction.go
Outdated
 //
 // 1. The endpoint count from endpointData doesn't equal to the one from endpointPodMap:
 // endpiontPodMap removes the duplicated endpoints, and dupCount stores the number of duplicated it removed
 // and we compare the endpoint counts with duplicates
 // 2. The endpoint count from endpointData or the one from endpointPodMap is 0
-func (s *transactionSyncer) isValidEndpointInfo(eds []negtypes.EndpointsData, endpointPodMap negtypes.EndpointPodMap, dupCount int) (bool, string) {
+func (s *transactionSyncer) checkValidEndpointInfo(eds []negtypes.EndpointsData, endpointPodMap negtypes.EndpointPodMap, dupCount int) (bool, *negtypes.NegSyncResult) {
nit: checkEndpointInfo or validateEndpointInfo
pkg/neg/syncers/transaction.go
Outdated
if errors.Is(err, ErrEPMissingNodeName) {
// checkValidEPField checks if endpoints have valid field
// It return the result boolean with the corresponding reason
func (s *transactionSyncer) checkValidEPField(err error) (bool, *negtypes.NegSyncResult) {
nit: checkEPField or validateEPField
pkg/neg/syncers/transaction.go
Outdated
	errorState := s.errorState
	if errorState == negtypes.ResultInvalidAPIResponse {
		syncResult = negtypes.NewNegSyncResult(negtypes.ErrInvalidAPIResponse, errorState)
	}
	if errorState == negtypes.ResultInvalidEPAttach {
		syncResult = negtypes.NewNegSyncResult(negtypes.ErrInvalidEPAttach, errorState)
	}
	if errorState == negtypes.ResultInvalidEPDetach {
		syncResult = negtypes.NewNegSyncResult(negtypes.ErrInvalidEPDetach, errorState)
	}
you can use a switch and make the default case do the last one
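The suggestion could look roughly like the following sketch. The constants and error values are hypothetical stand-ins for the `negtypes` identifiers in the diff, and `errorForState` is an illustrative helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the negtypes result labels.
const (
	resultInvalidAPIResponse = "InvalidAPIResponse"
	resultInvalidEPAttach    = "InvalidEPAttach"
	resultInvalidEPDetach    = "InvalidEPDetach"
)

// Hypothetical stand-ins for the negtypes error values.
var (
	errInvalidAPIResponse = errors.New("invalid API response")
	errInvalidEPAttach    = errors.New("invalid endpoint attach")
	errInvalidEPDetach    = errors.New("invalid endpoint detach")
)

// errorForState maps an error-state label to its error.
// The default branch covers the detach case, as suggested in the review.
func errorForState(state string) error {
	switch state {
	case resultInvalidAPIResponse:
		return errInvalidAPIResponse
	case resultInvalidEPAttach:
		return errInvalidEPAttach
	default:
		return errInvalidEPDetach
	}
}

func main() {
	for _, s := range []string{resultInvalidAPIResponse, resultInvalidEPAttach, resultInvalidEPDetach} {
		fmt.Printf("%s -> %v\n", s, errorForState(s))
	}
}
```

One consequence of a catch-all default is that any unrecognized state is reported as a detach error, so it only fits if those three states are the complete set.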
b64514b to 72c4296
pkg/neg/syncers/transaction.go
Outdated
	defer s.syncLock.Unlock()

	var syncResult *negtypes.NegSyncResult
	if s.inErrorState() {
Do you need to check for error state here? We just need the syncResult, so there should be a nil case and the detach case added to the switch.
72c4296 to 9933f59
pkg/neg/syncers/transaction_test.go
Outdated
-	if got, reason := transactionSyncer.isValidEndpointInfo(tc.endpointsData, tc.endpointPodMap, tc.dupCount); got != tc.expect && reason != tc.expectedReason {
-		t.Errorf("invalidEndpointInfo() = %t, expected %t", got, tc.expect)
+	if got, result := transactionSyncer.CheckEndpointInfo(tc.endpointsData, tc.endpointPodMap, tc.dupCount); got != tc.expect || result.Result != tc.expectedResult.Result || !errors.Is(result.Error, tc.expectedResult.Error) {
+		t.Errorf("CheckEndpointInfo() = %t|%s|%t, expected %t|%s|%t", got, result.Result, result.Error, tc.expect, tc.expectedResult.Result, tc.expectedResult.Error)
We should do got, then wanted or expected, based on the Go style guide.
Break it up a little more so that it is easier to understand the test and debug in the future:
	got, result := transactionSyncer.CheckEndpointInfo(tc.endpointsData, tc.endpointPodMap, tc.dupCount)
	if got != tc.expect {
		t.Errorf(...)
	}
	if result.Result != tc.expectedResult {
		t.Errorf(...)
	}
Same for the other test.
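A fuller version of that pattern might look like the sketch below. It is written as a runnable program rather than a `_test.go` file, so a small `reporter` interface (satisfied by `*testing.T`) and a `recorder` type stand in for the test harness. `checkEndpointInfo`, `verify`, and the `EPCountsZero` label are hypothetical.

```go
package main

import "fmt"

// reporter is the subset of *testing.T used here, so the sketch runs outside `go test`.
type reporter interface {
	Errorf(format string, args ...interface{})
}

// recorder collects failure messages; a hypothetical stand-in for *testing.T.
type recorder struct{ failures []string }

func (r *recorder) Errorf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// checkEndpointInfo is a hypothetical function under test.
func checkEndpointInfo(count int) (bool, string) {
	if count == 0 {
		return false, "EPCountsZero"
	}
	return true, "Success"
}

// verify reports each mismatch separately, always phrased as "got, want".
func verify(t reporter, count int, wantOK bool, wantResult string) {
	gotOK, gotResult := checkEndpointInfo(count)
	if gotOK != wantOK {
		t.Errorf("checkEndpointInfo(%d) ok = %t, want %t", count, gotOK, wantOK)
	}
	if gotResult != wantResult {
		t.Errorf("checkEndpointInfo(%d) result = %q, want %q", count, gotResult, wantResult)
	}
}

func main() {
	r := &recorder{}
	verify(r, 0, false, "EPCountsZero") // expectations match: no failures
	verify(r, 3, true, "EPCountsZero")  // deliberately wrong label: only the result check fails
	for _, f := range r.failures {
		fmt.Println(f)
	}
}
```

Because each field has its own check, a failure message names exactly the value that differed instead of dumping every field at once.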
9933f59 to dd0242e
Added metrics to collect the cumulative count of sync results.
dd0242e to ff4751f
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: sawsa307, swetharepakula
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
Created sync metrics collector to collect sync related metrics. Added metrics to collect the cumulative count of sync results. Specified metrics emit interval in flags.go.