Add horizontal pod autoscaler support #314
case core_v1.ConditionTrue:
	foundScalingActive = true
case core_v1.ConditionFalse:
	// It's expected that it will take a little while to get metrics
I would specifically like review on this. Checking whether an HPA has failed or is still progressing does not seem trivial: you can get the same status with the same reason in both cases.
Here I use a timeout, which I am not fully happy with, but I haven't thought of another way to actually detect failure in the case where the HPA comes up and can never access metrics.
Edit: Also, I'll probably raise the time limit a bit before considering merging...
I think this is fine; we can always change it later. Please extract that timeout into a constant and add a comment.
Done.
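For illustration, a minimal sketch of what extracting the timeout into a commented constant could look like. The names hpaScalingActiveTimeout, elapsedExceedsTimeout and scalingActiveTimedOut are hypothetical, the 3-minute value is simply the one from the diff under review, and plain time.Time stands in for meta_v1.Time so the example stays self-contained.

```go
package main

import (
	"fmt"
	"time"
)

// hpaScalingActiveTimeout is how long the ScalingActive condition may stay
// False before the HPA is treated as failed rather than still starting up.
// Hypothetical name; the value mirrors the 3 minutes in the reviewed diff.
const hpaScalingActiveTimeout = 3 * time.Minute

// elapsedExceedsTimeout reports whether a condition has been stuck for
// strictly longer than the timeout.
func elapsedExceedsTimeout(elapsed time.Duration) bool {
	return elapsed > hpaScalingActiveTimeout
}

// scalingActiveTimedOut applies the check to a condition's last transition
// time. It is equivalent to lastTransition.Add(timeout).Before(now).
func scalingActiveTimedOut(lastTransition, now time.Time) bool {
	return elapsedExceedsTimeout(now.Sub(lastTransition))
}

func main() {
	now := time.Now()
	// One minute in the False state: still within the grace period.
	fmt.Println(scalingActiveTimedOut(now.Add(-1*time.Minute), now))
	// Five minutes in the False state: past the grace period.
	fmt.Println(scalingActiveTimedOut(now.Add(-5*time.Minute), now))
}
```

Keeping the duration in one named constant means the grace period can be raised later without touching the condition-handling logic.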
pkg/readychecker/types/built_in.go
Outdated
scV1B1Scheme = runtime.NewScheme()
appsV1Scheme = runtime.NewScheme()
scV1B1Scheme = runtime.NewScheme()
autoscalingV2B1 = runtime.NewScheme()
nit: should be called autoscalingV2B1Scheme to follow the established pattern
You can add blank lines to separate sections and avoid the more horrific alignment shifts.
Fixed
pkg/readychecker/types/built_in.go
Outdated
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
)

const (
hpaAbleToScaleCondition = autoscaling_v2b1.HorizontalPodAutoscalerConditionType("AbleToScale")
Given that these constants are only used in isHorizontalPodAutoscalerReady, I would consider declaring them inside this method.
Actually, these constants are already declared in the autoscaling package: autoscaling_v2b1.AbleToScale and autoscaling_v2b1.ScalingActive respectively. No need to declare them here at all?
Fixed
pkg/readychecker/types/built_in.go
Outdated
	foundAbleToScale = true
case core_v1.ConditionFalse:
	// AbleToScale should not be false if the HPA is working, this is a failure
	return false, false, errors.Errorf("%s: %s", cond.Reason, cond.Message)
@ash2k is it correct to return a non-retriable error in that case?
// AbleToScale indicates a lack of transient issues which prevent scaling from occurring,
// such as being in a backoff window, or being unable to access/update the target scale.
AbleToScale HorizontalPodAutoscalerConditionType = "AbleToScale"
In the case of backoff this error is recoverable, and updating the target scale could succeed after retrying...
It doesn't affect the lifecycle of the bundle AFAICT; it's mostly about reporting InProgress: True with error Reason: RetriableError vs InProgress: False and error Reason: TerminalError.
Yes and no. We need to refactor error propagation in Smith. You are correct that it influences the reason Smith reports. But Smith also retries processing if the error is marked as retriable, which will not help in this case (though it is not very harmful either, only extra logs). So perhaps in this case the error should be marked as retriable, to be future-compatible with what we will hopefully do to surface such conditions. See #276
Yep, I was also torn between reporting the correct status and leaving Smith stuck indefinitely in a retry loop.
I have made these retriable.
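A rough sketch of the outcome of this thread, under stated assumptions: the three return values are taken to mean (ready, retriable, err), which is inferred from the diff and not confirmed here. The conditionStatus and condition types are local stand-ins for the Kubernetes API types, checkAbleToScale is a hypothetical helper name, and fmt.Errorf replaces the errors.Errorf call from the diff to avoid a third-party import.

```go
package main

import "fmt"

// conditionStatus is a local stand-in for core_v1.ConditionStatus.
type conditionStatus string

const (
	conditionTrue    conditionStatus = "True"
	conditionFalse   conditionStatus = "False"
	conditionUnknown conditionStatus = "Unknown"
)

// condition is a trimmed-down stand-in for an HPA status condition.
type condition struct {
	Status  conditionStatus
	Reason  string
	Message string
}

// checkAbleToScale returns (ready, retriable, err); the ordering is an
// assumption. AbleToScale=False is reported as a retriable error because
// causes such as a backoff window can clear on their own.
func checkAbleToScale(cond condition) (bool, bool, error) {
	switch cond.Status {
	case conditionTrue:
		return true, false, nil
	case conditionFalse:
		return false, true, fmt.Errorf("%s: %s", cond.Reason, cond.Message)
	default:
		// Unknown or unset: not ready yet, but not an error either.
		return false, false, nil
	}
}

func main() {
	// The reason and message below are made-up example values.
	ready, retriable, err := checkAbleToScale(condition{
		Status:  conditionFalse,
		Reason:  "ExampleReason",
		Message: "example message",
	})
	fmt.Println(ready, retriable, err)
}
```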
pkg/readychecker/types/built_in.go
Outdated
// If it's been stuck in this condition for >3min we assume it's failed
now := meta_v1.Now()
if cond.LastTransitionTime.Add(3 * time.Minute).Before(now.Time) {
	return false, false, errors.Errorf("%s: %s", cond.Reason, cond.Message)
This also seems like a recoverable error?
E.g. if the autoscale controller was down and is now up again, it will be able to scale.
Have made this retriable.
Gopkg.lock
Outdated
@@ -3,18 +3,15 @@

[[projects]]
Why did this file need to change?
@mcoot make sure you have the latest dep version
Fixed.
👍 p.s. squash merge please
PR for adding Smith support for the built-in K8s HorizontalPodAutoscaler objects.
Adds: