This repository has been archived by the owner on Jan 13, 2021. It is now read-only.

Add horizontal pod autoscaler support #314

Merged

mcoot merged 4 commits into master from jspearritt/add-horizontal-pod-autoscaler-support on Sep 18, 2018

Conversation

@mcoot (Contributor) commented Sep 17, 2018

PR for adding Smith support for the built-in K8s HorizontalPodAutoscaler objects.

Adds:

  • A readiness checking function for HPAs
  • An informer for HPAs in the Bundle Controller (a minimal sketch of the wiring follows below)
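
For orientation, obtaining an informer for autoscaling/v2beta1 HPA objects from a client-go shared informer factory looks roughly like the sketch below; the helper name and resync handling are illustrative, not Smith's actual wiring.

```go
import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newHPAInformer is a hypothetical helper: it builds a shared informer for
// autoscaling/v2beta1 HorizontalPodAutoscaler objects. The Bundle controller
// would use the informer's cache when running the readiness check.
func newHPAInformer(clientset kubernetes.Interface, resync time.Duration) cache.SharedIndexInformer {
	factory := informers.NewSharedInformerFactory(clientset, resync)
	return factory.Autoscaling().V2beta1().HorizontalPodAutoscalers().Informer()
}
```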

@mcoot mcoot requested review from ash2k, nilebox and amckague September 17, 2018 03:16
```go
case core_v1.ConditionTrue:
	foundScalingActive = true
case core_v1.ConditionFalse:
	// It's expected that it will take a little while to get metrics
```
@mcoot (Contributor, Author) commented Sep 17, 2018

I would specifically like review on this. Checking whether an HPA has failed or is still progressing does not seem trivial: you can get the same status with the same reason in both cases.

Here I use a timeout, which I am not fully happy with, but I haven't thought of another way to actually determine failure in the case where the HPA comes up and can never access metrics.

Edit: I'll also probably raise the time limit a bit before merging.

Reviewer (Contributor)

I think this is fine; we can always change it later. Please extract that timeout into a constant and add a comment.

@mcoot (Contributor, Author)

Done.
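
For illustration, the extracted constant presumably ends up as something like this sketch; the name is made up here, and the 3-minute value is taken from the (outdated) diff further down this thread, since the final merged value isn't visible:

```go
import "time"

// hpaScalingActiveTimeout is how long we tolerate the ScalingActive condition
// being false before treating the HPA as failed. Metrics can take a little
// while to become available after the HPA is created, so failure is not
// reported immediately.
const hpaScalingActiveTimeout = 3 * time.Minute
```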

@mcoot mcoot requested a review from Burr September 17, 2018 03:19
```go
scV1B1Scheme    = runtime.NewScheme()
appsV1Scheme    = runtime.NewScheme()
autoscalingV2B1 = runtime.NewScheme()
```
Reviewer (Contributor)

nit: should be called autoscalingV2B1Scheme to follow the established pattern

Reviewer (Contributor)

you can add blank lines to separate sections to avoid the more horrific alignment shifts

@mcoot (Contributor, Author)

Fixed
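
For readers unfamiliar with the alignment remark: gofmt vertically aligns the = signs of consecutive lines in a declaration block, so one long name re-indents all of its neighbours, and a blank line starts a new alignment group. A minimal sketch using the variable names from this diff:

```go
import "k8s.io/apimachinery/pkg/runtime"

var (
	scV1B1Scheme = runtime.NewScheme()
	appsV1Scheme = runtime.NewScheme()

	// The blank line above starts a new gofmt alignment group, so this longer
	// name does not force the shorter names above to be re-aligned.
	autoscalingV2B1Scheme = runtime.NewScheme()
)
```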

"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
)

const (
hpaAbleToScaleCondition = autoscaling_v2b1.HorizontalPodAutoscalerConditionType("AbleToScale")
Reviewer (Contributor)

given that these constants are only used in isHorizontalPodAutoscalerReady, I would consider declaring them inside that function

Reviewer (Contributor)

actually, these constants are already declared in the autoscaling package: autoscaling_v2b1.AbleToScale and autoscaling_v2b1.ScalingActive respectively. no need to declare them here at all?

@mcoot (Contributor, Author)

Fixed
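
With the upstream constants, the condition loop presumably reduces to something like the sketch below. The function name comes from this thread; its exact signature and the timeout/error handling discussed elsewhere in this PR are omitted, so treat this as a shape, not the merged code:

```go
import (
	autoscaling_v2b1 "k8s.io/api/autoscaling/v2beta1"
	core_v1 "k8s.io/api/core/v1"
)

// isHorizontalPodAutoscalerReady reports whether both the AbleToScale and
// ScalingActive conditions are currently true, using the condition type
// constants already declared in k8s.io/api/autoscaling/v2beta1.
func isHorizontalPodAutoscalerReady(hpa *autoscaling_v2b1.HorizontalPodAutoscaler) bool {
	var foundAbleToScale, foundScalingActive bool
	for _, cond := range hpa.Status.Conditions {
		switch cond.Type {
		case autoscaling_v2b1.AbleToScale:
			foundAbleToScale = cond.Status == core_v1.ConditionTrue
		case autoscaling_v2b1.ScalingActive:
			foundScalingActive = cond.Status == core_v1.ConditionTrue
		}
	}
	return foundAbleToScale && foundScalingActive
}
```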

```go
	foundAbleToScale = true
case core_v1.ConditionFalse:
	// AbleToScale should not be false if the HPA is working, this is a failure
	return false, false, errors.Errorf("%s: %s", cond.Reason, cond.Message)
```
Reviewer (Contributor)

@ash2k is it correct to return a non-retriable error in that case?

```go
// AbleToScale indicates a lack of transient issues which prevent scaling from occurring,
// such as being in a backoff window, or being unable to access/update the target scale.
AbleToScale HorizontalPodAutoscalerConditionType = "AbleToScale"
```

In the case of a backoff window this error is recoverable, and updating the target scale could succeed after retrying.
It doesn't affect the lifecycle of the bundle AFAICT; it's mostly about reporting InProgress: True with error Reason: RetriableError vs InProgress: False with error Reason: TerminalError.

@ash2k (Contributor) commented Sep 17, 2018

Yes and no. We need to refactor error propagation in Smith. You are correct that it influences the reason Smith reports. But Smith also retries re-processing things if the error is marked as retriable, which will not help in this case (though it is also not very harmful, just extra logs). So perhaps in this case the error should be marked as retriable, to be forward compatible with what we will hopefully do to surface such conditions. See #276

Reviewer (Contributor)

Yep, I was also torn between reporting a correct status and having Smith stuck indefinitely in a retry loop.

@mcoot (Contributor, Author)

I have made these retriable.

```go
// If it's been stuck in this condition for >3min we assume it's failed
now := meta_v1.Now()
if cond.LastTransitionTime.Add(3 * time.Minute).Before(now.Time) {
	return false, false, errors.Errorf("%s: %s", cond.Reason, cond.Message)
```
Reviewer (Contributor)

This also seems like a recoverable error? E.g. if the autoscaler controller was down and is now up again, it will be able to scale.

@mcoot (Contributor, Author)

Have made this retriable.
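
Judging from the three-value returns in the excerpts above, the checker appears to return something like (ready, retriable, err); if so, making these failures retriable is presumably a matter of flipping the second value. A hypothetical extraction of the timeout branch under that assumption (the constant name is invented here):

```go
import (
	"time"

	"github.com/pkg/errors"
	autoscaling_v2b1 "k8s.io/api/autoscaling/v2beta1"
	meta_v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const scalingActiveTimeout = 3 * time.Minute // hypothetical extracted constant

// checkScalingActive is a sketch of the timeout branch, assuming the second
// return value marks the error as retriable.
func checkScalingActive(cond autoscaling_v2b1.HorizontalPodAutoscalerCondition) (ready, retriable bool, err error) {
	now := meta_v1.Now()
	if cond.LastTransitionTime.Add(scalingActiveTimeout).Before(now.Time) {
		// A down metrics server or autoscaler controller is transient, so the
		// error is reported as retriable rather than terminal.
		return false, true, errors.Errorf("%s: %s", cond.Reason, cond.Message)
	}
	// Still within the grace period: not ready yet, but not an error either.
	return false, false, nil
}
```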

Gopkg.lock (outdated diff)

```diff
@@ -3,18 +3,15 @@

[[projects]]
```
Reviewer (Contributor)

Why did this file need to change?

Reviewer (Contributor)

@mcoot make sure you have the latest dep version

@mcoot (Contributor, Author)

Fixed.

@ash2k ash2k changed the title Jspearritt/add horizontal pod autoscaler support Add horizontal pod autoscaler support Sep 17, 2018
@ash2k (Contributor) commented Sep 17, 2018

👍

p.s. squash merge please

@mcoot merged commit 4f70301 into master on Sep 18, 2018
@mcoot deleted the jspearritt/add-horizontal-pod-autoscaler-support branch on September 18, 2018