[SHIPA-2322] adds wait-retry to helm update & delete #233

Merged 4 commits on Feb 11, 2022


@stinkyfingers stinkyfingers commented Feb 8, 2022

Description

2322
Helm update and helm delete currently fail when the status of the object (app) is not deployed. This change checks a map of actionable and retryable statuses for updates and deletions. For wait-retry statuses, it enters a wait-retry loop and ultimately sets the status to deployed manually.
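A minimal sketch of the status-to-action lookup described above, not the merged code. It uses a local Status string type as a stand-in for Helm's release.Status. The specific status-to-action assignments, the deleteActions map, and the actionFor helper are assumptions made for illustration.

```go
package main

import "fmt"

// Status is a local stand-in for Helm's release.Status (a string type).
type Status string

// Actions a caller can take for a given release status.
const (
	waitRetry  = iota // a Helm operation is in flight; try again later
	takeAction        // proceed with the update or delete
	noAction          // nothing to do
)

// updateActions is an illustrative mapping for updates; the real
// assignments in the PR may differ.
var updateActions = map[Status]int{
	"not-found":       takeAction,
	"deployed":        takeAction,
	"failed":          takeAction,
	"pending-install": waitRetry,
	"pending-upgrade": waitRetry,
	"uninstalling":    waitRetry,
}

// deleteActions is a hypothetical counterpart for deletions.
var deleteActions = map[Status]int{
	"not-found":    noAction,
	"deployed":     takeAction,
	"failed":       takeAction,
	"uninstalling": waitRetry,
}

// actionFor returns the mapped action. Unmapped statuses fall back to
// noAction (an assumption); the explicit ok check matters because the
// zero value of the map would otherwise read as waitRetry.
func actionFor(m map[Status]int, s Status) int {
	if a, ok := m[s]; ok {
		return a
	}
	return noAction
}

func main() {
	fmt.Println(actionFor(updateActions, "pending-upgrade") == waitRetry)
	fmt.Println(actionFor(deleteActions, "not-found") == noAction)
}
```

Keeping separate maps for update and delete lets the same status (for example "not-found") lead to different actions depending on the operation.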

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Chore (documentation addition or typo, file relocation)

Testing

  • New tests were added with this PR that prove my fix is effective or that my feature works (describe below this bullet)
  • This change requires no testing (i.e. documentation update)

Documentation

  • All added public packages, funcs, and types have been documented with doc comments
  • I have commented my code, particularly in hard-to-understand areas

Final Checklist:

  • I followed standard GitHub flow guidelines
  • I have performed a self-review of my own code
  • My changes generate no new warnings

type statusFunc func(cfg *action.Configuration, appName string) (*release.Release, release.Status, error)

const (
	WaitRetry = iota
Contributor

It looks like there is no need to export WaitRetry, TakeAction, and NoAction.
Could we use waitRetry, takeAction, and noAction instead?

Contributor Author

You bet. Privatized!


// helmStatusActionMapUpdate maps a Release Status to a Ketch action for helm updates
var helmStatusActionMapUpdate = map[release.Status]int{
	"not-found": TakeAction,
Contributor

Maybe add a new const for not-found?

Contributor Author

Added.

func (c HelmClient) waitForActionableStatus(statusFunc statusFunc, appName string, statusActionMap map[release.Status]int) (bool, error) {
	ticker := time.NewTicker(statusRetryInterval)
	done := time.After(statusRetryTimeout)
	var helmRelease *release.Release
Contributor

I like this approach, but will it block the appReconciler's loop?
If so, what negative consequences would we have to deal with?

Contributor Author

Per our Slack discussion: I removed the wait loop. Now, "wait-retry" statuses return an error, which the reconciler loop is expected to handle. This means that 1) we don't manually update a chart's status here (not sure if that's good or bad), and 2) there is a possibility that a chart will get stuck in a weird status like pending-uninstall and the reconciler will just keep retrying. Not sure if that's something to be concerned about.

		return false, nil
	case takeAction:
		return true, nil
	default:
Contributor

Would it help if we implemented something like this:

if lastRelease.Info.FirstDeployed.Before(helmTime.Time{Time: timeoutLimit}) {
	newStatus := release.StatusDeployed
	c.log.Info(fmt.Sprintf("Setting status of release that has timeouted to: %s", newStatus))
	lastRelease.SetStatus(newStatus, "manually canceled")
	if err := c.cfg.Releases.Update(lastRelease); err != nil {
		return nil, err
	}
}


Contributor Author

Yes, that makes sense. This PR's addition almost duplicates that FirstDeployed.Before check, but considers additional statuses. I removed the original block of code.

Contributor

@aleksej-paschenko aleksej-paschenko left a comment

looks amazing!

@stinkyfingers stinkyfingers merged commit a6830cd into main Feb 11, 2022
@stinkyfingers stinkyfingers deleted the shipa-2322 branch February 11, 2022 14:10