Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup_ci needs to use different expiration times for auto deployed and E2E clusters #512

Merged
merged 2 commits into from
Nov 1, 2019

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented Nov 1, 2019

  • The auto deployed clusters are now using unique names rather than being
    recycled and we rely on cleanup_ci.py to GC old auto-deployments (Auto-deployments of Kubeflow should use unique names and not recycle names - Upgrade to v0.7 #444)

  • To support this we need to use variable expiration times.

    • Deployments created by tests should expire as soon as the test is done (so 1-2 hours)

    • But auto-deployed clusters need to live longer

      • There are only refreshed periodically by a cron job (~8 hours) we don't
        want to delete the cluster before a new one is deployed because we need
        a cluster for the example tests

      • We want to leave the clusters up as long as we can to facilitate debugging
        by people working on example tests.

Related to #444


This change is Reviewable

@jlewi jlewi changed the title cleanup_ci needs to use different expiration times for auto deployed and E2E clusters [WIP] cleanup_ci needs to use different expiration times for auto deployed and E2E clusters Nov 1, 2019
…and E2E clusters

* The auto deployed clusters are now using unique names rather than being
  recycled and we rely on cleanup_ci.py to GC old auto-deployments (kubeflow#444)

* To support this we need to use variable expiration times.

  * Deployments created by tests should expire as soon as the test is done (so 1-2 hours)

  * But auto-deployed clusters need to live longer

    * There are only refreshed periodically by a cron job (~8 hours) we don't
      want to delete the cluster before a new one is deployed because we need
      a cluster for the example tests

    * We want to leave the clusters up as long as we can to facilitate debugging
      by people working on example tests.

Related to kubeflow#444
@jlewi jlewi changed the title [WIP] cleanup_ci needs to use different expiration times for auto deployed and E2E clusters cleanup_ci needs to use different expiration times for auto deployed and E2E clusters Nov 1, 2019
@jlewi
Copy link
Contributor Author

jlewi commented Nov 1, 2019

/assign @zhenghuiwang

@@ -694,7 +739,7 @@ def cleanup_service_accounts(args):
now = datetime.datetime.now(valid_time.tzinfo)

age = now - valid_time
if age < datetime.timedelta(hours=args.max_age_hours):
if age < MAX_LIFETIME[a["email"]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to use infra_type instead of a["email"]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@@ -1015,17 +1068,6 @@ def main():
help=("""Whether to GC backend services that are older
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on this flag needs some change since it relates to --max_ci_deployment_resource_age_hours

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jlewi
Copy link
Contributor Author

jlewi commented Nov 1, 2019

Thanks @zhenghuiwang PTAL

@zhenghuiwang
Copy link
Contributor

/lgtm
/approve
thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhenghuiwang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 71a3dcd into kubeflow:master Nov 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants