-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cleanup_ci needs to use different expiration times for auto deployed and E2E clusters #512
Conversation
…and E2E clusters * The auto deployed clusters are now using unique names rather than being recycled and we rely on cleanup_ci.py to GC old auto-deployments (kubeflow#444) * To support this we need to use variable expiration times. * Deployments created by tests should expire as soon as the test is done (so 1-2 hours) * But auto-deployed clusters need to live longer * There are only refreshed periodically by a cron job (~8 hours) we don't want to delete the cluster before a new one is deployed because we need a cluster for the example tests * We want to leave the clusters up as long as we can to facilitate debugging by people working on example tests. Related to kubeflow#444
/assign @zhenghuiwang |
py/kubeflow/testing/cleanup_ci.py
Outdated
@@ -694,7 +739,7 @@ def cleanup_service_accounts(args): | |||
now = datetime.datetime.now(valid_time.tzinfo) | |||
|
|||
age = now - valid_time | |||
if age < datetime.timedelta(hours=args.max_age_hours): | |||
if age < MAX_LIFETIME[a["email"]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to use infra_type
instead of a["email"]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
py/kubeflow/testing/cleanup_ci.py
Outdated
@@ -1015,17 +1068,6 @@ def main(): | |||
help=("""Whether to GC backend services that are older |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on this flag needs some change since it relates to --max_ci_deployment_resource_age_hours
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks @zhenghuiwang PTAL |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: zhenghuiwang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The auto deployed clusters are now using unique names rather than being
recycled and we rely on cleanup_ci.py to GC old auto-deployments (Auto-deployments of Kubeflow should use unique names and not recycle names - Upgrade to v0.7 #444)
To support this we need to use variable expiration times.
Deployments created by tests should expire as soon as the test is done (so 1-2 hours)
But auto-deployed clusters need to live longer
There are only refreshed periodically by a cron job (~8 hours) we don't
want to delete the cluster before a new one is deployed because we need
a cluster for the example tests
We want to leave the clusters up as long as we can to facilitate debugging
by people working on example tests.
Related to #444
This change is