-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto-deployments of Kubeflow should use unique names and not recycle names - Upgrade to v0.7 #444
Labels
Comments
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Oct 18, 2019
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Oct 18, 2019
Related to kubeflow#471 * Don't set name in the spec because we want to infer it form directory. * Create a new script to deploy with a unique name * Related to: kubeflow#444 * Update cleanup script to clean up new auto-deployed clusters
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Oct 23, 2019
Related to kubeflow#471 * Don't set name in the spec because we want to infer it form directory. * Create a new script to deploy with a unique name * Related to: kubeflow#444 * Update cleanup script to clean up new auto-deployed clusters
k8s-ci-robot
pushed a commit
that referenced
this issue
Oct 23, 2019
* Auto deploy job needs to use the new kfctl syntax; also use unique names Related to #471 * Don't set name in the spec because we want to infer it form directory. * Create a new script to deploy with a unique name * Related to: #444 * Update cleanup script to clean up new auto-deployed clusters * In cron job get code from master. * Fix lint. * Revert changes to create_kf_instance * update to v1beta1 spec. * * We need to use a self-signed certificate with the auto-deployed clusters because otherwise we hit lets-encrypt rate limiting.
The cron job is now failing.
|
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Oct 25, 2019
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Oct 25, 2019
We need to update To work with the new naming pattern. |
k8s-ci-robot
pushed a commit
that referenced
this issue
Oct 25, 2019
* Fix auto-deployment labels to be valid GCP labels. Related to: #444 * Update to use PR in oneoff. * Fix labels.
We need to update cleanup_ci.py - the max duration of clusters is shorter then the frequency with which we are auto-deploying clusters from master |
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Nov 1, 2019
…and E2E clusters * The auto deployed clusters are now using unique names rather than being recycled and we rely on cleanup_ci.py to GC old auto-deployments (kubeflow#444) * To support this we need to use variable expiration times. * Deployments created by tests should expire as soon as the test is done (so 1-2 hours) * But auto-deployed clusters need to live longer * There are only refreshed periodically by a cron job (~8 hours) we don't want to delete the cluster before a new one is deployed because we need a cluster for the example tests * We want to leave the clusters up as long as we can to facilitate debugging by people working on example tests. Related to kubeflow#444
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Nov 1, 2019
…and E2E clusters * The auto deployed clusters are now using unique names rather than being recycled and we rely on cleanup_ci.py to GC old auto-deployments (kubeflow#444) * To support this we need to use variable expiration times. * Deployments created by tests should expire as soon as the test is done (so 1-2 hours) * But auto-deployed clusters need to live longer * There are only refreshed periodically by a cron job (~8 hours) we don't want to delete the cluster before a new one is deployed because we need a cluster for the example tests * We want to leave the clusters up as long as we can to facilitate debugging by people working on example tests. Related to kubeflow#444
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Nov 1, 2019
* Auto deployed clusters are no longer recycling names; instead each auto deployed cluster will have a unique name * Use regexes to identify the appropriate auto deployed cluster * Only consider clusters with a minimum age; this is a hack to ensure clusters are properly setup. * Related to: kubeflow#444
k8s-ci-robot
pushed a commit
that referenced
this issue
Nov 1, 2019
…and E2E clusters (#512) * cleanup_ci needs to use different expiration times for auto deployed and E2E clusters * The auto deployed clusters are now using unique names rather than being recycled and we rely on cleanup_ci.py to GC old auto-deployments (#444) * To support this we need to use variable expiration times. * Deployments created by tests should expire as soon as the test is done (so 1-2 hours) * But auto-deployed clusters need to live longer * There are only refreshed periodically by a cron job (~8 hours) we don't want to delete the cluster before a new one is deployed because we need a cluster for the example tests * We want to leave the clusters up as long as we can to facilitate debugging by people working on example tests. Related to #444 * Address reviewer comments.
jlewi
pushed a commit
to jlewi/examples
that referenced
this issue
Nov 2, 2019
* The test wasn't actually running because we were passing arguments that were unknown to pytest * Remove the old role.yaml; we don't use it anymore * Wait for the Job to finish and properly report status; kubeflow/testing#514 contains the new routine * The test still isn't passing because of kubeflow#673 * In addition we need to fix the auto deployments kubeflow/testing#444 Related to kubeflow#665
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Nov 2, 2019
* Auto deployed clusters are no longer recycling names; instead each auto deployed cluster will have a unique name * Use regexes to identify the appropriate auto deployed cluster * Only consider clusters with a minimum age; this is a hack to ensure clusters are properly setup. * Related to: kubeflow#444
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Nov 4, 2019
* Auto deployed clusters are no longer recycling names; instead each auto deployed cluster will have a unique name * Use regexes to identify the appropriate auto deployed cluster * Only consider clusters with a minimum age; this is a hack to ensure clusters are properly setup. * Related to: kubeflow#444
k8s-ci-robot
pushed a commit
to kubeflow/examples
that referenced
this issue
Nov 5, 2019
#674) * Fix the xgboost_synthetic test so it actually runs and produces signal * The test wasn't actually running because we were passing arguments that were unknown to pytest * Remove the old role.yaml; we don't use it anymore * Wait for the Job to finish and properly report status; kubeflow/testing#514 contains the new routine * The test still isn't passing because of #673 * In addition we need to fix the auto deployments kubeflow/testing#444 Related to #665 * Fix lint.
jlewi
pushed a commit
to jlewi/testing
that referenced
this issue
Nov 5, 2019
* Auto deployed clusters are no longer recycling names; instead each auto deployed cluster will have a unique name * Use regexes to identify the appropriate auto deployed cluster * Only consider clusters with a minimum age; this is a hack to ensure clusters are properly setup. * Related to: kubeflow#444
k8s-ci-robot
pushed a commit
that referenced
this issue
Nov 5, 2019
* Auto deployed clusters are no longer recycling names; instead each auto deployed cluster will have a unique name * Use regexes to identify the appropriate auto deployed cluster * Only consider clusters with a minimum age; this is a hack to ensure clusters are properly setup. * Related to: #444
/kind feature |
This should be fixed. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Our auto-deployments currently recycle names; e.g.
kf-vmaster-n??...
We originally did this because with IAP we used to need to create register each hostname manually. With universal redirect that should no longer be necessary.
In the past, reusing hostnames would lead to lets-encrypt quota errors. We worked around this by generating self-signed certificates.
But on master we have now switched to using GKE managed certificates and uploading our own self-signed certificate no longer works.
It would be nice if GKE managed certificates avoids the quota issues if we reuse an endpoint.
However, we should now be able to use unique hostnames for each auto-deployment. That should simplify the auto-deployment logic.
To use unique names e.g.
kf-v0-6-%YYYY%MM%DD-%HH%MM%SS
We will need to fix our cleanup logic to delete older deployments and only keep the N most recent.
We can reuse our existing cleanup_ci.py script and just modify that.
The text was updated successfully, but these errors were encountered: