Kubeflow auto-deployments from master failing; error setting project #471

Closed
jlewi opened this issue Sep 27, 2019 · 5 comments · Fixed by #472

Comments

@jlewi
Contributor

jlewi commented Sep 27, 2019

Here's the stack trace from the most recent failure:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1569499200/testing/py/kubeflow/testing/create_kf_instance.py", line 307, in <module>
    main()
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1569499200/testing/py/kubeflow/testing/create_kf_instance.py", line 259, in main
    deploy_with_kfctl_go(kfctl_path, args, app_dir, env)
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1569499200/testing/py/kubeflow/testing/create_kf_instance.py", line 103, in deploy_with_kfctl_go
    config_spec["spec"]["project"] = args.project
KeyError: 'spec'
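
A minimal sketch of how the assignment at line 103 could fail with a clearer message, assuming the downloaded config parses to a dict that lacks a top-level "spec" key. The helper name set_project and the error wording are hypothetical; this is not the code in create_kf_instance.py.

```python
def set_project(config_spec, project):
    """Set spec.project, raising a descriptive error if 'spec' is missing.

    Hypothetical helper for illustration only.
    """
    if not isinstance(config_spec, dict) or "spec" not in config_spec:
        if isinstance(config_spec, dict):
            found = sorted(config_spec)
        else:
            found = type(config_spec).__name__
        raise ValueError(
            "kfctl config has no top-level 'spec' (found: %s); "
            "check the file that --kfctl_config points at" % (found,))
    config_spec["spec"]["project"] = project
    return config_spec
```

With a guard like this, a config fetched from a stale location would be reported as a config problem rather than as a bare KeyError.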

@jlewi
Contributor Author

jlewi commented Sep 27, 2019

I think the problem is that we are trying to pull the config from kubeflow/kubeflow but the manifest has moved to kubeflow/manifests.

Here's the invocation.

        - /usr/local/bin/auto_deploy.sh
        - --repos=kubeflow/kubeflow;kubeflow/testing
        - --project=kubeflow-ci-deployment
        - --job_labels=/etc/pod-info/labels
        - --data_dir=/mnt/test-data-volume/auto_deploy
        - --base_name=kf-vmaster
        - --max_num_cluster=5
        - --zone=us-east1-b
        - --github_token_file=/secret/github-token/github_token
        - --kfctl_config=https://raw.githubusercontent.com/kubeflow/kubeflow/master/bootstrap/config/kfctl_gcp_iap.yaml

@jlewi
Contributor Author

jlewi commented Oct 18, 2019

Still failing

gcloud --project=kubeflow-ci-deployment container clusters list --format="table(name, location, status, createTime)" --sort-by=createTime
NAME            LOCATION       STATUS    CREATE_TIME
deployapp       us-east1-d     RUNNING   2019-04-26T22:35:53+00:00
apps            us-central1-a  RUNNING   2019-09-18T23:04:32+00:00
kf-vmaster-n04  us-east1-b     RUNNING   2019-10-02T12:13:19+00:00
kf-vmaster-n00  us-east1-b     RUNNING   2019-10-03T00:09:58+00:00
myapp2          us-central1-a  RUNNING   2019-10-03T04:18:04+00:00
kf-vmaster-n01  us-east1-b     RUNNING   2019-10-03T12:12:28+00:00
kf-vmaster-n02  us-east1-b     RUNNING   2019-10-04T00:12:04+00:00
kf-v0-6-n02     us-east1-b     RUNNING   2019-10-16T12:17:38+00:00
kf-v0-6-n03     us-east1-b     RUNNING   2019-10-17T00:37:52+00:00
kf-v0-6-n04     us-east1-b     RUNNING   2019-10-17T12:13:56+00:00
kf-v0-6-n00     us-east1-b     RUNNING   2019-10-18T00:14:03+00:00
kf-v0-6-n01     us-east1-b     RUNNING   2019-10-18T12:14:18+00:00
kfctl-7062      us-central1-a  STOPPING  2019-10-18T15:13:08+00:00
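
One way to read the listing above is to take the newest create time for each auto-deploy base name. A small sketch, with the kf-* rows copied from the listing; the helper name newest_by_prefix is hypothetical, and the "-nNN" suffix convention is inferred from the cluster names shown.

```python
from datetime import datetime


def newest_by_prefix(rows):
    """Map each base name (text before the final '-nNN') to its newest create time."""
    newest = {}
    for name, created in rows:
        prefix = name.rsplit("-n", 1)[0]
        ts = datetime.fromisoformat(created)
        if prefix not in newest or ts > newest[prefix]:
            newest[prefix] = ts
    return newest


# Rows copied from the gcloud listing (kf-* clusters only).
rows = [
    ("kf-vmaster-n04", "2019-10-02T12:13:19+00:00"),
    ("kf-vmaster-n00", "2019-10-03T00:09:58+00:00"),
    ("kf-vmaster-n01", "2019-10-03T12:12:28+00:00"),
    ("kf-vmaster-n02", "2019-10-04T00:12:04+00:00"),
    ("kf-v0-6-n02", "2019-10-16T12:17:38+00:00"),
    ("kf-v0-6-n03", "2019-10-17T00:37:52+00:00"),
    ("kf-v0-6-n04", "2019-10-17T12:13:56+00:00"),
    ("kf-v0-6-n00", "2019-10-18T00:14:03+00:00"),
    ("kf-v0-6-n01", "2019-10-18T12:14:18+00:00"),
]

for prefix, ts in sorted(newest_by_prefix(rows).items()):
    print(prefix, ts.isoformat())
```

In this listing the newest kf-vmaster cluster dates from 2019-10-04, while kf-v0-6 clusters were still being created on 2019-10-18.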

@jlewi
Contributor Author

jlewi commented Oct 18, 2019

Here's the latest error.

INFO|2019-10-18T12:07:40|/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/util.py|69| unknown flag: --config
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/create_kf_instance.py", line 307, in <module>
    main()
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/create_kf_instance.py", line 259, in main
    deploy_with_kfctl_go(kfctl_path, args, app_dir, env)
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/create_kf_instance.py", line 116, in deploy_with_kfctl_go
    env=env)
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/util.py", line 85, in run
    " ".join(command), process.returncode), "\n".join(output))
subprocess.CalledProcessError: Command 'cmd: /mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/kubeflow/bootstrap/bin/kfctl init /mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/kf-vmaster-n03 -V --config=/tmp/tmpP8oZdf.yaml exited with code 1' returned non-zero exit status 1

jlewi pushed a commit to jlewi/testing that referenced this issue Oct 18, 2019
Related to kubeflow#471

* Don't set name in the spec because we want to infer it from the directory.
jlewi pushed a commit to jlewi/testing that referenced this issue Oct 18, 2019
Related to kubeflow#471

* Don't set name in the spec because we want to infer it from the directory.

* Create a new script to deploy with a unique name

* Related to: kubeflow#444

* Update cleanup script to clean up new auto-deployed clusters
jlewi pushed a commit to jlewi/testing that referenced this issue Oct 23, 2019
Related to kubeflow#471

* Don't set name in the spec because we want to infer it from the directory.

* Create a new script to deploy with a unique name

* Related to: kubeflow#444

* Update cleanup script to clean up new auto-deployed clusters
k8s-ci-robot pushed a commit that referenced this issue Oct 23, 2019
* Auto deploy job needs to use the new kfctl syntax; also use unique names

Related to #471

* Don't set name in the spec because we want to infer it from the directory.

* Create a new script to deploy with a unique name

* Related to: #444

* Update cleanup script to clean up new auto-deployed clusters

* In cron job get code from master.

* Fix lint.

* Revert changes to create_kf_instance

* update to v1beta1 spec.

* We need to use a self-signed certificate with the auto-deployed clusters
  because otherwise we hit Let's Encrypt rate limiting.
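
A rough sketch of what "deploy with a unique name" could look like, assuming a base name such as the --base_name value shown earlier. The function name, the date stamp, and the random suffix format are all hypothetical and are not taken from the merged script.

```python
import datetime
import uuid


def unique_deployment_name(base_name, now=None):
    """Return base_name plus a month/day stamp and a short random suffix.

    Hypothetical naming scheme for illustration only.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    suffix = uuid.uuid4().hex[:4]
    return "{0}-{1}-{2}".format(base_name, now.strftime("%m%d"), suffix)


print(unique_deployment_name("kf-vmaster"))
```

A name that is unique per run means a cleanup script has to match on the shared prefix rather than on a fixed list of names.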
@jtfogarty

/kind bug

@jlewi
Contributor Author

jlewi commented Feb 3, 2020

I think this is obsolete.

@jlewi jlewi closed this as completed Feb 3, 2020