Kubeflow auto-deployments from master failing; error setting project #471

jlewi · 2019-09-27T00:06:44Z

Here's the stack trace from the most recent failure:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1569499200/testing/py/kubeflow/testing/create_kf_instance.py", line 307, in <module>
    main()
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1569499200/testing/py/kubeflow/testing/create_kf_instance.py", line 259, in main
    deploy_with_kfctl_go(kfctl_path, args, app_dir, env)
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1569499200/testing/py/kubeflow/testing/create_kf_instance.py", line 103, in deploy_with_kfctl_go
    config_spec["spec"]["project"] = args.project
KeyError: 'spec'
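
The `KeyError` above is raised by the bare `config_spec["spec"]["project"] = args.project` assignment, which gives no hint about what the loaded config actually contained. A minimal sketch of a guard that fails with a more descriptive message is shown below. It is written for Python 3, and `set_project` is a hypothetical helper name for illustration, not a function in `create_kf_instance.py`.

```python
def set_project(config_spec, project):
    """Set spec.project on a parsed config mapping.

    Hypothetical helper: raises ValueError with the keys that were found
    when the top-level 'spec' key is missing, instead of a bare KeyError.
    """
    if not isinstance(config_spec, dict) or "spec" not in config_spec:
        if isinstance(config_spec, dict):
            found = sorted(config_spec)
        else:
            found = type(config_spec).__name__
        raise ValueError(
            "config has no top-level 'spec' key (found: {}); check that "
            "--kfctl_config points at a KfDef file".format(found))
    config_spec["spec"]["project"] = project
    return config_spec


if __name__ == "__main__":
    cfg = set_project({"spec": {}}, "kubeflow-ci-deployment")
    print(cfg["spec"]["project"])
```

Failing loudly here, rather than silently creating an empty `spec`, seems preferable because a missing `spec` most likely means the wrong file was fetched.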

jlewi · 2019-09-27T00:08:03Z

I think the problem is that we are trying to pull the config from kubeflow/kubeflow but the manifest has moved to kubeflow/manifests.

Here's the invocation.

        - /usr/local/bin/auto_deploy.sh
        - --repos=kubeflow/kubeflow;kubeflow/testing
        - --project=kubeflow-ci-deployment
        - --job_labels=/etc/pod-info/labels
        - --data_dir=/mnt/test-data-volume/auto_deploy
        - --base_name=kf-vmaster
        - --max_num_cluster=5
        - --zone=us-east1-b
        - --github_token_file=/secret/github-token/github_token
        - --kfctl_config=https://raw.githubusercontent.com/kubeflow/kubeflow/master/bootstrap/config/kfctl_gcp_iap.yaml

jlewi · 2019-10-18T15:37:03Z

Still failing

gcloud --project=kubeflow-ci-deployment container clusters list --format="table(name, location, status, createTime)" --sort-by=createTime
NAME            LOCATION       STATUS    CREATE_TIME
deployapp       us-east1-d     RUNNING   2019-04-26T22:35:53+00:00
apps            us-central1-a  RUNNING   2019-09-18T23:04:32+00:00
kf-vmaster-n04  us-east1-b     RUNNING   2019-10-02T12:13:19+00:00
kf-vmaster-n00  us-east1-b     RUNNING   2019-10-03T00:09:58+00:00
myapp2          us-central1-a  RUNNING   2019-10-03T04:18:04+00:00
kf-vmaster-n01  us-east1-b     RUNNING   2019-10-03T12:12:28+00:00
kf-vmaster-n02  us-east1-b     RUNNING   2019-10-04T00:12:04+00:00
kf-v0-6-n02     us-east1-b     RUNNING   2019-10-16T12:17:38+00:00
kf-v0-6-n03     us-east1-b     RUNNING   2019-10-17T00:37:52+00:00
kf-v0-6-n04     us-east1-b     RUNNING   2019-10-17T12:13:56+00:00
kf-v0-6-n00     us-east1-b     RUNNING   2019-10-18T00:14:03+00:00
kf-v0-6-n01     us-east1-b     RUNNING   2019-10-18T12:14:18+00:00
kfctl-7062      us-central1-a  STOPPING  2019-10-18T15:13:08+00:00

jlewi · 2019-10-18T15:38:47Z

Here's the latest error.

INFO|2019-10-18T12:07:40|/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/util.py|69| unknown flag: --config
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/create_kf_instance.py", line 307, in <module>
    main()
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/create_kf_instance.py", line 259, in main
    deploy_with_kfctl_go(kfctl_path, args, app_dir, env)
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/create_kf_instance.py", line 116, in deploy_with_kfctl_go
    env=env)
  File "/mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/testing/py/kubeflow/testing/util.py", line 85, in run
    " ".join(command), process.returncode), "\n".join(output))
subprocess.CalledProcessError: Command 'cmd: /mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/kubeflow/bootstrap/bin/kfctl init /mnt/test-data-volume/auto_deploy/auto-deploy-master-1571400000/kf-vmaster-n03 -V --config=/tmp/tmpP8oZdf.yaml exited with code 1' returned non-zero exit status 1
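
In the log above, the real cause (`unknown flag: --config`) appears only on an INFO line before the traceback, so it is easy to miss. The sketch below shows one way a command wrapper could surface a rejected flag in the exception itself. It assumes Python 3, and `run_cli` and `UNKNOWN_FLAG` are hypothetical names for illustration, not the `run` function in `util.py`.

```python
import re
import subprocess

# Matches messages such as "unknown flag: --config" in CLI output.
UNKNOWN_FLAG = re.compile(r"unknown flag: (--?[\w-]+)")


def run_cli(command, env=None):
    """Run a command and return its combined stdout/stderr.

    On a non-zero exit, raise RuntimeError; if the output shows that the
    binary rejected a flag, name that flag in the error message.
    """
    proc = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        hint = ""
        match = UNKNOWN_FLAG.search(proc.stdout)
        if match:
            hint = (" (the binary does not accept {}; the calling script "
                    "and the CLI may be out of sync)").format(match.group(1))
        raise RuntimeError("command {!r} exited with code {}{}".format(
            " ".join(command), proc.returncode, hint))
    return proc.stdout
```

With a wrapper like this, the failure would name `--config` directly instead of leaving it in an earlier log line.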

Auto deploy job needs to use the new kfctl syntax; also use unique names

Related to #471

* Don't set name in the spec because we want to infer it from the directory.
* Create a new script to deploy with a unique name.
* Related to: #444
* Update cleanup script to clean up new auto-deployed clusters.
* In cron job get code from master.
* Fix lint.
* Revert changes to create_kf_instance.
* Update to v1beta1 spec.
* We need to use a self-signed certificate with the auto-deployed clusters because otherwise we hit Let's Encrypt rate limiting.

jtfogarty · 2020-01-08T16:13:13Z

/kind bug

jlewi · 2020-02-03T23:20:11Z

I think this is obsolete.

jlewi added priority/p1 area/testing labels Sep 27, 2019

jlewi pushed a commit to jlewi/testing that referenced this issue Sep 27, 2019

autodeploy jobs should pull kfdef specs from kubeflow/manifests

97100a5

* Fix kubeflow#471

jlewi mentioned this issue Sep 27, 2019

autodeploy jobs should pull kfdef specs from kubeflow/manifests #472

Merged

k8s-ci-robot closed this as completed in #472 Oct 18, 2019

k8s-ci-robot pushed a commit that referenced this issue Oct 18, 2019

autodeploy jobs should pull kfdef specs from kubeflow/manifests (#472)

81649a1

* Fix #471

jlewi reopened this Oct 18, 2019

jlewi pushed a commit to jlewi/testing that referenced this issue Oct 18, 2019

Auto deploy job needs to use the new kfctl syntax.

2a41eb2

Related to kubeflow#471

jlewi pushed a commit to jlewi/testing that referenced this issue Oct 18, 2019

Auto deploy job needs to use the new kfctl syntax.

b15a921

Related to kubeflow#471 * Don't set name in the spec because we want to infer it from the directory.

jlewi mentioned this issue Oct 18, 2019

Auto deploy job needs to use the new kfctl syntax. #495

Merged

k8s-ci-robot added the kind/bug label Jan 8, 2020

jlewi closed this as completed Feb 3, 2020
