Update xgboost_synthetic test infra; preliminary updates to work with 0.7.0 #666

Merged

k8s-ci-robot merged 3 commits into kubeflow:master from jlewi:fix_xgboost

Oct 25, 2019

Contributor

jlewi commented Oct 20, 2019

Update xgboost_synthetic test infra to use pytest and pyfunc.

* Related to #655 (Update xgboost_synthetic to 0.7)
* Related to #665 (No signal about xgboost_synthetic test in periodic dashboard and its failing)
* We need to update the xgboost_synthetic example to work with 0.7.0;
  e.g. workload identity
* This PR focuses on updating the test infra and some preliminary
  updates to the notebook
* More fixes to the test and the notebook are probably needed in order
  to get it to actually pass
* Update job spec for 0.7; remove the secret and set the default service
  account.
  * This is to make it work with workload identity
* Instead of using kustomize to define the job to run the notebook we can just modify the YAML spec using Python.
* Use the Python API for K8s to create the job rather than shelling out.
* Notebook should do a 0.7 compatible check for credentials
  * We don't want to assume GOOGLE_APPLICATION_CREDENTIALS is set
    because we will be using workload identity.
* Take in repos as an argument akin to what checkout_repos.sh requires
* Convert xgboost_test.py to a pytest.
  * This allows us to mark it as expected to fail so we can start to get
    signal without blocking
  * We also need to emit junit files to show up in test grid.
* Convert the jsonnet workflow for the E2E test to a Python function to
  define the workflow.
  * Remove the old jsonnet workflow.
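
The job-spec items above (modify the spec in Python, drop the secret, set the service account) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the helper name `prepare_job`, the sample spec, the secret and volume names, and the reading of "remove the secret" as "remove secret-backed volumes and the credentials env var" are all hypothetical. The spec is handled as a plain dict, as it would be after parsing `job.yaml`.

```python
import copy


def prepare_job(job, name, namespace, service_account="default-editor"):
    """Return a copy of a Job spec (a dict) adjusted for workload identity.

    Hypothetical helper; the fields it edits are illustrative assumptions.
    """
    job = copy.deepcopy(job)
    job["metadata"]["name"] = name
    job["metadata"]["namespace"] = namespace

    pod_spec = job["spec"]["template"]["spec"]
    # With workload identity the pod runs as a Kubernetes service account,
    # so no secret holding a GCP key has to be mounted.
    pod_spec["serviceAccountName"] = service_account

    volumes = pod_spec.get("volumes", [])
    secret_names = {v["name"] for v in volumes if "secret" in v}
    pod_spec["volumes"] = [v for v in volumes if v["name"] not in secret_names]

    for container in pod_spec.get("containers", []):
        # Drop mounts that pointed at the removed secret volumes.
        container["volumeMounts"] = [
            m for m in container.get("volumeMounts", [])
            if m["name"] not in secret_names
        ]
        container["env"] = [
            e for e in container.get("env", [])
            if e.get("name") != "GOOGLE_APPLICATION_CREDENTIALS"
        ]
    return job


# Illustrative spec standing in for the parsed contents of job.yaml.
sample = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "placeholder"},
    "spec": {"template": {"spec": {
        "containers": [{
            "name": "notebook",
            "env": [{"name": "GOOGLE_APPLICATION_CREDENTIALS",
                     "value": "/secret/key.json"}],
            "volumeMounts": [{"name": "secret-volume",
                              "mountPath": "/secret"}],
        }],
        "volumes": [{"name": "secret-volume",
                     "secret": {"secretName": "example-secret"}}],
        "restartPolicy": "Never",
    }}},
}

job = prepare_job(sample, name="xgboost-test", namespace="kubeflow-test")
```

The resulting dict could then be handed to the Kubernetes Python client, for example `kubernetes.client.BatchV1Api().create_namespaced_job(namespace, job)`, rather than shelling out to `kubectl`.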


k8s-ci-robot added the do-not-merge/work-in-progress label

review-notebook-app bot commented Oct 20, 2019

Check out this pull request on ReviewNB.

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

k8s-ci-robot requested review from lluunn and texasmichelle

October 20, 2019 00:41

k8s-ci-robot added size/L size/XXL and removed size/L labels


          Update xgboost_synthetic test infra to use pytest and pyfunc.

386b97b

* Related to kubeflow#655 update xgboost_synthetic to use workload identity

* Related to kubeflow#665 no signal about xgboost_synthetic

* We need to update the xgboost_synthetic example to work with 0.7.0;
  e.g. workload identity

* This PR focuses on updating the test infra and some preliminary
  updates to the notebook

* More fixes to the test and the notebook are probably needed in order
  to get it to actually pass

* Update job spec for 0.7; remove the secret and set the default service
  account.

  * This is to make it work with workload identity

* Instead of using kustomize to define the job to run the notebook we can just modify the YAML spec using python.
* Use the python API for K8s to create the job rather than shelling out.

* Notebook should do a 0.7 compatible check for credentials

  * We don't want to assume GOOGLE_APPLICATION_CREDENTIALS is set
    because we will be using workload identity.

* Take in repos as an argument akin to what checkout_repos.sh requires

* Convert xgboost_test.py to a pytest.

  * This allows us to mark it as expected to fail so we can start to get
    signal without blocking

  * We also need to emit junit files to show up in test grid.

* Convert the jsonnet workflow for the E2E test to a python function to
  define the workflow.

  * Remove the old jsonnet workflow.
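
A credentials check that does not assume `GOOGLE_APPLICATION_CREDENTIALS` is set, as the commit message asks for, might look like the following. This is a minimal sketch and not the notebook's actual check: the function name `describe_gcp_credentials` and the labels it returns are hypothetical.

```python
import os


def describe_gcp_credentials(environ=None):
    """Report which credential source the notebook would rely on.

    Hypothetical helper: it returns a short label instead of failing when
    GOOGLE_APPLICATION_CREDENTIALS is unset.
    """
    environ = os.environ if environ is None else environ
    key_file = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_file:
        # Unset is the expected case under workload identity: credentials
        # come from the pod's service account, not from a key file.
        return "ambient"
    if not os.path.isfile(key_file):
        raise FileNotFoundError(
            "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: %s"
            % key_file)
    return "key-file"


print(describe_gcp_credentials({}))
```

Passing the environment as an argument keeps the check easy to exercise in a test without touching the real process environment.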

jlewi force-pushed the fix_xgboost branch from 3792c40 to 386b97b Compare

October 24, 2019 04:26

jlewi changed the title ~~Update xgboost_synthetic test to work with 0.7~~ Update xgboost_synthetic test infra; preliminary updates to work with 0.7.0

jlewi marked this pull request as ready for review

October 24, 2019 04:32

k8s-ci-robot removed the do-not-merge/work-in-progress label

Contributor Author

jlewi commented Oct 24, 2019

/assign @jinchihe
/assign @kunmingg

k8s-ci-robot assigned jinchihe and kunmingg

jinchihe reviewed


Member

jinchihe left a comment

@jlewi Great start on updating the test to use a Python function.

xgboost_synthetic/testing/xgboost_test.py Outdated

+                with open("job.yaml") as hf:
+                  job = yaml.load(hf)
+                job["metadata"]["namespace"] = name

Member

jinchihe Oct 24, 2019

Removed this?

Contributor Author

jlewi Oct 24, 2019

Good catch; should be setting name.

xgboost_synthetic/testing/xgboost_test.py

+              # TODO(jlewi): This test is currently failing because various things
+              # need to be updated to work with 0.7.0. Until that's fixed we mark it
+              # as expected to fail so we can begin to get signal.
+              @pytest.mark.xfail

Member

jinchihe Oct 24, 2019

It would be better to have a ticket to track this before merging.

xgboost_synthetic/testing/xgboost_test.py Outdated

+                end_time = datetime.datetime.now() + datetime.timedelta(
+                  minutes=15)
+                namespace = job["metadata"]["namespace"]

Member

jinchihe Oct 24, 2019

Why not use the namespace variable directly? job["metadata"]["namespace"] is set from namespace, and there is no if/else around it.

Contributor Author

jlewi Oct 24, 2019

What's the if you are referring to?
Do you mean setting of name?

Member

jinchihe Oct 25, 2019

job["metadata"]["namespace"] = name is set above, so there is no need for namespace = job["metadata"]["namespace"]; we can use the namespace variable directly.
Yes I noticed you have updated that. Great!

xgboost_synthetic/testing/xgboost_test.py

+                    continue
+                  # ready_replicas could be None
+                  if not job.conditions:
+                    logging.info("Job missing condition")

Member

jinchihe Oct 24, 2019

This should continue to the next loop iteration if job.conditions is not populated yet (the job conditions may only be generated after a very short time); otherwise the following last_condition = job.conditions[-1] will raise a traceback.

Contributor Author

jlewi Oct 24, 2019

Good catch

xgboost_synthetic/testing/xgboost_test.py Outdated

+                if not last_condition or last_condition["type"] not in ["Failed", "Complete"]:
+                  logging.error("Timeout waiting for job %s.%s to finish.", namespace, name)
+                  assert last_condition["type"] in ["Failed", "Complete"]

Member

jinchihe Oct 24, 2019

Should this raise RuntimeError if last_condition["type"] is "Failed"?

Contributor Author

jlewi Oct 24, 2019

Done
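
The two review points above (keep polling while the job has no conditions, and raise on a failed job) can be combined in a wait loop along these lines. This is a sketch under assumptions, not the function in `xgboost_test.py`: `wait_for_job` and its `get_status` callable are hypothetical, and the status is modeled as a dict shaped like a Job's `status` field so the loop can run without a cluster.

```python
import datetime
import logging
import time


def wait_for_job(get_status,
                 timeout=datetime.timedelta(minutes=15),
                 polling_interval=datetime.timedelta(seconds=30)):
    """Poll until the job reports a Failed or Complete condition.

    get_status is a hypothetical callable returning a dict such as
    {"conditions": [{"type": "Complete"}]}.
    """
    end_time = datetime.datetime.now() + timeout
    last_condition = None
    while datetime.datetime.now() < end_time:
        status = get_status() or {}
        conditions = status.get("conditions")
        if not conditions:
            # Conditions may not be populated right after creation, so retry
            # instead of indexing into a missing or empty list.
            logging.info("Job missing condition")
            time.sleep(polling_interval.total_seconds())
            continue
        last_condition = conditions[-1]
        if last_condition["type"] in ("Failed", "Complete"):
            break
        time.sleep(polling_interval.total_seconds())

    if last_condition is None or last_condition["type"] not in (
            "Failed", "Complete"):
        raise TimeoutError("Timeout waiting for job to finish.")
    if last_condition["type"] == "Failed":
        raise RuntimeError("Job failed: %s" % last_condition)
    return last_condition


# Simulated status sequence: no conditions twice, then completion.
responses = iter([{}, {"conditions": None},
                  {"conditions": [{"type": "Complete"}]}])
result = wait_for_job(lambda: next(responses),
                      polling_interval=datetime.timedelta(0))
```

Separating the timeout error from the failure error lets a caller tell "the job never finished" apart from "the job finished and failed".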

xgboost_synthetic/testing/rolebinding.yaml

		@@ -1,14 +0,0 @@
		apiVersion: rbac.authorization.k8s.io/v1

Member

jinchihe Oct 24, 2019

Can we create the Job without defining the Role? I remember this being needed, otherwise we cannot submit the Job. Just a reminder here.

Contributor Author

jlewi Oct 24, 2019

Namespaces should be provisioned with the service account default-editor, which should have sufficient privileges. We shouldn't need to do any additional role creation. If we do, then it's a bug in Kubeflow somewhere.


          Address comments.

b40cc07

jlewi mentioned this pull request

WIP: Resolve two problems in ci/cd testing. #668

Closed

Member

jinchihe commented Oct 25, 2019

Seems there are pylint errors in examples/xgboost_synthetic/util.py; otherwise LGTM.

INFO|2019-10-25T00:03:20|util.py:71| ************* Module util
INFO|2019-10-25T00:03:20|util.py:71| C: 15, 0: Line too long (102/100) (line-too-long)
INFO|2019-10-25T00:03:20|util.py:71| C: 37, 0: Trailing whitespace (trailing-whitespace)
INFO|2019-10-25T00:03:20|util.py:71| C: 45, 0: Trailing whitespace (trailing-whitespace)
INFO|2019-10-25T00:03:20|util.py:71| C:  9, 0: standard import "from pathlib import Path" should be placed before "import requests" (wrong-import-order)


          Fix issues with the notebook

5a26b0f

* Install pip packages in user space
  * 0.7.0 images are based on TF images and they have different permissions
* Install a newer version of fairing sdk that works with workload identity

* Split pip installing dependencies out of util.py and into notebook_setup.py

  * That's because util.py could depend on the packages being installed by
    notebook_setup.py

* After pip installing the modules into user space, we need to add the local
  path for pip packages to the Python path; otherwise we get import-not-found
  errors.
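
The last item can be illustrated with the standard library alone. This is a minimal sketch, assuming the packages were installed with `pip install --user`; the helper name `add_user_site_to_path` is hypothetical and this is not necessarily how `notebook_setup.py` does it. A likely cause of the import errors is that Python only adds the user site-packages directory to `sys.path` at startup if it already exists, so a kernel started before the first install will not see it.

```python
import site
import sys


def add_user_site_to_path():
    """Make packages installed with `pip install --user` importable.

    Returns the user site-packages directory that is now on sys.path.
    """
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        # Inserted at the front so user-installed versions take precedence;
        # appending instead would be an equally valid choice.
        sys.path.insert(0, user_site)
    return user_site


user_site = add_user_site_to_path()
```

Calling the helper after the pip install step and before importing the freshly installed modules avoids having to restart the notebook kernel.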

jlewi force-pushed the fix_xgboost branch from a3704f2 to 5a26b0f Compare

October 25, 2019 02:10

Member

jinchihe commented Oct 25, 2019

/lgtm

k8s-ci-robot added the lgtm label

Contributor Author

jlewi commented Oct 25, 2019

/approve

Contributor

k8s-ci-robot commented Oct 25, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jlewi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label

k8s-ci-robot merged commit 7e28cd6 into kubeflow:master

texasmichelle mentioned this pull request

Migrate tests off of ksonnet #657

Closed


Labels

approved lgtm size/XXL