
Improve notebook check automation #2040

Merged

Conversation


@numerology numerology commented Sep 4, 2019

Part of #1750

  1. Determine whether a sample is a notebook or python file based on the file extension instead of test name.
  2. Move notebook params injection into config yaml files
  3. Rename two samples to comply with naming conventions.
  4. Add whether to run a pipeline as a flag in config yaml, instead of guessing based on whether we have an experiment name specified.
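Changes 1 and 4 above could be sketched roughly as follows; the function and config-key names (`is_notebook`, `run_pipeline`) are illustrative guesses, not the PR's actual API:

```python
import os

def is_notebook(sample_path):
    # Change 1: decide by file extension instead of by test name.
    return os.path.splitext(sample_path)[1] == '.ipynb'

def should_run_pipeline(config):
    # Change 4: an explicit flag in the sample's config yaml decides whether
    # to submit a pipeline run (the key name here is hypothetical).
    return bool(config.get('run_pipeline', False))
```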


@numerology
Author

/test kubeflow-pipeline-sample-test

@numerology
Author

/test all

@gaoning777
Contributor

Since the sample names change, could you make sure that the links to these samples are updated in this repo as well as the kubeflow/website?

@numerology
Author

Since the sample names change, could you make sure that the links to these samples are updated in this repo as well as the kubeflow/website?

Sure. Will do that shortly.

    else:
      subprocess.call(['dsl-compile', '--py', '%s.py' % self._test_name,
                       '--output', '%s.yaml' % self._test_name])

  def run_test(self):
    self._compile_sample()
Contributor

We probably need to rename the functions, since compile_sample for a notebook is actually the execution, and then the check_result function is what executes the sample. Should we separate the steps as follows?
compile(): compile the DSL into a pipeline, convert .ipynb to .py
execute(): submit the pipeline, or run the notebook's .py
check()
WDYT?

Author

That requires further refactoring in check_notebook_results.py and run_sample_test.py in order to separate execute from check. Will let you know when I'm done.

Contributor

BTW, there is now a nice function that compiles the pipeline, gets the experiment and submits the run, all in a single line:
kfp.run_pipeline_func_on_cluster(automl_pipeline, arguments={})

Author

Factorized into the following 3 steps:
compile(): .py to pipeline, or notebook to .py (after papermill preparation)
execute(): for a .py sample, retrieve the config and submit a pipeline run if needed; for a notebook, run it (it usually contains the pipeline run)
check()
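The three-step split described above could look roughly like this; the class name and method bodies are an illustrative sketch, not the repo's actual test runner:

```python
class SampleTest:
    """Sketch of the compile/execute/check split (names are illustrative)."""

    def __init__(self, test_name, is_notebook):
        self._test_name = test_name
        self._is_notebook = is_notebook
        self.steps_run = []  # records step order, for illustration

    def compile(self):
        # .py sample -> pipeline yaml via dsl-compile;
        # .ipynb sample -> .py, after papermill parameter injection.
        self.steps_run.append('compile')

    def execute(self):
        # Submit the compiled pipeline if the config asks for it, or run
        # the converted notebook (which usually submits the run itself).
        self.steps_run.append('execute')

    def check(self):
        # Verify the run succeeded and its outputs look sane.
        self.steps_run.append('check')

    def run_test(self):
        self.compile()
        self.execute()
        self.check()
```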

Author

BTW, there is now a nice function that compiles the pipeline, gets the experiment and submits the run, all in a single line:
kfp.run_pipeline_func_on_cluster(automl_pipeline, arguments={})

Thanks for the info! Will do that in a following PR.

# limitations under the License.

test_name: lightweight_component
notebook_params:
Contributor

I think we can have the same way of configuring arguments for both notebooks and pipelines.
BTW, we can also pass data to notebooks by using the environment variables (EXPERIMENT_NAME = os.environ['EXPERIMENT_NAME']). This way the user can set the environment variable once and then run samples without modifications.

Author

The reason for the current implementation is that these two sets of parameters are consumed by different things (one by papermill and the other by the kfp pipeline), and it's actually possible to have duplicate names across the two sets. Also, their functions are a bit different.

Just want to make sure I understand your second point correctly. Does it mean that we will encourage users to assign values in their notebooks this way? This config file will be used by the sample test infra only.

Contributor

these two sets of parameters are consumed by different things (one is papermill and another is kfp pipeline)

Does papermill consume them itself or does it pass them to the notebook?

Just want to make sure I understand your second point correctly.

The second point was about the different ways the user or the tests can set the required variables in a notebook. At the moment the user must manually insert variable values in the notebook, while the tests use papermill arguments. Maybe in the future we can:

  1. Reduce the number of variables that the user needs to specify to run the sample notebook (ideally to 0)
  2. Use environment variables for the rest. Papermill's way of passing variables requires the sample notebooks to have a specific structure which is not always the most readable (for example, previously all the component images had to be taken out of their components and specified in the first cell). This scheme would also be compatible with notebooks converted to python files.

I'm not asking you to implement this. This is more like design thoughts that you might find relevant.
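The two parameter-passing styles being discussed can be contrasted in a notebook's first cell; the variable default below is made up for illustration:

```python
import os

# Papermill style: a plain assignment in the first (parameters) cell,
# overwritten by the values in notebook_params at test time.
EXPERIMENT_NAME = 'sample-test'  # hypothetical default

# Environment-variable style: the notebook reads the value itself, so it
# runs unmodified once the variable is exported in the test environment.
EXPERIMENT_NAME = os.environ.get('EXPERIMENT_NAME', EXPERIMENT_NAME)
```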

Author

@numerology numerology Sep 6, 2019

Does papermill consume them itself or does it pass them to the notebook?

papermill passes them to the notebook to substitute some constants defined in the notebook.

Reduce the number of variables that the user needs to specify to run the sample notebook (ideally to 0)

Agree this would be ideal.

Maybe we can use environment variables for the rest.

Does this require that something needs to be assigned using os.environ inside the notebook?

Contributor

Does this require that something needs to be assigned using os.environ inside the notebook?

Yes. But this is something for another PR.

Contributor

I think the explicit way of putting the arguments at the top is in fact a good practice. Depending on an environment variable for input is implicit and might let mistakes go unnoticed.

@numerology
Author

/test kubeflow-pipeline-e2e-test

@Ark-kun
Contributor

Ark-kun commented Sep 6, 2019

/lgtm

@numerology
Author

/approve

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 9, 2019
@gaoning777
Contributor

/lgtm

@gaoning777
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaoning777, numerology

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 7b1c720 into kubeflow:master Sep 9, 2019
@numerology numerology deleted the improve-notebook-check-automation branch September 9, 2019 20:42
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023