Improve notebook check automation #2040
Conversation
/test kubeflow-pipeline-sample-test
/test all
Since the sample names change, could you make sure that the links to these samples are updated in this repo as well as in kubeflow/website?
Sure. Will do that shortly.
else:
  subprocess.call(['dsl-compile', '--py', '%s.py' % self._test_name,
                   '--output', '%s.yaml' % self._test_name])

def run_test(self):
  self._compile_sample()
We probably need to rename the functions, since compile_sample for notebooks actually does the execution, and the check_result function then executes the sample. Should we separate the steps as follows:
compile(): compile DSL to pipeline, convert .ipynb to .py
execute(): submit the pipeline or run the notebook .py
check()
WDYT?
That requires further refactoring in check_notebook_results.py and run_sample_test.py, in order to separate execute and check. Will let you know when I'm done.
BTW, there is now a nice function that compiles the pipeline, gets the experiment and submits the run, all in a single line:
kfp.run_pipeline_func_on_cluster(automl_pipeline, arguments={})
Factored into the following 3 steps:
compile(): .py to pipeline, or notebook to .py (after papermill preparation)
execute(): for a .py sample, retrieve the config and submit a pipeline run if needed; for a notebook, run it (which usually contains the pipeline run)
check()
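The factoring above can be sketched as follows. This is a hypothetical illustration of the compile/execute/check split being discussed, not the actual kubeflow/pipelines sample-test code; the class and method names are made up for clarity.

```python
# Hypothetical sketch of the three-step structure discussed above.
# Names are illustrative, not the actual sample-test implementation.
class SampleTest:
    def __init__(self, test_name, is_notebook=False):
        self.test_name = test_name
        self.is_notebook = is_notebook
        self.steps_run = []  # records the order in which steps executed

    def compile(self):
        # .py sample -> compiled pipeline; .ipynb -> .py (after papermill prep)
        self.steps_run.append('compile')

    def execute(self):
        # submit the compiled pipeline, or run the converted notebook .py
        self.steps_run.append('execute')

    def check(self):
        # verify the run finished and produced the expected results
        self.steps_run.append('check')

    def run_test(self):
        self.compile()
        self.execute()
        self.check()
```

Keeping the three steps as separate methods lets check_notebook_results.py and run_sample_test.py share the same driver while swapping in notebook- or pipeline-specific behavior per step.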
BTW, there is now a nice function that compiles the pipeline, gets the experiment and submits the run, all in a single line:
kfp.run_pipeline_func_on_cluster(automl_pipeline, arguments={})
Thanks for the info! Will do that in a following PR.
# limitations under the License.

test_name: lightweight_component
notebook_params:
I think we can have the same way of configuring arguments for both notebooks and pipelines.
BTW, we can also pass data to notebooks by using environment variables (e.g. EXPERIMENT_NAME = os.environ['EXPERIMENT_NAME']). This way the user can set the environment variable once and then run samples without modifications.
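A minimal sketch of that environment-variable pattern (the EXPERIMENT_NAME variable comes from the comment above; the default value and helper function are illustrative assumptions, not existing sample code):

```python
import os

# Sketch of the suggested pattern: the notebook reads shared settings from
# the environment, so the user sets them once and runs samples unmodified.
# The fallback default here is illustrative, not from the actual samples.
def get_experiment_name():
    return os.environ.get('EXPERIMENT_NAME', 'sample-test-experiment')
```

With `os.environ.get` and a default, the notebook still runs when the variable is unset, which keeps the sample usable outside the test infra.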
The reason for the current implementation is that these two sets of parameters are consumed by different things (one by papermill and the other by the kfp pipeline), and it's actually possible to have duplicate names across the two sets. Also, their functions are a bit different.
Just want to make sure I understand your second point correctly. Does it mean that we will encourage users to assign values in their notebooks this way? This config file will be used by the sample test infra only.
these two sets of parameters are consumed by different things (one is papermill and another is kfp pipeline)
Does papermill consume them itself or does it pass them to the notebook?
Just want to make sure I understand your second point correctly.
The second point was about different ways the user or tests can set the required variables in a notebook. At this moment the user must manually insert variable values in a notebook and tests use papermill arguments. Maybe in future we can:
- Reduce the number of variables that the user needs to specify to run the sample notebook (ideally to 0)
- Maybe we can use environment variables for the rest. Papermill way of passing variables requires the sample notebooks to have specific structure which is not always the most readable (for example, previously all the component images had to be taken out of their components and specified in the first cell). This scheme would also be compatible with notebooks converted to python files.
I'm not asking you to implement this. This is more like design thoughts that you might find relevant.
Does papermill consume them itself or does it pass them to the notebook?
papermill passes them to the notebook to substitute some constants defined in the notebook.
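A rough pure-Python model of that substitution may help: papermill injects a new cell right after the cell tagged `parameters`, so the injected assignments shadow the defaults defined in the notebook. The cell representation below is a simplification for illustration, not papermill's actual API.

```python
# Simplified model of papermill's parameter passing: a new cell is injected
# after the cell tagged 'parameters', overriding the constants defined there.
# Cells are modeled as (tags, source) pairs purely for illustration.
def inject_parameters(cells, params):
    out = []
    for tags, source in cells:
        out.append((tags, source))
        if 'parameters' in tags:
            lines = ['%s = %r' % (k, v) for k, v in sorted(params.items())]
            out.append((['injected-parameters'], '\n'.join(lines)))
    return out
```

This is why papermill requires the sample notebooks to have a specific structure: the parameter constants must live in one dedicated, tagged cell near the top.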
Reduce the number of variables that the user needs to specify to run the sample notebook (ideally to 0)
Agree this would be ideal.
Maybe we can use environment variables for the rest.
Does this require that something needs to be assigned using os.environ inside the notebook?
Does this require that something needs to be assigned using os.environ inside the notebook?
Yes. But this is something for another PR.
I think the explicit way of putting the arguments at the top is in fact a good practice. Depending on environment variables for input is implicit and might be easily overlooked.
/test kubeflow-pipeline-e2e-test
…mprove-notebook-check-automation
/lgtm
/approve
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: gaoning777, numerology.
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
Part of #1750