-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build Pipeline leveraging Arena #1058
Build Pipeline leveraging Arena #1058
Conversation
|
||
# def DistributeTFOp(name, image, gpus: int, ): | ||
|
||
class DistributeTFOp(dsl.ContainerOp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def arena_distribute_tf_op(....):
return MPIOp(....
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why some samples use RandomNumOp directly? I'm wondering the principles. Thanks for your advises? https://github.com/kubeflow/pipelines/blob/master/samples/basic/condition.py#L20
with dsl.Condition(flip.output == 'tails'):
random_num_tail = RandomNumOp(10, 19)
with dsl.Condition(random_num_tail.output > 15):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusing. Those samples are the oldest and have not been updated for a while.
Other samples are a bit more modern: https://github.com/kubeflow/pipelines/blob/master/samples/kubeflow-tf/kubeflow-training-classification.py
Your code (ineriting from ContainerOp
vs returning ContainerOp
) is not wrong. It's just outdated a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I will update with your suggestions.
|
||
# def arena_submit_standalone_job_op(name, image, gpus: int, ): | ||
|
||
class MPIOp(dsl.ContainerOp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def arena_launch_mpi_op(....):
return ContainerOp(....
)
# def arena_submit_standalone_job_op(name, image, gpus: int, ): | ||
|
||
class StandaloneOp(dsl.ContainerOp): | ||
"""Submit standalone Job.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more comprehensive docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will update.
@Ark-kun Please take a look again. |
except: | ||
experiment_id = client.create_experiment(EXPERIMENT_NAME).id | ||
run = client.run_pipeline(experiment_id, RUN_ID, __file__ + '.tar.gz', | ||
params={'learning_rate':learning_rate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, there is currently a bug/inconsistency that forces you to use -
in the arguments here instead of _
. Have you tried to run your sample?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I've tested it with pipeline in kubeflow 0.4.0, not with the latest pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already tested with kfp 0.1.4. Looks good.
name='pipeline to run jobs', | ||
description='shows how to run pipeline jobs.' | ||
) | ||
def sample_pipeline(learning_rate=dsl.PipelineParam(name='learning_rate', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use learning_rate='0.01'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
/retest |
@Ark-kun , please take a look again when you have time. Thanks. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
afa1800
to
446b20f
Compare
fix the length of name fix the length of name change the default image
446b20f
to
b83336e
Compare
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
/ping @Ark-kun |
/lgtm |
Sorry for a delayed response. I wanted to make sure that this sample follows our directory organization for contributed samples. I'm a bit concerned about the Arena sample being the first one the user sees when they enter the samples directory. I feel that the sample is quiet specialized and might not be the first sample that the user should see. We can probably fix that later so I do not see a reason for holding this PR any longer. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The sample pipeline runs preparing data, source code, training and exporting a Tensorflow model with MNIST handwriting recognition using Arena. It provides arena_launcher and API to make the user easy to use pipelines to train the specific training, such as MPI Job, TensorFlow Estimator Job.
This change is