-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Samples - Added the data passing tutorial #2258
Samples - Added the data passing tutorial #2258
Conversation
/retest |
This is a nice start. I think this example would benefit from more explanation. For example, when you introduce the big data processing method, it would make sense to say something like "The way Kubeflow Pipelines passes complex objects between components is by writing them to a file in the producer component and reading from the file consumer component. This means that the file's full contents don't ever have to be stored in memory or passed between components, only a pointer to the file need be passed." Secondly, the semantics of OutputFile(str) vs OutputFile(int) could use an explanation. Does this mean that the OutputFile is referred to by a str/int? Does it mean that the contents of the file will be a str/int? Is OutputFile(str) long-hand for OutputTextFile? Why are InputFile and OutputFile separate classes rather than just a unified FileReference class? Incidentally, it might make sense to rename InputFile to InputFileReference or some such, since these objects don't fulfill Python's File-Like contract. Also, the Binary imports aren't used. Good work. |
Thank you for the review.
This makes sense. Thanks for the write-up - I'll include it in the tutorial.
It's the latter. Normally, when you add parameter annotations, you want to specify the type of the data that your function or component wants to receive (e.g.
There is no
There are not
These "annotation" classes are only used for annotating the function signature. The parameters annotated with
Probably they're too much for this tutorial (although they behave the same as the *TextFile annotations). I could demonstrate using them to save/load some data-frame objects, but that might be too much. Do you think that |
It's going to be a different object type at runtime than the call signature declares? That's very confusing for the user. Will autocomplete work? What about Python load methods which require a file path rather than an open file object? |
I can make another PR to make them inherit from the proper python classes.
That's what the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from the question on create_run_from_pipeline_func () it /lgtm
p.s. Good job! this looks like a very nice way of passing data between components.
" print(text)\n", | ||
"\n", | ||
"def constant_to_consumer_pipeline():\n", | ||
" '''Pipeline that passes small constant string to to consumer'''\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: One-line docstring summary should end with "." , same with others in the doc.
Judging by customer feedback, it's important for us to add this tutorial. Another recent comment here on GitHub from Nubank company which wants to use KFP: "Just took a look at your notebook, I wasn't aware of InputPath and OutputTextFile, they will be very handy." |
I've added big explanation sections to the tutorial. |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Why do we create the new directory? And, we probably need to add tests for it. |
I'll add it to the tests. There are two small issues preventing this, but I'll do that as soon as they're fixed. |
And, why not add it to the core/ directory? |
@Ark-kun, I can wholeheartedly agree with this. I found the example above to be very useful, so thanks in the first place for putting it out there! It is however quite hard to figure out how to adapt it correctly, and it is entirely unclear what is happening in the background. The example could use a lot more explanation of what the underlying classes do. Alternatively, the classes themselves in the SDK could use some more detailed explanation. The |
The
Can you please create open an issue? I've very responsive to them.
I'd like top hear more about your problems.
Which concrete classes do you mean? Did you mean InputPath and company? These classes are just markers/annotations. They convey additional information about particular function parameter. I would really like to understand which aspects to explain better. P.S. As python components are just built on top of generic container components, it's useful to see the generated Use |
* Fix incorrect function definition, issue kubeflow#2258 Signed-off-by: Eyal Cohen <eyalcohen@eyals-mbp.haifa.il.ibm.com> Signed-off-by: Eyal Cohen <eyalcohen@sig-9-145-160-25.de.ibm.com> * Fix lint - unused imported Signed-off-by: Eyal Cohen <eyalcohen@sig-9-145-160-25.de.ibm.com> Co-authored-by: Eyal Cohen <eyalcohen@eyals-mbp.haifa.il.ibm.com>
This change is