Expose step id and step name #1191
Conversation
/assign @Ark-kun
Can you please explain the purpose of this change? Why do you need to pass the step ID and name?
Yes, the requirement comes from our customer. They want to persist the logs and some outputs to distributed storage, so they need the unique name of the step. I think we can change this later if the orchestrator changes. What's your suggestion for how the user should get the unique ID and name of the step?
Thanks for the response. The purpose is to persist and organize the logs and some outputs (for example, the results of feature extraction) in a specific format in distributed storage, so they can easily be used to trace the process. Also, for on-prem users and users in China, S3 is impossible to use.
@cheyang Is it possible to move these to the pipeline level? The higher it is, the easier it would be to change later.
Our current way of passing data (until artifact passing is supported) is to give a step a URI template which gets resolved at run time; the step then outputs that URI to pass it forward. So a train step outputs the model URI, not a step name or ID. This is also easier for the consuming step, since you pass it a URI instead of a step name and ID that it would need to combine to build that URI.
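The URI-template pattern described above can be sketched in plain Python. This is illustrative only: the function name and template keys are assumptions, not the actual KFP or Argo API; Argo resolves `{{...}}` variables such as `workflow.name` itself at run time.

```python
import re

def resolve_uri_template(template: str, variables: dict) -> str:
    """Substitute Argo-style {{key}} placeholders with runtime values.

    Hypothetical helper: in a real pipeline the orchestrator performs
    this substitution, and the step emits the *resolved* URI so the
    consuming step never needs to know the producing step's name or ID.
    """
    def repl(match):
        key = match.group(1).strip()
        return variables[key]
    return re.sub(r"\{\{([^}]+)\}\}", repl, template)

# Values like these would be supplied by the orchestrator at run time.
runtime_vars = {"workflow.name": "pipeline-run-abc123", "pod.name": "train-xyz"}
model_uri = resolve_uri_template(
    "s3://models/{{workflow.name}}/{{pod.name}}/model.bin", runtime_vars)
print(model_uri)  # → s3://models/pipeline-run-abc123/train-xyz/model.bin
```

The downstream step then consumes `model_uri` directly, which is why this pattern avoids exposing step IDs to users.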
@Ark-kun our users will build a lot of pipelines, so setting this in every one would be duplicated work. I think the benefit of doing this in the abstraction layer (arena op) is that only one place needs to change if we stop using Argo. End users should not be aware of the change; they just update the SDK. WDYT?
@Ark-kun, do you have additional comments on this one?
@vicaire @Ark-kun @cheyang I am still struggling to see the overall value here. We can archive logs from a pipeline; Argo supports that:
https://github.com/argoproj/argo/blob/master/docs/workflow-controller-configmap.yaml#L40-L76
Now, wouldn't it be easier to look in Argo and see whether it supports a backend other than S3?
@Ark-kun, any further comments on this one?
Are you sure your code is doing what you want? I do not see it using the step/pod ID. It only uses
I see you've fixed the pod.name issue.
/lgtm
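The pod.name fix referenced above can be sketched as follows: the abstraction layer (the arena op discussed in the thread) wires Argo's run-time `{{pod.name}}` variable into the step as an environment variable, so users get a unique per-step name without knowing Argo is underneath. The helper name and spec shape here are illustrative assumptions, not the PR's actual code.

```python
def add_pod_name_env(container_spec: dict, env_name: str = "STEP_POD_NAME") -> dict:
    """Attach Argo's {{pod.name}} run-time variable as an env var.

    Hypothetical sketch: Argo substitutes {{pod.name}} with the actual
    pod name when the workflow runs, so the step can label its logs and
    outputs with a unique identifier.
    """
    env = container_spec.setdefault("env", [])
    env.append({"name": env_name, "value": "{{pod.name}}"})
    return container_spec

spec = add_pod_name_env({"image": "train:latest"})
print(spec["env"])  # → [{'name': 'STEP_POD_NAME', 'value': '{{pod.name}}'}]
```

Because the literal `{{pod.name}}` string survives until Argo resolves it, changing orchestrators would only require updating this one helper, which matches the single-point-of-change argument made earlier in the thread.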
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Ark-kun. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@animeshsingh, I saw your comment only after entering mine. I agree with the idea of using Argo to back up the logs to an object store. (Exposing step ID and step name might be useful regardless, for instance to call the future ML Metadata service and store metadata about an artifact, including which pipeline and which pipeline step computed it.)
Thank you! |
Need to expose step ID and step name when running pipelines.