[Feature Request] Bring back azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...) #19003

Closed
sergey-ivanchuk opened this issue May 28, 2021 · 7 comments
Assignees
Labels
ADO Issue is documented on MSFT ADO for internal tracking customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request This issue requires a new behavior in the product in order be resolved. Machine Learning ML-Pipelines AreaPath needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@sergey-ivanchuk

Cross post from #18182 (comment)

Is your feature request related to a problem? Please describe.

For future releases, I'd like to see the return of an old, deprecated feature in the Azure Python SDK.

It would be great to use azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...). This parameter was deprecated a long time ago, but I feel it would benefit the Azure SDK user community.

Below is a use case I have, and a use case that's fairly practical for certain situations.

.
├── pipeline
│   ├── aml_process.py  # GOAL 2 - use PythonScriptStep(allow_reuse=True, source_directory='./../', script_name='./pipeline/step_1/math_check.py', hash_paths='./pipeline/step_1', …)
│   ├── step_1
│   │   └── math_check.py     # GOAL 1A - import from src/math.py & src/helper.py at runtime
│   └── step_2 
│       └── calculation.py    # GOAL 1B - import from src/helper.py at runtime
├── requirements.txt
└── src
    ├── helper.py
    └── math.py

Both goals above apply to a single repository that holds the source code and the pipeline code.

For goal 1, I want to import src code, so I need to set source_directory='./../' in the PythonScriptStep call.

For goal 2, I want to use allow_reuse=True and hash_paths = './pipeline/step_1' so that each sub-step in a pipeline is hashed separately (e.g. a use case where I need to re-run step_2 but still re-use step_1).

In reality, I might have 6 sub-steps in a repository, so the value of hash_paths goes up greatly: re-running only 1 of 6 steps is much better than re-running all 6.
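
The idea behind goal 2 can be sketched without the SDK. The helper below is hypothetical: the name `fingerprint` and its behavior are assumptions made for illustration, not how azureml computes its snapshot hash. It hashes only the files under the listed paths, so a change outside those paths leaves the fingerprint, and therefore a re-use decision based on it, untouched.

```python
import hashlib
from pathlib import Path


def fingerprint(source_directory, hash_paths):
    """Return a SHA-256 hex digest over every file under the given paths.

    Hypothetical helper, not part of the azureml SDK. It only illustrates
    hashing a subset of the source directory instead of the whole folder.
    """
    if isinstance(hash_paths, str):
        # Accept a single path, as in hash_paths='./pipeline/step_1'.
        hash_paths = [hash_paths]
    root = Path(source_directory).resolve()
    files = set()
    for rel in hash_paths:
        target = root / rel
        if target.is_dir():
            files.update(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            files.add(target)
    digest = hashlib.sha256()
    for path in sorted(files):
        # Hash the relative path too, so a rename changes the fingerprint.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

With the layout above, `fingerprint('./../', './pipeline/step_1')` called from the pipeline folder would keep its value after an edit to pipeline/step_2/calculation.py, which is the behavior requested for hash_paths.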


Describe the solution you'd like

Un-deprecate azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...)


Describe alternatives you've considered

Given the layout above, I have considered splitting the code into two repositories (src and pipelines). This would meet goal 1 and goal 2 above. However, it would require more workarounds than I'd like to be responsible for, and the code-management overhead would be more than necessary.

azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...) will give greater control and leverage for re-using certain pipeline steps.

Additional context
Nothing more to add.

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels May 28, 2021
@xiangyan99 xiangyan99 added feature-request This issue requires a new behavior in the product in order be resolved. Machine Learning Service Attention Workflow: This issue is responsible by Azure service team. labels May 29, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label May 29, 2021
@ghost

ghost commented May 29, 2021

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.

@xiangyan99 xiangyan99 removed the question The issue doesn't require a change to the product in order to be resolved. Most issues start as that label May 29, 2021
@xiangyan99
Member

Thanks for the feedback, we’ll investigate asap.

@v-strudm-msft v-strudm-msft added ADO Issue is documented on MSFT ADO for internal tracking ML-Pipelines AreaPath labels May 29, 2021
@ghost

ghost commented May 29, 2021

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shbijlan.

@navba-MSFT
Contributor

@sergey-ivanchuk Apologies for the late reply. We are looking into this issue and we will provide an update once we have more details on this.

@bandsina @shbijlan @likebupt Could you please look into this and provide an update when you get a chance? Awaiting your reply.

@navba-MSFT navba-MSFT added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Mar 25, 2022
@cloga

cloga commented Mar 28, 2022

@sergey-ivanchuk Thanks for your feedback. This is a valid scenario. As we are developing a new SDK version, I will add this request to the backlog. We will not make new investments in the old SDK version.

From my understanding, you use a single large repo to manage the pipeline and the steps in it, and when you build the pipeline and steps you use the root folder of this repo. By default, we use the whole folder to calculate the code hash that decides re-use. In this scenario, changes to step2 will affect re-use of step1, and vice versa.

Providing a capability that lets customers specify the folders used to calculate the code hash would also introduce some issues. For example, in your case, providing only step_1 for the hash would not be sufficient, because step_1 also depends on src. So we consider this an advanced scenario that we need to support.
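
The dependency concern can be made concrete with a small sketch. The helper below is hypothetical (its name and approach are assumptions, not SDK behavior): it parses a step script with the standard `ast` module and reports which top-level folders or modules inside the source directory the script imports, which is one possible way to see that a hash over step_1 alone would miss src.

```python
import ast
from pathlib import Path


def local_import_roots(script, source_directory):
    """List top-level local packages or modules that `script` imports.

    Hypothetical helper for illustration only. It scans every absolute
    import statement in the file and keeps the names that exist as a
    folder or .py file directly under source_directory. It does not
    follow transitive, relative, or dynamic imports.
    """
    root = Path(source_directory)
    tree = ast.parse(Path(script).read_text(encoding="utf-8"))
    top_level = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            top_level.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            top_level.add(node.module.split(".")[0])
    found = []
    for name in sorted(top_level):
        if (root / name).is_dir():
            found.append(name)
        elif (root / f"{name}.py").is_file():
            found.append(f"{name}.py")
    return found
```

For a math_check.py that imports from src/math.py and src/helper.py, the result would be the src folder, suggesting that the paths hashed for step_1 would need to cover both pipeline/step_1 and src. Hashing the whole src folder is a conservative choice: a change to any src file would invalidate every step that imports from it.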

@sergey-ivanchuk
Author

Hi everyone, thanks for your recent follow-ups.

@cloga , follow-up comments below:

From my understanding, you use a single large repo to manage the pipeline and the steps in it, and when you build the pipeline and steps you use the root folder of this repo. By default, we use the whole folder to calculate the code hash that decides re-use. In this scenario, changes to step2 will affect re-use of step1, and vice versa.

Yes, exactly.

Hypothetically, I could have a 5-step process and only want to re-run step 5 (model training).

Providing a capability that lets customers specify the folders used to calculate the code hash would also introduce some issues. For example, in your case, providing only step_1 for the hash would not be sufficient, because step_1 also depends on src. So we consider this an advanced scenario that we need to support.

Very good call-out. I would ideally wish to import from src and then hash only on step_2. Hopefully this could be feasible.

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Mar 28, 2022
@luigiw
Contributor

luigiw commented Oct 20, 2022

@cloga please add this feature request to the proper backlog. I'm closing this issue for now.

@luigiw luigiw closed this as completed Oct 20, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023

7 participants