[Feature Request] Bring back `azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...)` #19003

sergey-ivanchuk · 2021-05-28T19:53:14Z

Is your feature request related to a problem? Please describe.

For future releases, I'd like to see the return of an old, deprecated feature in the Azure Python SDK.

It would be great to use `azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths=...)`. This parameter was deprecated a long time ago, but I feel it would benefit the Azure SDK user community.

Below is my use case, which I think is fairly practical in certain situations.

.
├── pipeline
│   ├── aml_process.py  # GOAL 2 - use PythonScriptStep(allow_reuse=True, source_directory='./../', script_name='./pipeline/step_1/math_check.py', hash_paths='./pipeline/step_1' …,…)
│   ├── step_1
│   │   └── math_check.py     # GOAL 1A - import from src/math.py & src/helper.py at runtime
│   └── step_2 
│       └── calculation.py    # GOAL 1B - import from src/helper.py at runtime
├── requirements.txt
└── src
    ├── helper.py
    └── math.py

Both goals above live in a single repository that holds the source code and the pipeline code.

For goal 1, I want to import `src` code. So, I need to set `source_directory='./../'` in the `PythonScriptStep` call.
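As a rough sketch of goal 1, a step script could put the repository root on `sys.path` before importing from `src`. This assumes the script runs from a snapshot that preserves the repository layout shown above. `add_repo_root_to_path` is a hypothetical helper, not part of the Azure ML SDK.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file: str, levels_up: int = 2) -> Path:
    """Put the repository root on sys.path so `import src.helper` resolves.

    levels_up=2 matches pipeline/step_1/math_check.py -> repository root
    (an assumption based on the layout in this issue).
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root


# Hypothetical usage at the top of pipeline/step_1/math_check.py:
#   add_repo_root_to_path(__file__)
#   from src import helper
```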

For goal 2, I want to use `allow_reuse=True` and `hash_paths='./pipeline/step_1'` so that each step in a pipeline is hashed separately (e.g. a use case where I need to re-run step_2 but still re-use step_1).

In reality, I might have 6 sub-steps in a repository, so the value of `hash_paths` goes up greatly. Re-running only 1 of 6 steps is much better than re-running all 6.
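A minimal sketch of the behavior this request is after, assuming a step's reuse key is a digest over only the listed paths. `hash_paths_digest` is a hypothetical helper, not part of the Azure ML SDK, and the SDK's actual hashing scheme is not described in this thread.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def hash_paths_digest(root: Path, paths: Iterable[str]) -> str:
    """Return a SHA-256 digest over every file under the given paths.

    Paths are relative to `root`; each may be a file or a directory.
    """
    files = set()
    for rel in paths:
        target = root / rel
        if target.is_dir():
            files.update(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            files.add(target)

    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem traversal order.
    for path in sorted(files):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

With this scoping, editing `pipeline/step_2/calculation.py` leaves the digest for `['pipeline/step_1']` unchanged, so step_1 could be reused while step_2 re-runs.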

Describe the solution you'd like

Un-deprecate `azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths=...)`.

Describe alternatives you've considered

Based on the layout above, I have considered splitting the code into two repositories (src and pipelines). This would meet both goal 1 and goal 2. However, it would require more workarounds than I'd like to be responsible for, and the code management overhead would be higher than necessary.

azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...) will give greater control and leverage for re-using certain pipeline steps.

Additional context
Nothing more to add.

ghost · 2021-05-29T00:57:33Z

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.

Issue Details

Cross post from #18182 (comment)

Author:	sergey-ivanchuk
Assignees:	-
Labels:	`Machine Learning`, `Service Attention`, `customer-reported`, `feature-request`, `needs-triage`, `question`
Milestone:	-

xiangyan99 · 2021-05-29T00:57:46Z

Thanks for the feedback, we’ll investigate asap.

ghost · 2021-05-29T01:08:15Z

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shbijlan.

Issue Details

Author:	sergey-ivanchuk
Assignees:	-
Labels:	`ADO`, `ML-Pipelines`, `Machine Learning`, `Service Attention`, `customer-reported`, `feature-request`
Milestone:	-

navba-MSFT · 2022-03-25T11:07:02Z

@sergey-ivanchuk Apologies for the late reply. We are looking into this issue and we will provide an update once we have more details on this.

@bandsina @shbijlan @likebupt Could you please look into this and provide an update once you get a chance? Awaiting your reply.

cloga · 2022-03-28T01:21:17Z

@sergey-ivanchuk Thanks for your feedback. This is a valid scenario. Because we are developing a new SDK version, I will add this request to the backlog. We will not make new investments in the old SDK version.

From my understanding, you use a single large repo to manage the pipeline and its steps, and you build the pipeline and steps from the repo's root folder. By default, we use the whole folder to calculate the code hash that decides reuse. In this scenario, changes to step2 affect reuse of step1, and vice versa.

Letting customers specify which folders are used to calculate the code hash would also introduce some issues. For example, in your case, providing only step_1 for the hash would not be sufficient, because step_1 also depends on src. So we consider this an advanced scenario that we need to support.
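The trade-off described here can be sketched as follows, assuming each step is hashed over a list of paths and a step must re-run when a changed file falls under one of them. `steps_to_rerun` and the three scope mappings are hypothetical illustrations, not SDK behavior.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def steps_to_rerun(step_paths: Dict[str, List[str]], changed_file: str) -> List[str]:
    """Return the steps whose hashed paths contain `changed_file`."""
    changed = PurePosixPath(changed_file)
    rerun = []
    for step, paths in step_paths.items():
        for raw in paths:
            scope = PurePosixPath(raw)
            # A scope matches the file itself or any directory above it.
            if changed == scope or scope in changed.parents:
                rerun.append(step)
                break
    return rerun


# Default described above: every step hashes the whole source directory.
whole_repo = {"step_1": ["."], "step_2": ["."]}
# Per-step folders only: misses the dependency on src.
step_only = {"step_1": ["pipeline/step_1"], "step_2": ["pipeline/step_2"]}
# Per-step folders plus the shared src folder.
with_src = {
    "step_1": ["pipeline/step_1", "src"],
    "step_2": ["pipeline/step_2", "src"],
}
```

Under `whole_repo`, a change to `pipeline/step_2/calculation.py` invalidates both steps. Under `step_only`, a change to `src/helper.py` invalidates neither step, which is the gap pointed out in this comment. `with_src` covers that case.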

sergey-ivanchuk · 2022-03-28T03:59:52Z

Hi everyone, thanks for your recent follow-ups.

@cloga, follow-up comments below:

> From my understanding, you use a single large repo to manage the pipeline and its steps, and you build the pipeline and steps from the repo's root folder. By default, we use the whole folder to calculate the code hash that decides reuse. In this scenario, changes to step2 affect reuse of step1, and vice versa.

Yes, exactly.

Hypothetically, I could have a 5-step process and only want to re-run step 5 (model training).

> Letting customers specify which folders are used to calculate the code hash would also introduce some issues. For example, in your case, providing only step_1 for the hash would not be sufficient, because step_1 also depends on src. So we consider this an advanced scenario that we need to support.

Very good call-out. Ideally, I would import from src and then hash only on step_2. Hopefully this is feasible.

luigiw · 2022-10-20T23:58:20Z

@cloga please add this feature request to the proper backlog. I'm closing this issue for now.

xiangyan99 added feature-request This issue requires a new behavior in the product in order be resolved. Machine Learning Service Attention Workflow: This issue is responsible by Azure service team. labels May 29, 2021

ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label May 29, 2021

xiangyan99 removed the question The issue doesn't require a change to the product in order to be resolved. Most issues start as that label May 29, 2021

v-strudm-msft added ADO Issue is documented on MSFT ADO for internal tracking ML-Pipelines AreaPath labels May 29, 2021

lmazuel assigned bandsina Jan 25, 2022

navba-MSFT added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Mar 25, 2022

ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Mar 28, 2022

luigiw closed this as completed Oct 20, 2022

github-actions bot locked and limited conversation to collaborators Apr 12, 2023
