
Add synchronous execution option to workflow provisioning #990

Merged: 17 commits from provision-syncronosly into main on Jan 16, 2025


@junweid62 (Contributor) commented Jan 8, 2025

Jan 14 revision: also added the synchronous execution option to reprovisioning.

Description

This PR adds a new wait_for_completion_timeout parameter to the Provision Workflow API in the OpenSearch Flow Framework. It lets callers choose whether the API call waits, up to the given timeout, for the entire workflow provisioning process to complete before returning a response.

What’s Changed:

  1. Added support for the wait_for_completion_timeout parameter in the REST layer (RestProvisionWorkflowAction).
  • Accepts a time duration value (e.g., 30s, 1m).
  • If the workflow finishes provisioning within the specified timeout, the API returns the created resources (the same response as GetWorkflowStatus).
  • If the timeout is reached before provisioning completes, the API stops waiting and returns the current workflow state (for example, PROVISIONING) together with the resources created so far.
  2. Updated the transport layer (ProvisionWorkflowTransportAction) to handle the timeout logic and ensure correct behavior during synchronous provisioning.
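As a rough illustration of the REST-layer input handling, the sketch below turns a wait_for_completion_timeout string such as 30s or 1m into a java.time.Duration. It is a hypothetical, self-contained stand-in: TimeoutParam and parse are not names from the PR, and the plugin presumably relies on OpenSearch's own time-value parsing, which supports more units. Negative values are rejected here, in line with the "Limit workflow timeout to non-negative" commit.

```java
import java.time.Duration;

/** Hypothetical helper (not from the PR): parses "500ms", "30s", "1m" into a Duration. */
public final class TimeoutParam {

    private TimeoutParam() {}

    public static Duration parse(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("timeout must not be empty");
        }
        String value = raw.trim().toLowerCase();
        // Check "ms" before "s" and "m" so "500ms" is not misread as seconds or minutes.
        if (value.endsWith("ms")) {
            return Duration.ofMillis(parseAmount(value, 2));
        } else if (value.endsWith("s")) {
            return Duration.ofSeconds(parseAmount(value, 1));
        } else if (value.endsWith("m")) {
            return Duration.ofMinutes(parseAmount(value, 1));
        }
        throw new IllegalArgumentException("unsupported time unit in: " + raw);
    }

    // Strips the unit suffix and rejects negative amounts.
    private static long parseAmount(String value, int suffixLength) {
        long amount = Long.parseLong(value.substring(0, value.length() - suffixLength));
        if (amount < 0) {
            throw new IllegalArgumentException("timeout must be non-negative: " + value);
        }
        return amount;
    }

    public static void main(String[] args) {
        System.out.println(parse("30s").toMillis()); // 30000
        System.out.println(parse("1m").toMillis());  // 60000
    }
}
```

Malformed numbers surface as NumberFormatException, which is itself an IllegalArgumentException, so a REST handler could map every parse failure to a single "bad request" path.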

Success Response:

{
    "workflow_id": "K13IR5QBEpCfUu_-AQdU",
    "state": "COMPLETED",
    "resources_created": [
        {
            "workflow_step_name": "create_connector",
            "workflow_step_id": "create_connector_1",
            "resource_id": "LF3IR5QBEpCfUu_-Awd_",
            "resource_type": "connector_id"
        },
        {
            "workflow_step_id": "register_model_2",
            "workflow_step_name": "register_remote_model",
            "resource_id": "L13IR5QBEpCfUu_-BQdI",
            "resource_type": "model_id"
        },
        {
            "workflow_step_name": "deploy_model",
            "workflow_step_id": "deploy_model_3",
            "resource_id": "L13IR5QBEpCfUu_-BQdI",
            "resource_type": "model_id"
        }
    ]
}

Timeout Response:

{
    "workflow_id": "SmACR5QBdrR0lYdqgHa9",
    "state": "PROVISIONING",
    "resources_created": [
        {
            "workflow_step_name": "create_connector",
            "workflow_step_id": "create_connector_1",
            "resource_type": "connector_id",
            "resource_id": "S2ACR5QBdrR0lYdqgXYK"
        },
        {
            "workflow_step_name": "register_remote_model",
            "workflow_step_id": "register_model_2",
            "resource_type": "model_id",
            "resource_id": "TWACR5QBdrR0lYdqgXZ-"
        }
    ]
}
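The two responses above differ only in whether provisioning finished before the deadline. A minimal sketch of that behavior, assuming provisioning completes a CompletableFuture and a scheduler fires the timeout: whichever event happens first sends the one and only response, guarded by an AtomicBoolean. The commit log mentions AtomicBoolean usage in WorkflowTimeoutUtility, but this is not that code; ProvisionResult, respondOnce, and the two-callback listener shape are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Supplier;

public final class SyncProvisionSketch {

    /** Hypothetical response mirroring workflow_id, state and resources_created. */
    public record ProvisionResult(String workflowId, String state, List<String> resourcesCreated) {}

    /**
     * Sends exactly one response: the final result if provisioning finishes within
     * timeoutMillis, otherwise a snapshot of the current (e.g. PROVISIONING) state.
     */
    public static void respondOnce(
        CompletableFuture<ProvisionResult> provisioning,
        long timeoutMillis,
        Supplier<ProvisionResult> currentSnapshot,
        ScheduledExecutorService scheduler,
        Consumer<ProvisionResult> onResponse,
        Consumer<Throwable> onFailure
    ) {
        // Completion and the timer can fire concurrently; only the first to flip this flag responds.
        AtomicBoolean responded = new AtomicBoolean(false);

        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            if (responded.compareAndSet(false, true)) {
                onResponse.accept(currentSnapshot.get()); // timed out: return the state so far
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);

        provisioning.whenComplete((result, error) -> {
            if (responded.compareAndSet(false, true)) {
                timer.cancel(false); // finished first, so the timeout task is no longer needed
                if (error == null) {
                    onResponse.accept(result); // e.g. state COMPLETED
                } else {
                    onFailure.accept(error);
                }
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<ProvisionResult> stillRunning = new CompletableFuture<>();
        CompletableFuture<ProvisionResult> reply = new CompletableFuture<>();
        respondOnce(
            stillRunning,
            100,
            () -> new ProvisionResult("wf-1", "PROVISIONING", List.of("connector_id")),
            scheduler,
            reply::complete,
            reply::completeExceptionally
        );
        System.out.println(reply.join().state()); // PROVISIONING
        scheduler.shutdownNow();
    }
}
```

Using a separate timer rather than CompletableFuture.orTimeout keeps the provisioning future itself untouched, which is why the compare-and-set flag is needed to arbitrate between the two independent callbacks.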

Areas of Concern:

A few parts of the implementation could likely be improved, particularly in ProvisionWorkflowTransportAction. Some of the timeout and synchronous-execution logic feels verbose and may not be the most efficient approach. I’d appreciate feedback from reviewers.

Related Issues

Resolves #967

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dbwiddis (Member) left a comment

Generally looks good.

  • You need to handle a -1 time value; I recommend using it for the default "async" behavior rather than null.
  • You need stream version checks for the new (optional) workflow state in the response and for the new timeout parameter in the workflow request (unless you just keep it in the params map).
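The two review points can be sketched together in plain Java: -1 marks the default "don't wait" (async) mode instead of null, and the new optional field is written to and read from the wire only when the other node's version understands it. DataOutputStream/DataInputStream and an integer version are stand-ins for OpenSearch's StreamOutput/StreamInput and Version; the class, field, and MIN_VERSION_WITH_TIMEOUT constant are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public final class ProvisionRequestSketch {

    /** Sentinel meaning "return immediately" (asynchronous), used instead of null. */
    public static final long NO_WAIT = -1L;

    /** Hypothetical first wire version that carries the timeout field. */
    public static final int MIN_VERSION_WITH_TIMEOUT = 2;

    private final String workflowId;
    private final long waitForCompletionTimeoutMillis;

    public ProvisionRequestSketch(String workflowId, long waitForCompletionTimeoutMillis) {
        if (waitForCompletionTimeoutMillis < NO_WAIT) {
            throw new IllegalArgumentException("timeout must be -1 (no wait) or non-negative");
        }
        this.workflowId = workflowId;
        this.waitForCompletionTimeoutMillis = waitForCompletionTimeoutMillis;
    }

    public boolean isSynchronous() {
        return waitForCompletionTimeoutMillis != NO_WAIT;
    }

    /** Writes the new field only if the receiving node is new enough to read it. */
    public void writeTo(DataOutputStream out, int peerVersion) throws IOException {
        out.writeUTF(workflowId);
        if (peerVersion >= MIN_VERSION_WITH_TIMEOUT) {
            out.writeLong(waitForCompletionTimeoutMillis);
        }
    }

    /** Mirrors writeTo: older senders never wrote the field, so fall back to NO_WAIT. */
    public static ProvisionRequestSketch readFrom(DataInputStream in, int peerVersion) throws IOException {
        String id = in.readUTF();
        long timeout = peerVersion >= MIN_VERSION_WITH_TIMEOUT ? in.readLong() : NO_WAIT;
        return new ProvisionRequestSketch(id, timeout);
    }

    public static void main(String[] args) throws IOException {
        ProvisionRequestSketch request = new ProvisionRequestSketch("wf-1", 30_000L);
        for (int version : new int[] {1, 2}) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            request.writeTo(new DataOutputStream(bytes), version);
            ProvisionRequestSketch copy =
                readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), version);
            // version 1 -> synchronous=false, version 2 -> synchronous=true
            System.out.println("version " + version + " -> synchronous=" + copy.isSynchronous());
        }
    }
}
```

Because reader and writer apply the same version check, a mixed-version cluster degrades to the old asynchronous behavior instead of failing to deserialize.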

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 45.34884% with 94 lines in your changes missing coverage. Please review.

Project coverage is 76.41%. Comparing base (5480cb4) to head (58b4f81).
Report is 8 commits behind head on main.

File with missing lines | Patch % | Lines
.../transport/ReprovisionWorkflowTransportAction.java | 6.06% | 29 missing and 2 partials ⚠️
...rk/transport/ProvisionWorkflowTransportAction.java | 9.09% | 28 missing and 2 partials ⚠️
...rch/flowframework/util/WorkflowTimeoutUtility.java | 51.16% | 19 missing and 2 partials ⚠️
...ework/transport/CreateWorkflowTransportAction.java | 76.47% | 3 missing and 1 partial ⚠️
...earch/flowframework/transport/WorkflowRequest.java | 78.57% | 0 missing and 3 partials ⚠️
...ramework/transport/ReprovisionWorkflowRequest.java | 71.42% | 0 missing and 2 partials ⚠️
...arch/flowframework/transport/WorkflowResponse.java | 88.23% | 0 missing and 2 partials ⚠️
...h/flowframework/rest/RestCreateWorkflowAction.java | 83.33% | 0 missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #990      +/-   ##
============================================
- Coverage     77.57%   76.41%   -1.17%     
- Complexity      993     1010      +17     
============================================
  Files            99      100       +1     
  Lines          4714     4871     +157     
  Branches        431      453      +22     
============================================
+ Hits           3657     3722      +65     
- Misses          877      954      +77     
- Partials        180      195      +15     


Junwei Dai added 13 commits January 15, 2025 10:53
Signed-off-by: Junwei Dai <junweid@amazon.com>

# Conflicts:
#	src/main/java/org/opensearch/flowframework/util/WorkflowTimeoutUtility.java

# Conflicts:
#	src/test/java/org/opensearch/flowframework/workflow/DeleteConnectorStepTests.java
@junweid62 force-pushed the provision-syncronosly branch from d6c0c53 to 18a1dbb on January 15, 2025 19:04
@dbwiddis (Member) left a comment

LGTM with a few suggestions.

Junwei Dai added 2 commits January 15, 2025 12:16
…, update error message

Signed-off-by: Junwei Dai <junweid@amazon.com>
@joshpalis (Member) left a comment

Overall looks good to me; great work implementing this feature, @junweid62. A few comments:

Signed-off-by: Junwei Dai <junweid@amazon.com>
@dbwiddis merged commit 33579a3 into opensearch-project:main on Jan 16, 2025 (19 of 20 checks passed)
@opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/flow-framework/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/flow-framework/backport-2.x
# Create a new branch
git switch --create backport/backport-990-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 33579a35a9d7a72cc0a2d44561b98bd64a79dd04
# Push it to GitHub
git push --set-upstream origin backport/backport-990-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/flow-framework/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-990-to-2.x.

@opensearch-trigger-bot added the backport-failed label (Applied to PRs when the automatic backport fails) on Jan 16, 2025
@dbwiddis (Member) commented:

@junweid62 can you please manually backport this?

junweid62 added a commit to junweid62/flow-framework that referenced this pull request Jan 17, 2025
…-project#990)

* Add synchronous execution option to workflow provisioning

Signed-off-by: Junwei Dai <junweid@amazon.com>

* code refactor

Signed-off-by: Junwei Dai <junweid@amazon.com>

* add change log

Signed-off-by: Junwei Dai <junweid@amazon.com>

* refactor code based on comment

Signed-off-by: Junwei Dai <junweid@amazon.com>

* fix spotless check

Signed-off-by: Junwei Dai <junweid@amazon.com>

* Limit workflow timeout to a range of 1 to 300 seconds

Signed-off-by: Junwei Dai <junweid@amazon.com>

* Limit workflow timeout to a range of 1 to 300 seconds

Signed-off-by: Junwei Dai <junweid@amazon.com>

* Limit workflow timeout to non-negative

Signed-off-by: Junwei Dai <junweid@amazon.com>

* Add synchronous execution to reprovision

Signed-off-by: Junwei Dai <junweid@amazon.com>

* remove unused common value

Signed-off-by: Junwei Dai <junweid@amazon.com>

* add reprovision sync execution

Signed-off-by: Junwei Dai <junweid@amazon.com>

* fix test for WorkflowTimeoutUtilityTests

Signed-off-by: Junwei Dai <junweid@amazon.com>

* fix test name for WorkflowTimeoutUtilityTests

Signed-off-by: Junwei Dai <junweid@amazon.com>

* Add comments to explain AtomicBoolean usage in WorkflowTimeoutUtility, update error message

Signed-off-by: Junwei Dai <junweid@amazon.com>

* fix spotless check

Signed-off-by: Junwei Dai <junweid@amazon.com>

* addressed some comments

Signed-off-by: Junwei Dai <junweid@amazon.com>

---------

Signed-off-by: Junwei Dai <junweid@amazon.com>
Co-authored-by: Junwei Dai <junweid@amazon.com>
(cherry picked from commit 33579a3)
junweid62 added a commit to junweid62/flow-framework that referenced this pull request Jan 17, 2025
Labels: backport 2.x (backport PRs to 2.x branch), backport-failed (Applied to PRs when the automatic backport fails)
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add option to provision synchronously
3 participants