[MERGE AFTER SDK #159] Adding GCS dependency for backend and prompt service #1106

Closed
wants to merge 9 commits into from

harini-venkataraman
Contributor

What

Adds the GCS dependency version needed by the backend and prompt service.
...

Why

This is needed when the integrator uses fsspec: the corresponding file storage dependency has to be added.
...

How

Pinned the dependency version in the TOML file.
...

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No, these changes only add dependencies.
...

Database Migrations

Not applicable.
...

Env Config

Not applicable.
...

Relevant Docs

Related Issues or PRs

Not applicable.
...

Dependencies Versions

Not applicable.
...

Notes on Testing

In the backend, this is already present as a transitive dependency through connectors.
This PR pins the version.
...

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

@@ -33,6 +33,7 @@ dependencies = [
"social-auth-app-django==5.3.0", # For OAuth
"social-auth-core==4.4.2", # For OAuth
"unstract-sdk~=0.56.0rc4",
"gcsfs==2024.6.0",
Contributor

@harini-venkataraman Won't we also need the Azure and S3 dependencies if this one is needed? @gaya3-zipstack @hari-kuriakose

Contributor

@gaya3-zipstack I have a question: shouldn't we be adding this in the SDK?

Contributor

@ritwik-g Adding it to the SDK would be extra baggage for the SDK, since we only use fsspec APIs there. The integrator can decide which filesystem is required and add the dependencies accordingly.
For example, tools need MinIO and not GCS, so adding GCS to the SDK would be unnecessary for tools.
However, we could make the SDK installable selectively, with consumers explicitly stating which backends they need. We could take that up later.
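
As a rough illustration of this split, the sketch below assumes a hypothetical helper `require_backend` that an integrator could call before using an fsspec protocol. The protocol-to-package mapping is an assumption written for illustration, not SDK code: `gcsfs` implements the `gcs`/`gs` protocols and `s3fs` implements `s3` (which MinIO can also use through an S3-compatible endpoint). It only checks whether the backend package is importable and does not import fsspec itself.

```python
import importlib.util

# Hypothetical mapping from fsspec protocol names to the package that
# implements them (an illustrative assumption, not taken from the SDK).
BACKEND_PACKAGES = {
    "gcs": "gcsfs",
    "gs": "gcsfs",
    "s3": "s3fs",     # also usable for MinIO via an S3-compatible endpoint
    "abfs": "adlfs",  # Azure; not wired up in this PR
    "file": None,     # local filesystem ships with fsspec itself
    "memory": None,   # in-memory filesystem ships with fsspec itself
}


def require_backend(protocol: str) -> None:
    """Raise ImportError if the package backing `protocol` is not installed."""
    if protocol not in BACKEND_PACKAGES:
        raise ValueError(f"Unknown protocol: {protocol!r}")
    package = BACKEND_PACKAGES[protocol]
    if package is None:
        return  # no extra dependency needed for this protocol
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"Protocol {protocol!r} needs the {package!r} package; "
            "add it to this service's dependencies."
        )


require_backend("file")  # succeeds: no extra package is required
```

With a check like this, a service such as a tool only has to declare the backend it actually uses (for example `s3fs` for MinIO), which is the reasoning behind keeping GCS out of the SDK's default dependencies.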

Contributor

Okay, but let's make sure to add the Azure and S3 dependencies as well.

Contributor Author

S3 and Azure are not used in the backend yet; we can add them when we build that integration.

Contributor

@ritwik-g Jan 31, 2025

@harini-venkataraman @gaya3-zipstack We will need it as soon as we release this to production for on-prem customers; it is not a separate requirement. Support for all three storage backends (S3, GCS and Azure) needs to be present, although testing can come later. So my suggestion is to take care of this now, so that the effort to make it on-prem ready is minimal. Wherever we need Google Cloud Storage, we will need S3 and Azure as well.

Contributor

Anyway, this is what I feel. If you think it is better to proceed without them, please go ahead.

Contributor Author

@harini-venkataraman Feb 6, 2025

#1106 (comment)
Adding the dependencies as an optional group in the SDK.

Contributor Author

I would suggest we address Azure once the integration is stable.
CC @gaya3-zipstack

ritwik-g and others added 2 commits February 3, 2025 09:20
Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Signed-off-by: Gayathri <142381512+gaya3-zipstack@users.noreply.github.com>
Contributor

@hari-kuriakose left a comment

@harini-venkataraman If gcsfs, s3fs, etc. are required only by code in the SDK, then including them as direct dependencies downstream will not be good in the long term.

However, keeping packages for all cloud providers as default dependencies in the SDK will also lead to bloat.

Thus I suggest that we leverage optional dependencies in the SDK's pyproject.toml, with a section like the following:

[project.optional-dependencies]
aws = [
  "s3fs~=2024.10.0"
]
azure = [
]
gcp = [
  "gcsfs~=2024.6"
]

This will allow downstream services to install the SDK selectively by declaring dependencies such as unstract-sdk, unstract-sdk[gcp], etc.

NOTE: It is relatively OK to release with only GCP support first.

cc @ritwik-g @gaya3-zipstack @muhammad-ali-e

@harini-venkataraman
Contributor Author

@hari-kuriakose Thanks for suggesting this.
I have made the requested changes in Zipstack/unstract-sdk#159.

Contributor

github-actions bot commented Feb 6, 2025

filepath | function | passed | SUBTOTAL
runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1
runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1
TOTAL | | 9 | 9

sonarqubecloud bot commented Feb 6, 2025

harini-venkataraman changed the title Adding GCS dependency for backend and prompt service [MERGE WITH SDK #159] Adding GCS dependency for backend and prompt service Feb 6, 2025
harini-venkataraman changed the title [MERGE WITH SDK #159] Adding GCS dependency for backend and prompt service [MERGE AFTER SDK #159] Adding GCS dependency for backend and prompt service Feb 6, 2025
"PyDrive2[fsspec]==1.15.4", # For GDrive
"oauth2client==4.1.3", # For GDrive
"dropboxdrivefs==1.4.1", # For Dropbox
"boxfs==0.2.1", # For Box
"gcsfs==2024.6.0", # For GoogleCloudStorage
"gcsfs==2024.6", # For GoogleCloudStorage
Contributor

Suggested change
"gcsfs==2024.6", # For GoogleCloudStorage
"gcsfs~=2024.6", # For GoogleCloudStorage

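These suggestions replace exact pins (`==`) with PEP 440 compatible-release clauses (`~=`). Under PEP 440, `~=2024.6` is equivalent to `>=2024.6, ==2024.*`, whereas `~=2024.10.0` is equivalent to `>=2024.10.0, ==2024.10.*`, so dropping the trailing `.0` widens the allowed range to all later 2024 releases. The checker below is a simplified sketch written for illustration (the function names are hypothetical): it handles only purely numeric release segments and ignores pre-release, post-release and local version parts, which the `packaging` library handles fully.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a purely numeric release like '2024.10.0' into (2024, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def _pad(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Zero-pad two releases to equal length, as PEP 440 does when comparing."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies_compatible(version: str, spec: str) -> bool:
    """Check `version` against the compatible-release clause `~=spec`."""
    target = parse_release(spec)
    if len(target) < 2:
        raise ValueError("~= needs at least two release segments, e.g. ~=2024.6")
    candidate = parse_release(version)
    # Lower bound: version >= spec.
    padded_candidate, padded_target = _pad(candidate, target)
    if padded_candidate < padded_target:
        return False
    # Prefix match: drop the last segment, so ~=2024.6 means ==2024.*.
    prefix = target[:-1]
    head = candidate + (0,) * max(0, len(prefix) - len(candidate))
    return head[: len(prefix)] == prefix


print(satisfies_compatible("2024.12.0", "2024.6"))     # True
print(satisfies_compatible("2024.12.0", "2024.10.0"))  # False
print(satisfies_compatible("2024.12.0", "2024.10"))    # True
```

In other words, `s3fs[boto3]~=2024.10.0` accepts only 2024.10.x patch releases, while the suggested `s3fs[boto3]~=2024.10` also accepts later 2024 releases such as 2024.12.0.
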
@@ -14,12 +14,12 @@ dependencies = [
"google-cloud-secret-manager==2.16.1",
"google-cloud-storage==2.9.0",
# Filesystem connectors
"s3fs[boto3]==2024.6.0", # For Minio
"s3fs[boto3]==2024.10.0", # For Minio
Contributor

@ritwik-g Feb 9, 2025

Suggested change
"s3fs[boto3]==2024.10.0", # For Minio
"s3fs[boto3]~=2024.10", # For Minio

@@ -3,4 +3,4 @@
# Required for all unstract tools
unstract-sdk~=0.56.0rc4
# Required for remote storage support
s3fs[boto3]==2024.6.0
s3fs[boto3]~=2024.10.0
Contributor

Suggested change
s3fs[boto3]~=2024.10.0
s3fs[boto3]~=2024.10

@@ -3,4 +3,4 @@
# Required for all unstract tools
unstract-sdk~=0.56.0rc4
# Required for remote storage support
s3fs[boto3]==2024.6.0
s3fs[boto3]~=2024.10.0
Contributor

Suggested change
s3fs[boto3]~=2024.10.0
s3fs[boto3]~=2024.10

@@ -3,4 +3,4 @@
# Required for all unstract tools
unstract-sdk~=0.56.0rc4
# Required for remote storage support
s3fs[boto3]==2024.6.0
s3fs[boto3]~=2024.10.0
Contributor

Suggested change
s3fs[boto3]~=2024.10.0
s3fs[boto3]~=2024.10

Contributor

@ritwik-g left a comment

I have added a few suggestions, but whether to include them I will leave up to you folks.

@harini-venkataraman
Contributor Author

The dependencies have been moved to the SDK as an optional dependency (Zipstack/unstract-sdk#159), and the other changes have been cherry-picked to #1124.
I have addressed the review comments in the corresponding PRs; support for Azure will be handled once the current version is stable. Closing this PR with this comment.

Labels
None yet
Projects
None yet
5 participants