SDK/Components/PythonContainerOp - Simplified GCSHelper by extracting duplicate code #210
  pure_path = PurePath(gcs_path)
  gcs_bucket = pure_path.parts[1]
  gcs_blob = '/'.join(pure_path.parts[2:])
  client = storage.Client()
- bucket = client.get_bucket(gcs_bucket)
+ bucket = client.bucket(gcs_bucket)
We should use get_bucket, which raises an exception if the bucket does not exist.
bucket() simply creates a local object without any verification.
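The difference can be sketched roughly as follows. This is only an illustration, not the helper in this PR: resolve_bucket and its verify flag are hypothetical names, and the client argument is assumed to be a google.cloud.storage.Client (or anything with the same get_bucket and bucket methods).

```python
def resolve_bucket(client, bucket_name, verify=True):
    """Return a bucket handle from a google-cloud-storage style client.

    resolve_bucket and verify are hypothetical names used for this sketch.
    """
    if verify:
        # get_bucket() sends a GET request for the bucket metadata, so a
        # missing bucket (or a missing storage.buckets.get permission)
        # fails right here.
        return client.get_bucket(bucket_name)
    # bucket() only builds a local object and sends no request, so any
    # error surfaces later, on the first blob operation.
    return client.bucket(bucket_name)


# Usage, assuming google-cloud-storage is installed and credentials are set:
#   from google.cloud import storage
#   bucket = resolve_bucket(storage.Client(), 'some-bucket')
```

With verify=True the failure happens at lookup time; with verify=False it is deferred to calls such as download_to_filename or delete.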
The exception would be raised by blob methods like .download_to_filename and .delete. Also, methods like upload_from_filename do not require the blob to exist.
We can make the function private so that users do not see it.
I agree that this function can be private, but the exceptions raised by blob operations do not carry messages that are as easy to understand as the one from get_bucket.
Can you give an example?
I tried locally with
bucket = client.bucket(gcs_bucket)
blob = bucket.blob(gcs_blob)
blob.delete()
It outputs errors that are not intuitive when the bucket does not exist.
Here is the error I get in our hosted notebook when I use .get_bucket instead of .bucket:
bucket.reload(client=self)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/_helpers.py", line 108, in reload
_target_object=self)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/storage/v1/b/ml-pipeline-playground?projection=noAcl: 420130321805-compute@developer.gserviceaccount.com does not have storage.buckets.get access to ml-pipeline-playground.
How about we just properly check if blob.exists()?
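A minimal sketch of that idea, under some assumptions: split_gcs_path and download_if_exists are hypothetical names, the client is assumed to be a google.cloud.storage.Client, and blob.exists() is assumed to return False for a missing object rather than raise. PurePosixPath is used instead of PurePath so that the parsing does not depend on the host OS.

```python
from pathlib import PurePosixPath


def split_gcs_path(gcs_path):
    """Split 'gs://bucket/dir/file' into ('bucket', 'dir/file')."""
    # PurePosixPath collapses the '//' after the scheme, so the parts are
    # ('gs:', 'bucket', 'dir', 'file').
    parts = PurePosixPath(gcs_path).parts
    return parts[1], '/'.join(parts[2:])


def download_if_exists(client, gcs_path, local_path):
    """Download a blob, failing with a readable message if it is missing.

    Hypothetical helper; client is assumed to behave like
    google.cloud.storage.Client.
    """
    bucket_name, blob_name = split_gcs_path(gcs_path)
    # bucket() makes no request, so storage.buckets.get is not needed here.
    blob = client.bucket(bucket_name).blob(blob_name)
    if not blob.exists():
        raise FileNotFoundError('GCS object not found: ' + gcs_path)
    blob.download_to_filename(local_path)
```

This keeps the lookup free of bucket-level permissions while still giving an explicit error for a missing object. A permission problem on the object itself would still surface as the library's own exception.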
I think get_bucket outputs very relevant messages. I still do not understand why you want to change to bucket(). The bucket() call does not involve an HTTP request, so it does nothing but postpone the error. WDYT?
The only reason is permissions.
Let me try and find a way to get good error messages.
As per @gaoning777's request
/lgtm
…-Simplified-GCSHelper-by-extracting-duplicate-code
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Ark-kun
/lgtm
/lgtm
…a projects (kubeflow#210)
* Check in YAML files with current IAM policy for test and release infra projects.
* This is the first step in modifying the current permissions to grant our CI infrastructure permissions to push images to gcr.io/kubeflow-images-public
Related to:
kubeflow/testing#816 trigger docker build images on post-submit
kubeflow/kubeflow#1574 use prow to auto push images.
* Fix typo.
* Generated v0.0.1 release
* Added a note on the readme for quick install
* Fixed version name
* Attach release tag to generate-install
…ENG-7310-New-CVE-Fix Upgrade go.mod package versions