
SDK/Components/PythonContainerOp - Simplified GCSHelper by extracting duplicate code #210

Conversation

Ark-kun
Contributor

@Ark-kun Ark-kun commented Nov 12, 2018


@Ark-kun Ark-kun force-pushed the avolkov/SDK/Components/PythonContainerOp---Simplified-GCSHelper-by-extracting-duplicate-code branch from 468f44a to a24781a Compare November 12, 2018 04:08
@coveralls

coveralls commented Nov 12, 2018

Pull Request Test Coverage Report for Build 307

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 67.506%

Totals
  • Change from base Build 225: 0.0%
  • Covered Lines: 1535
  • Relevant Lines: 2182

💛 - Coveralls

pure_path = PurePath(gcs_path)
gcs_bucket = pure_path.parts[1]
gcs_blob = '/'.join(pure_path.parts[2:])
client = storage.Client()
- bucket = client.get_bucket(gcs_bucket)
+ bucket = client.bucket(gcs_bucket)
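For context, a quick illustration (not part of the PR diff; the path is hypothetical) of how PurePath splits a gs:// URI, which is why parts[1] is the bucket name and parts[2:] is the object path:

from pathlib import PurePath

parts = PurePath('gs://some-bucket/some/dir/file.txt').parts
# parts == ('gs:', 'some-bucket', 'some', 'dir', 'file.txt')
gcs_bucket = parts[1]            # 'some-bucket'
gcs_blob = '/'.join(parts[2:])   # 'some/dir/file.txt'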
Contributor

We should use get_bucket, which would raise an exception if the bucket does not exist.
bucket() simply creates a local object without verification.
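A minimal sketch of the difference being described (assumes google-cloud-storage is installed and credentials are configured; the bucket names are hypothetical):

from google.cloud import storage

client = storage.Client()

# get_bucket() issues an HTTP GET and raises NotFound (or Forbidden) immediately
# if the bucket is missing or the caller lacks storage.buckets.get permission.
verified_bucket = client.get_bucket('some-existing-bucket')

# bucket() only constructs a local Bucket object; nothing hits the API until a
# later blob operation, so a missing bucket is not detected at this point.
unverified_bucket = client.bucket('possibly-missing-bucket')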

Contributor Author

The exception would be raised by the blob methods like .download_to_filename and .delete(). Also, methods like upload_from_filename do not require blob existence.
We can make the function private so that users do not see it.

Contributor

I agree that this function can be private, but the exceptions raised by blob operations do not return messages that are as easy to understand as the ones from get_bucket.

Contributor Author

Can you give an example?

Contributor

I tried locally with
bucket = client.bucket(gcs_bucket)
blob = bucket.blob(gcs_blob)
blob.delete()
It outputs an error that is not intuitive when the bucket does not exist.

Contributor Author

@Ark-kun Ark-kun Nov 14, 2018

Here is the error I get in our hosted notebook when I use .get_bucket instead of .bucket:

    bucket.reload(client=self)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/_helpers.py", line 108, in reload
    _target_object=self)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/storage/v1/b/ml-pipeline-playground?projection=noAcl: 420130321805-compute@developer.gserviceaccount.com does not have storage.buckets.get access to ml-pipeline-playground.

How about we just properly check if blob.exists(): ?
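A sketch of what such a check could look like (a hypothetical helper, not the code in this PR; assumes google-cloud-storage):

from pathlib import PurePath
from google.cloud import storage

def _download_from_gcs(gcs_path, local_path):
    pure_path = PurePath(gcs_path)
    gcs_bucket = pure_path.parts[1]
    gcs_blob = '/'.join(pure_path.parts[2:])
    client = storage.Client()
    bucket = client.bucket(gcs_bucket)  # no bucket-level GET, so storage.buckets.get is not required
    blob = bucket.blob(gcs_blob)
    if not blob.exists():               # explicit check produces a readable error
        raise RuntimeError('GCS object does not exist: ' + gcs_path)
    blob.download_to_filename(local_path)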

Contributor

I think get_bucket outputs very relevant messages. I still do not understand why you want to change to bucket(). The bucket() call does not involve an HTTP request, so it does nothing but postpone the error. WDYT?

Contributor Author

The only reason is permissions.

Let me try and find a way to get good error messages.
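One possible direction (a sketch only, not what this PR ultimately does): keep bucket() to avoid the bucket-level permission requirement, but catch NotFound from the blob operation and re-raise with a clearer message:

from google.api_core import exceptions as gcs_exceptions
from google.cloud import storage

def _delete_gcs_blob(gcs_bucket, gcs_blob):
    client = storage.Client()
    blob = client.bucket(gcs_bucket).blob(gcs_blob)
    try:
        blob.delete()
    except gcs_exceptions.NotFound:
        # Re-raise with context so the failure is easier to understand.
        raise RuntimeError('Could not delete gs://{}/{}: bucket or object not found.'.format(gcs_bucket, gcs_blob))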

@gaoning777 gaoning777 self-assigned this Nov 30, 2018
@gaoning777
Contributor

/lgtm

…-Simplified-GCSHelper-by-extracting-duplicate-code
@k8s-ci-robot k8s-ci-robot removed the lgtm label Dec 1, 2018
@Ark-kun
Contributor Author

Ark-kun commented Dec 1, 2018

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment

@qimingj
Contributor

qimingj commented Dec 2, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 8a9e205 into master Dec 2, 2018
@gaoning777
Contributor

/lgtm

@Ark-kun Ark-kun deleted the avolkov/SDK/Components/PythonContainerOp---Simplified-GCSHelper-by-extracting-duplicate-code branch January 15, 2019 22:09
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
…a projects (kubeflow#210)

* Check in YAML files with current IAM policy for test and release infra projects.

* This is the first step in modifying the current permissions to grant
  our CI infrastructure permissions to push images to gcr.io/kubeflow-images-public

Related to:
kubeflow/testing#816 trigger docker build images on post-submit
kubeflow/kubeflow#1574 use prow to auto push images.

* Fix typo.
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
* Generated v0.0.1 release

* Added a note on the readme for quick install

* Fixed version name

* Attach release tag to generate-install
HumairAK referenced this pull request in red-hat-data-services/data-science-pipelines Mar 11, 2024
m-rafeeq referenced this pull request in m-rafeeq/data-science-pipelines Jan 21, 2025
…ENG-7310-New-CVE-Fix

Upgrade go.mod package versions