data analytics tutorial expansion project #8290

Merged: 61 commits, Sep 28, 2022


@leahecole leahecole commented Aug 26, 2022

This adds the more polished code from our intern's Dataproc-focused expansion project. It expands on this tutorial, using a similar but slightly altered DAG and a very different (more complex) Dataproc job. This PR adds a directory with:

  • A new DAG + test
  • A new Dataproc job and test
  • A helper script (and test) that converts data to the format it needs to be in to run the tutorial
  • The .txt file of the data (not yet processed) and the processed .csv of it
  • A README that will be linked from the original tutorial with a note for users that may be interested in a more complex Dataproc job

It also adds a line to the header checker bot to ignore the license check on a text file.


kaiyang-code and others added 7 commits August 25, 2022 09:30
* changed the DAG to load the GHCN dataset

* data preprocessing done

* modified preprocessing

* dataproc file added

* code runs great

* modified code based on Brad's comments, still buggy

* finished modifying, haven't synced with DAG

* finished modifying DAG codes

* ready for draft PR

* pass lint

* addressed Brad and Leah's comments

* pass nox lint

* pass nox lint

* Fix: Retry CLI launch if needed (#8221)

* Fix: add region tags

* Fix: region tag typos

* Fix: urlpatterns moved to end

* Fix: typo

* Fix: cli retries to fix flakiness

* Fix: remove duplicate tags

* Fix: use backoff for retries

* Fix: lint import order error

* address Leah's comments about typo and comments

Co-authored-by: Charles Engelke <engelke@google.com>
@leahecole leahecole requested a review from bradmiro August 26, 2022 22:54
@leahecole leahecole requested review from rachael-ds, rafalbiegacz and a team as code owners August 26, 2022 22:54

google-cla bot commented Aug 26, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Aug 26, 2022
@leahecole leahecole marked this pull request as draft August 26, 2022 22:57
@leahecole leahecole added the cla: yes This human has signed the Contributor License Agreement. label Aug 26, 2022
@google-cla google-cla bot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Aug 26, 2022

leahecole commented Aug 26, 2022

Submitted b/243976711 for the CLA because the override process linked from the bot isn't working.


leahecole commented Sep 28, 2022

I've reached out to Brad to discuss whether we can run this in another region. The CPU failures are flaky. An alternative is to add a backoff for the "InvalidArgument" error.

Update: I just checked the docs and confirmed that a standard serverless Dataproc request wants 12 CPUs. I think I'll move to another region.
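
For reference, the backoff alternative mentioned above could look roughly like the sketch below. It is a minimal, standard-library-only illustration, not the code in this PR: `InvalidArgument` here is a hypothetical stand-in class (a real test would presumably catch the client library's own exception, which is an assumption), and `retry_with_backoff` and its parameters are names invented for this example.

```python
import random
import time


class InvalidArgument(Exception):
    """Hypothetical stand-in for the API error seen when CPU quota is short."""


def retry_with_backoff(func, retry_on=(InvalidArgument,), max_tries=5,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    The delay ceiling doubles after each failed attempt (capped at max_delay),
    and the actual wait is drawn uniformly from [0, ceiling] ("full jitter").
    """
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_tries:
                raise  # out of attempts: surface the original error
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

A test would wrap only the flaky call (for example, the batch submission) in `retry_with_backoff`, so that unrelated failures still surface immediately. Retrying only helps if the quota shortage is transient; moving to a region with enough CPU quota addresses the cause rather than the symptom.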

@leahecole leahecole added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 28, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 28, 2022
@leahecole leahecole added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 28, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 28, 2022
@leahecole leahecole added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 28, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 28, 2022

leahecole commented Sep 28, 2022

I enabled private IP access on northamerica-northeast1, but it's failing. Running again to see whether the enable operation needed time to complete.

Update: it worked!


@dandhlee dandhlee left a comment

one minor comment, otherwise LGTM!

@google-cla google-cla bot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Sep 28, 2022
@leahecole leahecole added the automerge Merge the pull request once unit tests and other checks pass. label Sep 28, 2022
@google-cla google-cla bot added cla: no This human has *not* signed the Contributor License Agreement. cla: yes This human has signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. cla: no This human has *not* signed the Contributor License Agreement. labels Sep 28, 2022
@leahecole leahecole added the snippet-bot:force-run Force snippet-bot runs its logic label Sep 28, 2022
@snippet-bot snippet-bot bot removed the snippet-bot:force-run Force snippet-bot runs its logic label Sep 28, 2022
@leahecole leahecole merged commit 5310f9e into main Sep 28, 2022
@leahecole leahecole deleted the kaiyang_expansion_project branch September 28, 2022 18:22
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Sep 28, 2022
@leahecole
Collaborator Author

merging because snippet bot was passing previously

Labels
cla: yes This human has signed the Contributor License Agreement. samples Issues that are directly related to samples.