data analytics tutorial expansion project #8290
* changed the dag to load ghcn dataset
* data preprocessing done
* modified preprocessing
* dataproc file added
* code runs great
* modified code based on Brad, still buggy
* finished modifying, haven't synced with DAG
* finished modifying DAG codes
* ready for draft PR
* pass lint
* addressed Brad and Leah's comments
* pass nox lint
* pass nox lint
* Fix: Retry CLI launch if needed (#8221)
* Fix: add region tags
* Fix: region tag typos
* Fix: urlpatterns moved to end
* Fix: typo
* Fix: cli retries to fix flakiness
* Fix: remove duplicate tags
* Fix: use backoff for retries
* Fix: lint import order error
* address Leah's comments about typo and comments

Co-authored-by: Charles Engelke <engelke@google.com>
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Submitted b/243976711 for the CLA because the override process linked from the bot isn't working.
I've reached out to Brad to discuss whether we can run this in another region. The CPU failures are flaky. Another alternative is to add a backoff for the "InvalidArgument" error.
Update: I just checked the docs and confirmed that a standard serverless Dataproc request wants 12 CPUs. I think I'll move to another region.
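For reference, the backoff alternative mentioned above could look roughly like the sketch below. It is not code from this PR: it uses only the standard library, the `InvalidArgument` class is a local stand-in for whatever error the Dataproc client actually raises when CPUs are unavailable, and `retry_with_backoff`, `max_tries`, and `base_delay` are hypothetical names chosen for illustration.

```python
import random
import time


class InvalidArgument(Exception):
    """Local stand-in (assumption) for the API error seen when CPUs are unavailable."""


def retry_with_backoff(func, retry_on=(InvalidArgument,), max_tries=4,
                       base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_tries):
        try:
            return func()
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the original error
            # Delay doubles on each attempt; a little jitter spreads out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

A test would wrap the batch submission call in a zero-argument function and pass it to `retry_with_backoff`. The `sleep` parameter is injectable so the retry logic can be exercised without real waiting.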
I enabled private IP access on northamerica-northeast1, but it's failing. Running again to see whether the enable operation needed time to complete.
Update: it worked!
one minor comment, otherwise LGTM!
composer/2022_airflow_summit/data_analytics_process_expansion.py
Merging because snippet bot was passing previously.
This adds the more polished code from our intern's Dataproc-focused expansion project. It is an expansion of this tutorial that uses a similar but slightly altered DAG and a very different (more complex) Dataproc job.
This PR adds a directory with a .txt file of the data (not yet processed) and the processed .csv of it.
It also adds a line to the header checker bot to ignore the license check on a text file.
Checklist
nox -s py-3.9 (see Test Environment Setup)
nox -s lint (see Test Environment Setup)