
[CT-1336] [Feature] allow setting networkUri and subnetworkUri for Dataproc Serverless batches #350

Closed
3 tasks done
lostmygithubaccount opened this issue Oct 12, 2022 · 3 comments · Fixed by #578
Labels
feature:python-models good_first_issue Good for newcomers type:enhancement New feature or request

Comments

@lostmygithubaccount

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

In addition to the existing Dataproc Serverless settings, allow passing through the networkUri and subnetworkUri fields in the job configuration: https://cloud.google.com/dataproc-serverless/docs/reference/rest/v1/projects.locations.batches#executionconfig
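For illustration, these fields live under environmentConfig.executionConfig in the batch request body. The sketch below shows only where they sit; field names follow the REST reference linked above, and all values are placeholders:

```python
# Minimal sketch of a Dataproc Serverless batch request body, showing where
# networkUri / subnetworkUri would go. Values are placeholders, not real
# resources.
batch_body = {
    "pysparkBatch": {
        "mainPythonFileUri": "gs://my-bucket/model.py",  # placeholder URI
    },
    "environmentConfig": {
        "executionConfig": {
            # Per the REST reference, networkUri and subnetworkUri are
            # mutually exclusive members of the ExecutionConfig "network"
            # union field, so a request sets one or the other.
            "subnetworkUri": "projects/my-project/regions/us-central1/subnetworks/my-subnet",
        }
    },
}

print(batch_body["environmentConfig"]["executionConfig"]["subnetworkUri"])
```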

Describe alternatives you've considered

n/a

Who will this benefit?

dbt users on BigQuery running Python models with Dataproc Serverless; in particular, enterprise users with strict networking requirements

Are you interested in contributing this feature?

maybe; also a good first issue for anyone to pick up

Anything else?

we can provide guidance if needed

@github-actions github-actions bot changed the title [Feature] allow setting networkUri and subnetworkUri for Dataproc Serverless batches [CT-1336] [Feature] allow setting networkUri and subnetworkUri for Dataproc Serverless batches Oct 12, 2022
@lostmygithubaccount lostmygithubaccount added the good_first_issue Good for newcomers label Oct 12, 2022
@jtcohen6
Contributor

IMO we should let users pass in whatever arbitrary execution configs they require, on a per-model basis, without trying to be too opinionated or clever with our validation.

The single config we're hard-coding now is completely arbitrary:

# should we make all of these spark/dataproc properties configurable?
# https://cloud.google.com/dataproc-serverless/docs/concepts/properties
# https://cloud.google.com/dataproc-serverless/docs/reference/rest/v1/projects.locations.batches#runtimeconfig
batch.runtime_config.properties = {
    "spark.executor.instances": "2",
}
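One way to realize that suggestion (a hypothetical sketch, not the actual dbt-bigquery implementation) is to merge user-supplied properties from model config over the current defaults, so any key from the properties reference passes through untouched. The `dataproc_batch_properties` config key and the helper name are invented for this sketch:

```python
# Hypothetical sketch: let per-model config override arbitrary
# Spark/Dataproc runtime properties instead of hard-coding them.
DEFAULT_PROPERTIES = {
    "spark.executor.instances": "2",
}

def resolve_runtime_properties(model_config: dict) -> dict:
    """Merge user-supplied properties over the defaults.

    `model_config` stands in for whatever dbt exposes for the model;
    'dataproc_batch_properties' is an invented key name for this sketch.
    """
    user_props = model_config.get("dataproc_batch_properties") or {}
    # Later dict unpacking wins, so user values override defaults.
    return {**DEFAULT_PROPERTIES, **user_props}

# A model overriding the executor count and adding a new property:
props = resolve_runtime_properties(
    {
        "dataproc_batch_properties": {
            "spark.executor.instances": "4",
            "spark.executor.cores": "8",
        }
    }
)
print(props)
```

The same merge would generalize to executionConfig fields like subnetworkUri, which is exactly the pass-through behavior this issue asks for.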

@Aylr

Aylr commented Nov 1, 2022

@jtcohen6 Has there been any movement or additional design thought on this issue? I'm running up against it and eager to find a workaround or contribute to a fix here.

@lostmygithubaccount
Author

hi @Aylr, it should be a fairly straightforward fix if it's something you'd like to contribute!

we won't have the capacity to get to it ourselves for at least the next month or so. we would also need to decide whether to ship this in a 1.3 patch release or wait for 1.4

if there's enough information in this issue already, feel free to open a PR and tag me -- we can get that reviewed and merged fairly quickly. let us know if you need more info to contribute this
