Document dataproc_batch field for configuring Dataproc Serverless jobs #3661

Merged
2 changes: 2 additions & 0 deletions website/docs/docs/build/python-models.md
@@ -679,6 +679,8 @@ models:
submission_method: serverless
```

Python models running on Dataproc Serverless can be further configured in your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc).

Any user or service account that runs dbt Python models needs the following permissions, in addition to the required BigQuery permissions ([docs](https://cloud.google.com/dataproc/docs/concepts/iam/iam)):
```
dataproc.batches.create
...
```
34 changes: 33 additions & 1 deletion website/docs/docs/core/connect-data-platform/bigquery-setup.md
@@ -501,12 +501,44 @@ my-profile:
      project: abc-123
      dataset: my_dataset

      # for dbt Python models
      # for dbt Python models to be run on a Dataproc cluster
      gcs_bucket: dbt-python
      dataproc_cluster_name: dbt-python
      dataproc_region: us-central1
```

Alternatively, Dataproc Serverless can be used:

```yaml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset

      # for dbt Python models to be run on Dataproc Serverless
      gcs_bucket: dbt-python
      dataproc_region: us-central1
      submission_method: serverless
      dataproc_batch:
        environment_config:
          execution_config:
            service_account: dbt@abc-123.iam.gserviceaccount.com
            subnetwork_uri: regions/us-central1/subnetworks/dataproc-dbt
        labels:
          project: my-project
          role: dev
        runtime_config:
          properties:
            spark.executor.instances: 3
            spark.driver.memory: 1g
```

For a full list of the configuration fields that can be passed in `dataproc_batch`, refer to the [Dataproc Serverless Batch](https://cloud.google.com/dataproc-serverless/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.Batch) documentation.
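As an illustrative sketch, a few other fields from the Batch resource (runtime version, a custom container image, a batch TTL) could be set in the same way. The values and the image path below are placeholders, not recommendations; confirm each field name against the linked reference before using it:

```yaml
      # under the same output target as above (illustrative values only)
      dataproc_batch:
        runtime_config:
          version: "2.1"                  # Dataproc Serverless runtime version (placeholder)
          container_image: us-docker.pkg.dev/abc-123/dbt/spark:latest  # hypothetical custom image
        environment_config:
          execution_config:
            service_account: dbt@abc-123.iam.gserviceaccount.com
            ttl: 3600s                    # maximum batch lifetime before it is terminated (placeholder)
```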

</VersionBlock>

## Required permissions