Enable BQ streaming inserts for particular docTypes #588

Closed
jklukas opened this issue May 6, 2019 · 3 comments
Labels
enhancement New feature or request ingestion-beam This applies to ingestion-beam

Comments

@jklukas
Contributor

jklukas commented May 6, 2019

Currently, we can run gcp-ingestion as a BigQuery sink either in streaming mode or in file_loads mode. The entire job follows one path or the other.

I'd like to add a mixed mode that would operate mostly like the existing file_loads mode, but would send a subset of messages through the streaming path based on docType.

We'd add a --bqStreamingDocTypes runtime configuration option that we would set to:

bqStreamingDocTypes=activity-stream/impression-stats,activity-stream/spoc-fills

to allow the Pocket team to realize some low-latency use cases.
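A minimal sketch of the routing decision described above, written in plain Java with no Beam dependency. The class and method names are hypothetical, and the attribute names `document_namespace` and `document_type` are assumptions about how a message's docType is identified; the actual implementation may differ.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not the ingestion-beam implementation.
public class MixedModeRouting {

  /** Parse a comma-separated option value such as "ns/a,ns/b" into a set of keys. */
  static Set<String> parseDocTypes(String optionValue) {
    if (optionValue == null || optionValue.trim().isEmpty()) {
      return Set.of();
    }
    return Arrays.stream(optionValue.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }

  /** Build the "namespace/docType" key from message attributes (attribute names are assumed). */
  static String docTypeKey(Map<String, String> attributes) {
    return attributes.get("document_namespace") + "/" + attributes.get("document_type");
  }

  /** True if the message should take the streaming path; otherwise it goes to file loads. */
  static boolean shouldStream(Map<String, String> attributes, Set<String> streamingDocTypes) {
    return streamingDocTypes.contains(docTypeKey(attributes));
  }

  public static void main(String[] args) {
    Set<String> streaming =
        parseDocTypes("activity-stream/impression-stats,activity-stream/spoc-fills");
    Map<String, String> pocket =
        Map.of("document_namespace", "activity-stream", "document_type", "impression-stats");
    Map<String, String> other =
        Map.of("document_namespace", "telemetry", "document_type", "main");
    System.out.println("pocket streams: " + shouldStream(pocket, streaming));
    System.out.println("other streams: " + shouldStream(other, streaming));
  }
}
```

Matching is an exact string comparison on the full `namespace/docType` key, so an entry only takes effect if its spelling matches the incoming attribute values exactly.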

@jklukas jklukas added enhancement New feature or request ingestion-beam This applies to ingestion-beam labels May 6, 2019
jklukas added a commit that referenced this issue May 8, 2019
jklukas added a commit that referenced this issue May 8, 2019
jklukas added a commit that referenced this issue May 9, 2019
@jklukas
Contributor Author

jklukas commented May 9, 2019

The PR for this is merged. To put this into action we would want to update the BQ Sink template with:

--bqWriteMethod=mixed

And then for the structured-decoded_bq-sink, set the runtime parameter:

bqStreamingDocTypes=activity-stream/impression-stats,activity-stream/spoc_fills,mobile/activation
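To illustrate how the three write methods relate, here is a hedged sketch in plain Java. The enum and its methods are hypothetical names chosen for illustration, not the ingestion-beam API; only the option values `streaming`, `file_loads`, and `mixed` come from this thread.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the ingestion-beam implementation.
public class WriteMethodSketch {

  enum WriteMethod {
    STREAMING,
    FILE_LOADS,
    MIXED;

    /** Map an option value such as "mixed" or "file_loads" onto the enum. */
    static WriteMethod fromOption(String value) {
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Decide whether a message with the given "namespace/docType" key is streamed. */
    boolean streams(String docTypeKey, Set<String> streamingDocTypes) {
      switch (this) {
        case STREAMING:
          return true; // the whole job uses streaming inserts
        case FILE_LOADS:
          return false; // the whole job uses file loads
        default:
          // MIXED: only the configured docTypes are streamed
          return streamingDocTypes.contains(docTypeKey);
      }
    }
  }

  public static void main(String[] args) {
    Set<String> streamingDocTypes = Set.of("mobile/activation");
    WriteMethod method = WriteMethod.fromOption("mixed");
    System.out.println(method.streams("mobile/activation", streamingDocTypes));
    System.out.println(method.streams("telemetry/main", streamingDocTypes));
  }
}
```

In this sketch the docType list only matters in `mixed` mode; the other two modes ignore it, which matches the description that the entire job otherwise follows one path.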

@jklukas
Contributor Author

jklukas commented May 9, 2019

cc @whd

jklukas added a commit that referenced this issue May 9, 2019
jklukas added a commit that referenced this issue May 9, 2019
@jklukas
Contributor Author

jklukas commented May 9, 2019

We got hit by GoogleCloudPlatform/DataflowJavaSDK#632

Some more undocumented restrictions on runtime parameters.

@jklukas jklukas closed this as completed Jun 10, 2019