Auto-generate Views on Prod Tables #291
Comments
cc @whd for a possible step after table creation
We have a script for this now via #301. It would be ideal for us to add this to the schema deploy pipeline such that view schemas are updated and new views are created immediately after we update the underlying table schemas in prod. Is running the bigquery-etl docker image something we can do fairly easily on Jenkins? cc @whd @fbertsch @acmiyaguchi. If that's non-trivial, we can schedule this in Airflow to run once per night instead, at least as a shorter-term solution.
It is. I can look at setting this up imminently. I might re-implement the
I see the following on the latest BQ table deploy, which has the logic to implement view publishing.
I'm guessing the issue is that there was a manual copy of datasets from shared into derived-datasets that predates mozilla/mozilla-schema-generator#55. As far as I can tell fenix_nightly should be the first set of auto-generated and published views. If the solution here is to manually create this dataset (and keep derived-datasets and shared in sync, but hopefully we'll migrate fully to shared before any new namespaces are created) then once we do that and re-run the BQ table updates we can probably call this complete. EDIT: as a side note, performance is somewhat poor serially publishing hundreds of views, so re-implementing
Yes, I think the way to go for the interim is to manually create the dataset. I just did so:
We should be good to try again.
There may be ways of expressing this in terraform that I'm unaware of, but I worry about the terraform approach being inflexible. We want to be able to pick up concrete view definitions from bigquery-etl's
I hit an unrelated issue in mozilla/mozilla-schema-generator#69 but I'm confident this has been fixed.
I'm specifically referring to re-implementing the "publish" portion of this. There would be custom logic on top of terraform for creating the definitions, just as there is with ingestion tables. In the current case for defining ingestion datasets and tables, the custom logic is a combination of the generated-schemas branch (input: file paths and schema definitions) and a wrapper script in cloudops-infra for manifesting that as terraform (output: tf json). I am noting that a similar approach could be used for the output (or absence of output) from this script, and that it would have performance and perhaps tooling consistency benefits.
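To make the "manifest as tf json" idea concrete, here is a minimal sketch of what such a wrapper could emit for views. It is not the cloudops-infra script, which is not shown in this thread. The function name `views_to_tf_json`, the resource naming scheme, and the `example_dataset` dataset are all hypothetical; the only thing taken from the Terraform Google provider is the `google_bigquery_table` resource with its `view` block (`query`, `use_legacy_sql`).

```python
import json
from typing import Dict


def views_to_tf_json(dataset_id: str, views: Dict[str, str]) -> str:
    """Render view definitions as Terraform JSON (hypothetical helper).

    `views` maps a view name to its SQL query. Each entry becomes one
    google_bigquery_table resource containing a `view` block.
    """
    resources = {
        # Resource name is an assumed convention: <dataset>_<view>.
        f"{dataset_id}_{name}": {
            "dataset_id": dataset_id,
            "table_id": name,
            "view": {"query": query, "use_legacy_sql": False},
        }
        for name, query in views.items()
    }
    document = {"resource": {"google_bigquery_table": resources}}
    # sort_keys keeps the output stable between runs, so diffs stay small.
    return json.dumps(document, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(views_to_tf_json("example_dataset", {"main": "SELECT 1"}))
```

With something along these lines, terraform would apply all view changes in one plan rather than publishing views one at a time, which is where the performance benefit mentioned above would presumably come from.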
This was rolled out in https://github.com/mozilla-services/cloudops-infra/pull/1294. We might revisit whether we want to generate views in stage as well, but for now we're creating them there on a "best effort" basis.
Currently we have a view defined for every table in prod. We're proposing auto-generating these views instead. The code to auto-generate the views should live alongside the table deploys, which happen daily (once the generated-schemas branch is pushed to MPS).
If a view exists here instead, then that view will override the default one; this will e.g. allow us to selectively update these views to handle new columns, data changes, or unions of versions.
New versions will be automatically pointed to by the view. If a union is needed with a previous version that will have to be done manually.
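The proposed behaviour could be sketched roughly as follows. This is an illustration under assumptions, not the script from #301: it assumes table names end in a `_v<N>` version suffix and that the view takes the unversioned name, and `latest_versions`, `build_views`, and `example_dataset` are hypothetical names. A default view selects everything from the highest version of each table, and any explicitly defined view replaces that default.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed naming convention: "<name>_v<version>", e.g. "main_v4".
VERSIONED = re.compile(r"^(?P<name>.+)_v(?P<version>\d+)$")


def latest_versions(tables: Iterable[str]) -> Dict[str, str]:
    """Map each unversioned name to its highest-versioned table."""
    latest: Dict[str, Tuple[int, str]] = {}
    for table in tables:
        match = VERSIONED.match(table)
        if match is None:
            continue  # not a versioned table, so no default view
        name, version = match["name"], int(match["version"])
        # Compare versions numerically so that v10 beats v9.
        if name not in latest or version > latest[name][0]:
            latest[name] = (version, table)
    return {name: table for name, (_, table) in latest.items()}


def build_views(
    tables: Iterable[str],
    overrides: Dict[str, str],
    dataset: str = "example_dataset",
) -> Dict[str, str]:
    """Return view name -> SQL, preferring explicit overrides."""
    views = {}
    for name, table in latest_versions(tables).items():
        default_sql = f"SELECT * FROM `{dataset}.{table}`"
        # A hand-written definition always wins over the generated one.
        views[name] = overrides.get(name, default_sql)
    return views


if __name__ == "__main__":
    tables = ["main_v3", "main_v4", "event_v1"]
    overrides = {"event": "SELECT * EXCEPT (extra) FROM `example_dataset.event_v1`"}
    for name, sql in build_views(tables, overrides).items():
        print(name, "->", sql)
```

Note that a union across versions is not generated here; as described above, that case would be handled by supplying an override.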
cc @jklukas