BAD-308: schema and BQ description updates #116
Codecov Report
Base: 94.83% // Head: 94.86% // Project coverage increases by 0.02%.

Coverage Diff
              develop     #116     +/-
Coverage       94.83%   94.86%   +0.02%
Files              23       23
Lines            2635     2650      +15
Branches          340      340
Hits             2499     2514      +15
Misses             70       70
Partials           66       66
Force-pushed: 075b326 to 66f7f98
Reopening this PR because I've changed the names of some of the schema files (removed dates). This change is blocked by The-Academic-Observatory/observatory-platform#589
…est folders. Forced schema upload when creating tables for the onix workflows
Force-pushed: 1473ddc to 777bdd5
All tests are passing locally.
Hey Keegan, looks good. I think we can change a few of the schemas.
Are these used?
- book_2021-11-25.json
- book_product_2021-11-25.json
- book_product_2022-03-31.json
These are partitioned tables, so the dates can be removed:
- doab_2021-01-01.json
- ucl_discovery_2008-01-01.json
- oapen_metadata_2018-05-14.json
These can have the dates removed as they are output tables:
- onix_aggregate_metrics_2021-11-25.json
- onix_invalid_isbn_2022-03-27.json
- platform_invalid_isbn_2022-03-27.json
- platform_unmatched_isbn_2022-03-27.json
Can we just keep the latest schema for the various oaebu_public_data and oaebu_publisher_book_product schemas and remove the dates because they are output tables?
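If it helps with the renaming, here is a minimal sketch of stripping the date suffix from a schema file name. It assumes the suffix always has the form `_YYYY-MM-DD` directly before `.json`, as in the files listed above. The helper name `strip_schema_date` is hypothetical and not part of the repository.

```python
import re

# Matches a trailing "_YYYY-MM-DD" that sits directly before the ".json" extension.
DATE_SUFFIX = re.compile(r"_\d{4}-\d{2}-\d{2}(?=\.json$)")


def strip_schema_date(file_name: str) -> str:
    """Return the schema file name without its date suffix (hypothetical helper).

    Names that carry no date suffix are returned unchanged.
    """
    return DATE_SUFFIX.sub("", file_name)


if __name__ == "__main__":
    for name in ["doab_2021-01-01.json", "onix_invalid_isbn_2022-03-27.json"]:
        print(name, "->", strip_schema_date(name))
```

This only computes the new name. Which dated version to keep when several exist for the same table is still a manual decision.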
Thanks Keegan, looks good.
The schema files can differ from the BQ tables since they are not used in their creation. This has led to the divergence of the BQ table schema and the documentation schema.
This update uses the PR from The-Academic-Observatory/observatory-platform#589 to make use of the schema.json files when creating the BQ tables. This accomplishes two desirable things:
As a result, the workflows now raise an error when a SQL query creates a table that does not match the expected schema. Supplying a schema file is still optional: if none is provided, no error is raised and the data is uploaded with an inferred schema.
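The behaviour described above can be illustrated with a minimal, standalone sketch. This is not the observatory-platform implementation. It assumes schema files use the BigQuery JSON schema layout (a list of fields with `name`, `type`, optional `mode` and nested `fields`), and the names `SchemaMismatchError`, `field_signature` and `check_table_schema` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


class SchemaMismatchError(ValueError):
    """Raised when a table's fields differ from the expected schema (hypothetical)."""


def load_schema(path: Optional[Path]) -> Optional[list]:
    """Load a schema JSON file, or return None when no file is supplied."""
    if path is None or not path.exists():
        return None
    with open(path) as f:
        return json.load(f)


def field_signature(fields: list) -> set:
    """Flatten a schema into a set of (dotted name, type, mode) tuples."""
    signature = set()

    def walk(items: list, prefix: str) -> None:
        for item in items:
            name = prefix + item["name"]
            mode = item.get("mode", "NULLABLE").upper()
            signature.add((name, item["type"].upper(), mode))
            # Recurse into nested RECORD fields, if any.
            walk(item.get("fields", []), name + ".")

    walk(fields, "")
    return signature


def check_table_schema(actual: list, expected: Optional[list]) -> bool:
    """Return True if validated, False if no expected schema was supplied.

    Raises SchemaMismatchError when the two schemas differ.
    """
    if expected is None:
        # No schema file: nothing to validate, the schema is inferred.
        return False
    missing = field_signature(expected) - field_signature(actual)
    unexpected = field_signature(actual) - field_signature(expected)
    if missing or unexpected:
        raise SchemaMismatchError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return True
```

In this sketch, descriptions are deliberately left out of the comparison, so only field names, types and modes decide whether a table matches.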