Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BAD-308: schema and BQ description updates #116

Merged
merged 2 commits into from
Nov 25, 2022

Conversation

keegansmith21
Copy link
Contributor

@keegansmith21 keegansmith21 commented Nov 2, 2022

The schema files can differ from the BQ tables since they are not used in their creation. This has led to the divergence of the BQ table schema and the documentation schema.

This update uses the PR from The-Academic-Observatory/observatory-platform#589 to make use of the schema.json files when creating the BQ tables. This accomplishes two desirable things:

  • Synchronicity of the documentation with reality, where the schema.json files act as a singular source of truth
  • Descriptions from the schema.json files will be pushed to the BQ tables

As a result, the workflows will now issue an error when a SQL query creates a table that does not fit the expected schema. You may still neglect to supply a schema file, which will not raise any issues and upload the data with an implied schema.

@codecov
Copy link

codecov bot commented Nov 2, 2022

Codecov Report

Base: 94.83% // Head: 94.86% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (bf8f153) compared to base (ddde9b0).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #116      +/-   ##
===========================================
+ Coverage    94.83%   94.86%   +0.02%     
===========================================
  Files           23       23              
  Lines         2635     2650      +15     
  Branches       340      340              
===========================================
+ Hits          2499     2514      +15     
  Misses          70       70              
  Partials        66       66              
Impacted Files Coverage Δ
oaebu_workflows/workflows/oapen_workflow.py 92.73% <100.00%> (+0.12%) ⬆️
oaebu_workflows/workflows/onix_telescope.py 91.54% <100.00%> (+0.05%) ⬆️
oaebu_workflows/workflows/onix_workflow.py 92.86% <100.00%> (+0.12%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@keegansmith21
Copy link
Contributor Author

Reopening this PR because I've changed the names of some of the schema files (removed dates).

This changed is blocked by The-Academic-Observatory/observatory-platform#589

…est folders. Forced schema upload when creating tables for the onix workflows
@keegansmith21 keegansmith21 force-pushed the BAD-308_update_readthedocs branch from 1473ddc to 777bdd5 Compare November 18, 2022 03:02
@keegansmith21
Copy link
Contributor Author

All tests passing locally

@keegansmith21 keegansmith21 requested a review from jdddog November 18, 2022 04:55
Copy link
Contributor

@jdddog jdddog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Keegan, looks good, I think we can change a few of the schemas.

Are these used?

  • book_2021-11-25.json
  • book_product_2021-11-25.json
  • book_product_2022-03-31.json

These are partitioned tables so can have the dates removed:

  • doab_2021-01-01.json
  • ucl_discovery_2008-01-01.json
  • oapen_metadata_2018-05-14.json

These can have the dates removed as they are output tables:

  • onix_aggregate_metrics_2021-11-25.json
  • onix_invalid_isbn_2022-03-27.json
  • platform_invalid_isbn_2022-03-27.json
  • platform_unmatched_isbn_2022-03-27.json

Can we just keep the latest schema for the various oaebu_public_data and oaebu_publisher_book_product schemas and remove the dates because they are output tables?

@jdddog jdddog self-requested a review November 25, 2022 00:38
Copy link
Contributor

@jdddog jdddog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Keegan, looks good.

@jdddog jdddog marked this pull request as ready for review November 25, 2022 00:39
@jdddog jdddog merged commit 68b0781 into develop Nov 25, 2022
@jdddog jdddog deleted the BAD-308_update_readthedocs branch November 25, 2022 00:39
@keegansmith21 keegansmith21 mentioned this pull request Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants