
Switch parquet library #19

Merged 10 commits on Jul 11, 2023

chelseajonesr (Collaborator)

  • Switch the parquet library used for checkpointing from https://github.com/segmentio/parquet-go to https://github.com/xitongsys/parquet-go
  • (The previous library wrote nested optional strings in a way that left the resulting parquet file unreadable by Spark 3.2)
  • Change all action struct members to pointers so that the schema members are optional, matching the schema produced by Rust and Spark
  • Update the parquet tags on the actions to match the new library
  • When reading checkpoints, parse actions using reflection (Go's reflect package)
  • Remove parsed fields from checkpoints for now
  • Set the minimum writer version to 1
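
To illustrate the two ideas above (pointer members as optional fields, and populating actions through reflection), here is a minimal standard-library sketch. It is not the code from this PR: the Add struct, the col tag key, and the populate helper are hypothetical names chosen for illustration, and the map stands in for one decoded checkpoint row rather than anything returned by a parquet library.

```go
package main

import (
	"fmt"
	"reflect"
)

// Add is a hypothetical, trimmed-down action. Every member is a pointer so
// that an absent value can be represented as nil (an optional member).
// The `col` tag key is an assumption for this sketch, not a real parquet tag.
type Add struct {
	Path       *string `col:"path"`
	Size       *int64  `col:"size"`
	DataChange *bool   `col:"dataChange"`
}

// populate copies values from a generic row into the pointer members of the
// struct that out points to. Keys that are missing or nil leave the member nil.
func populate(row map[string]any, out any) error {
	v := reflect.ValueOf(out)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("out must be a non-nil pointer to a struct, got %T", out)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		// This sketch only handles settable pointer members.
		if field.Type.Kind() != reflect.Ptr || !v.Field(i).CanSet() {
			continue
		}
		raw, ok := row[field.Tag.Get("col")]
		if !ok || raw == nil {
			continue // optional member stays nil
		}
		rv := reflect.ValueOf(raw)
		elemType := field.Type.Elem()
		if !rv.Type().AssignableTo(elemType) {
			return fmt.Errorf("column %q: cannot assign %s to %s",
				field.Tag.Get("col"), rv.Type(), elemType)
		}
		// Allocate a fresh value of the member's element type and point at it.
		ptr := reflect.New(elemType)
		ptr.Elem().Set(rv)
		v.Field(i).Set(ptr)
	}
	return nil
}

func main() {
	// A stand-in for one decoded row; dataChange is deliberately absent.
	row := map[string]any{"path": "part-00000.parquet", "size": int64(1024)}

	var add Add
	if err := populate(row, &add); err != nil {
		panic(err)
	}
	fmt.Println(*add.Path, *add.Size, add.DataChange == nil)
}
```

The strict AssignableTo check is a deliberate simplification: a real reader would likely need to convert between the numeric types a parquet library hands back, which this sketch does not attempt.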

@chelseajonesr force-pushed the switch-parquet-library branch from 935f063 to 686c5ca on July 7, 2023 20:40
…t. Remove timestamp logical type to match Spark schema. Remove Delta data type definitions.
@chelseajonesr force-pushed the switch-parquet-library branch from 686c5ca to 0bf30fc on July 7, 2023 21:35
@jshiv merged commit 6167048 into rivian:main on Jul 11, 2023
@chelseajonesr deleted the switch-parquet-library branch on July 12, 2023 15:31