feat(source-google-sheets): migrate low code #50843
…xtractor that matches schema properties with values
…in components resolver in order to get an indexed object of properties where index is the order. - Add partition router to retriever in components resolver so we can slice.
- Remove old SourceClass to temp file to be deleted and - Remove unused transformations - Make extractor ha transform data from component mapping
…anges to discard empty cells in first row
/format-fix
One blocking concern and one question
airbyte-integrations/connectors/source-google-sheets/source_google_sheets/config_migrations.py
airbyte-integrations/connectors/source-google-sheets/integration_tests/expected_records.txt
… github.com:airbytehq/airbyte into aldogonzalez8/source-google-sheets/migrate-low-code
Thanks for tackling the comments! I'm good as test/CATs are passing and regression testing seems fine. I'm hyped for this one. Let's release it! 🚢
What
Migrate the `source-google-sheets` connector, which relies on `google-api-python-client`/`google-auth-httplib2` to communicate with the API, to a manifest that uses our `HttpRequester` and the latest CDK features.
Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10962
How
Removed the helper and client file dependencies in favor of new CDK features such as `ComponentResolver`, `DynamicSchema`, and `DynamicStreams`.
Review guide
Components Resolver
The component resolver has its own `SimpleRetriever` with a `partition_router` that depends on a static stream, `get_spreadsheet_info_and_sheets`. This stream is responsible for requesting spreadsheet information and retrieving all the sheets.
Request:
{{ config["spreadsheet_id"] }}?includeGridData=false&alt=json
Then, using the sheet IDs, the component resolver's retriever/requester obtains per-sheet information, mainly the first row of data and the row count.
Request:
{{ config["spreadsheet_id"] }}?includeGridData=true&ranges={{ stream_partition.sheet_id }}!1:1&alt=json
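To make the two request templates above concrete, here is a minimal Python sketch of the paths they resolve to. The helper names `spreadsheet_info_path` and `first_row_path` and the sample IDs are hypothetical; in the connector the interpolation is done by the manifest's templating, not by code like this.

```python
from urllib.parse import urlencode


def spreadsheet_info_path(spreadsheet_id: str) -> str:
    """Path for the static stream: spreadsheet metadata without cell data."""
    query = urlencode({"includeGridData": "false", "alt": "json"})
    return f"{spreadsheet_id}?{query}"


def first_row_path(spreadsheet_id: str, sheet_id: str) -> str:
    """Path for the per-sheet request: only row 1 of the given sheet."""
    query = urlencode(
        {"includeGridData": "true", "ranges": f"{sheet_id}!1:1", "alt": "json"},
        safe="!:",  # keep the range separators readable, as in the template
    )
    return f"{spreadsheet_id}?{query}"


# Sample values are made up for illustration.
print(spreadsheet_info_path("abc123"))
print(first_row_path("abc123", "Sheet1"))
```

The second path is issued once per partition, so a spreadsheet with N sheets leads to N first-row requests.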
With the response, we can start passing information using the `component_mapping`.
Schema Loader
The schema loader has its own `SimpleRetriever`, which makes the following request:
Request:
{{ config["spreadsheet_id"] }}?includeGridData=true&ranges={{ stream_partition.sheet_id }}!1:1&alt=json
Additionally, the schema loader has a custom record extractor, `DpathSchemaExtractor`, which is responsible for:
- Parsing sheet headers from the provided raw schema. This method assumes that the data is contiguous, i.e., every cell contains a value, and the first cell that does not contain a value denotes the end of the headers.
- Performing name conversions.
- Removing duplicated fields.
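The three responsibilities listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `DpathSchemaExtractor` code: `parse_headers` and `to_snake_case` are hypothetical names, the conversion rule is a simple snake_case stand-in, and the sketch drops every occurrence of a repeated name, which may differ from how the connector treats duplicates.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def to_snake_case(name: str) -> str:
    """Hypothetical name conversion: non-alphanumerics become underscores, lowercased."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def parse_headers(
    first_row: List[Optional[str]], convert_names: bool = True
) -> List[Tuple[int, str]]:
    """Return (column_index, field_name) pairs for the usable headers."""
    headers: List[Tuple[int, str]] = []
    for index, cell in enumerate(first_row):
        # The first empty cell marks the end of the contiguous header block.
        if cell is None or cell.strip() == "":
            break
        headers.append((index, to_snake_case(cell) if convert_names else cell))

    # Assumption: a name that appears more than once is removed entirely.
    counts = Counter(name for _, name in headers)
    return [(index, name) for index, name in headers if counts[name] == 1]


row = ["ID", "First Name", "Email", "Email", "", "ignored"]
print(parse_headers(row))  # [(0, 'id'), (1, 'first_name')]
```

Keeping the column index next to each name preserves the header order, so later steps can look up a cell by position even after some headers have been dropped.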
Dynamic Streams - `DpathSchemaMatchingExtractor`
This extractor performs similar functions to the one in the `SchemaLoader`, but it also handles the matching between the schema (first row of the sheet) and the data (rows fetched), ensuring that record data is correctly assigned to the fields. A sample of the process can be found in the class description.
RangePartitionRouter
Sheet data is fetched in ranges, such as `2-202, 203-403, 404-604`, and so on, until the entire sheet is scanned. The router creates one slice per range.
User Impact
No
Can this PR be safely reverted and rolled back?