feat(source-google-sheets): migrate low code #50843

Merged
merged 75 commits into from
Jan 30, 2025

Conversation

aldogonzalez8
Contributor

@aldogonzalez8 aldogonzalez8 commented Dec 31, 2024

What

Migrate the source-google-sheets connector, which relied on google-api-python-client/google-auth-httplib2 to communicate with the API, to a low-code manifest that uses our HttpRequester and the latest CDK features.

Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10962

How

Removed the helper and client file dependencies in favor of new CDK features such as ComponentResolver, DynamicSchema, and DynamicStreams.

Review guide

Components Resolver

  • The component resolver has its own SimpleRetriever with a partition_router that depends on a static stream, get_spreadsheet_info_and_sheets. This stream requests the spreadsheet information and retrieves all of its sheets.
    Request: {{ config["spreadsheet_id"] }}?includeGridData=false&alt=json
  • Using the sheet IDs, the component resolver's retriever/requester then fetches per-sheet information, mainly the first row of data and the row count.
    Request: {{ config["spreadsheet_id"] }}?includeGridData=true&ranges={{ stream_partition.sheet_id }}!1:1&alt=json

With the response, we can start passing information using the component_mapping.
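
For reviewers who want to see what the two templated request paths expand to, here is a minimal sketch that renders them with plain Python string formatting instead of the manifest's Jinja interpolation. The function names are hypothetical and not part of the connector; only the query strings are taken from the requests listed above.

```python
def spreadsheet_info_path(spreadsheet_id: str) -> str:
    """Relative path used by the static stream: sheet metadata only, no cell data."""
    return f"{spreadsheet_id}?includeGridData=false&alt=json"


def first_row_path(spreadsheet_id: str, sheet_id: str) -> str:
    """Relative path used per sheet: grid data limited to row 1 (the header row)."""
    # Mirrors the template literally; no URL encoding is applied in this sketch.
    return f"{spreadsheet_id}?includeGridData=true&ranges={sheet_id}!1:1&alt=json"


if __name__ == "__main__":
    print(spreadsheet_info_path("my-spreadsheet-id"))
    print(first_row_path("my-spreadsheet-id", "Sheet1"))
```

The `!1:1` suffix is A1 notation for "all of row 1", which is why a single request per sheet is enough to read the headers.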

Schema Loader

The schema loader has its own SimpleRetriever, which makes the following request:
Request: {{ config["spreadsheet_id"] }}?includeGridData=true&ranges={{ stream_partition.sheet_id }}!1:1&alt=json

Additionally, the schema loader has a custom record extractor, DpathSchemaExtractor, which is responsible for:
  • Parsing sheet headers from the provided raw schema. This method assumes that the data is contiguous, i.e., every cell contains a value, and the first cell that does not contain a value denotes the end of the headers.
  • Performing name conversions.
  • Removing duplicated fields.
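
The three responsibilities above can be sketched roughly as follows. This is not the DpathSchemaExtractor code: `to_field_name` and `parse_headers` are hypothetical names, the conversion rule (lowercase, non-alphanumerics to underscores) is an assumption, and so is the policy of dropping every occurrence of a duplicated header.

```python
import re
from collections import Counter
from typing import List, Optional


def to_field_name(header: str) -> str:
    """Hypothetical name conversion: trim, lowercase, squash other characters to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")


def parse_headers(first_row: List[Optional[str]]) -> List[str]:
    """Return usable field names from the first row of a sheet."""
    headers: List[str] = []
    for cell in first_row:
        # Headers are assumed contiguous: the first empty cell ends the header row.
        if cell is None or not cell.strip():
            break
        headers.append(to_field_name(cell))
    counts = Counter(headers)
    # Assumed policy: a name that appears more than once is dropped entirely.
    return [name for name in headers if counts[name] == 1]


if __name__ == "__main__":
    print(parse_headers(["Id", "First Name", "Id", "Email", "", "Ignored"]))
```

Note that "Ignored" never becomes a field, because the empty cell before it ends the header scan.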

Dynamic Streams - DpathSchemaMatchingExtractor

This extractor performs similar functions to the one in the SchemaLoader, but it also handles the matching between the schema (first row of the sheet) and the data (rows fetched), ensuring that record data is correctly assigned to the fields. A sample of the process can be found in the class description.
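
As a rough illustration of the matching step (the authoritative sample is the one in the class description), the sketch below pairs header positions with the values of one fetched row. `match_row_to_schema` is a hypothetical helper, and the handling of rows shorter than the header row is an assumption.

```python
from typing import Any, Dict, List, Tuple


def match_row_to_schema(indexed_fields: List[Tuple[int, str]], row: List[Any]) -> Dict[str, Any]:
    """Build a record by reading, for each (column index, field name), the cell at that index."""
    record: Dict[str, Any] = {}
    for index, name in indexed_fields:
        # Assumption: a row may be shorter than the header row, so missing cells are skipped.
        if index < len(row):
            record[name] = row[index]
    return record


if __name__ == "__main__":
    # Column 1 was dropped from the schema (for example as a duplicate), so its value is ignored.
    fields = [(0, "id"), (2, "email")]
    print(match_row_to_schema(fields, ["1", "unused", "a@example.com"]))
```

Keeping the original column index next to each field name is what lets the record stay aligned even after some headers have been removed from the schema.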

RangePartitionRouter

Sheet data is fetched in ranges, such as 2-202, 203-403, 404-604 and so on, until the entire sheet is scanned. Row 1 is skipped because it holds the headers. The router then creates slices for each range.
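
A minimal sketch of the slicing idea, assuming each range starts one row after the previous one ends and spans a fixed batch of 200 rows beyond its start row (which reproduces the first range, 2-202). `row_ranges`, the `batch_size` default, and the slice key names are illustrative assumptions, not the RangePartitionRouter API.

```python
from typing import Dict, Iterator


def row_ranges(row_count: int, batch_size: int = 200) -> Iterator[Dict[str, int]]:
    """Yield inclusive row ranges that together cover rows 2..row_count."""
    start = 2  # row 1 holds the headers, so data starts at row 2
    while start <= row_count:
        end = start + batch_size
        # The last range may extend past row_count; this sketch does not clamp it.
        yield {"start_range": start, "end_range": end}
        start = end + 1


if __name__ == "__main__":
    for slice_ in row_ranges(500):
        print(slice_)
```

Each yielded dictionary corresponds to one slice, so a sheet with 500 rows would be read in three requests under these assumptions.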

User Impact

No

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

@aldogonzalez8 aldogonzalez8 self-assigned this Dec 31, 2024

vercel bot commented Dec 31, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 30, 2025 7:36pm

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jan 2, 2025
@aldogonzalez8
Contributor Author

aldogonzalez8 commented Jan 2, 2025

/format-fix

Format-fix job started... Check job output.

✅ Changes applied successfully. (5876dc9)

@lazebnyi lazebnyi requested review from maxi297 and lazebnyi January 30, 2025 01:18
@lazebnyi lazebnyi requested review from darynaishchenko and removed request for a team January 30, 2025 01:20
@lazebnyi
Collaborator

lazebnyi commented Jan 30, 2025

Note: I'll change the CDK version to the prod version once airbytehq/airbyte-python-cdk#293 is released.

Contributor

@maxi297 maxi297 left a comment


One blocking concern and one question

@lazebnyi lazebnyi requested a review from maxi297 January 30, 2025 16:04
… github.com:airbytehq/airbyte into aldogonzalez8/source-google-sheets/migrate-low-code
Contributor

@maxi297 maxi297 left a comment


Thanks for tackling the comments! I'm good as test/CATs are passing and regression testing seems fine. I'm hyped for this one. Let's release it! 🚢

@lazebnyi lazebnyi merged commit a4f6530 into master Jan 30, 2025
30 checks passed
@lazebnyi lazebnyi deleted the aldogonzalez8/source-google-sheets/migrate-low-code branch January 30, 2025 20:10
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/google-sheets

4 participants