feat: add CustomSchemaNormalization
#194
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
CustomSchemaNormalization
A few nits and one question; otherwise this seems pretty straightforward!
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (outdated, resolved)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (outdated, resolved)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (outdated, resolved)
📝 Walkthrough
This pull request introduces a new …
Sequence Diagram
sequenceDiagram
    participant RecordSelector
    participant SchemaTransformer
    participant CustomNormalization
    RecordSelector->>SchemaTransformer: Check normalization type
    alt Standard Normalization
        SchemaTransformer-->>RecordSelector: Apply standard normalization
    else Custom Normalization
        RecordSelector->>CustomNormalization: Instantiate custom normalization
        CustomNormalization-->>RecordSelector: Apply custom normalization strategy
    end
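The branching in the diagram can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the CDK's code: `SchemaNormalization`, `StandardTransformer`, `CustomNormalizationConfig`, `NORMALIZATION_MAPPING`, and `build_schema_normalization` are all hypothetical stand-ins for the enum-to-transformer lookup and the custom-component instantiation the diagram describes. It also raises a descriptive error for unknown modes instead of falling through silently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Mapping, Union


class SchemaNormalization(Enum):
    # Hypothetical stand-in for the generated enum (``None`` is a keyword, hence ``None_``).
    None_ = "None"
    Default = "Default"


@dataclass
class StandardTransformer:
    # Hypothetical stand-in for the built-in TypeTransformer; ``mode`` is illustrative only.
    mode: str

    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        pass  # built-in normalization would run here


@dataclass
class CustomNormalizationConfig:
    # Hypothetical stand-in for a parsed CustomSchemaNormalization model.
    class_name: str


# Hypothetical stand-in for the enum-to-transformer mapping used for standard normalization.
NORMALIZATION_MAPPING: Dict[SchemaNormalization, StandardTransformer] = {
    SchemaNormalization.None_: StandardTransformer(mode="none"),
    SchemaNormalization.Default: StandardTransformer(mode="default"),
}


def build_schema_normalization(
    config: Union[SchemaNormalization, CustomNormalizationConfig],
    custom_classes: Mapping[str, type],
) -> Any:
    """Return a standard transformer, or instantiate the configured custom one."""
    if isinstance(config, SchemaNormalization):
        return NORMALIZATION_MAPPING[config]
    if isinstance(config, CustomNormalizationConfig):
        # A real factory would import ``class_name``; a lookup table keeps the sketch self-contained.
        return custom_classes[config.class_name]()
    # Unknown modes fail loudly instead of silently falling back.
    raise ValueError(f"Unsupported schema_normalization: {config!r}")
```

Resolving the custom class through a plain lookup table is purely a simplification for the sketch; how the real factory resolves `class_name` is not shown here.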
Actionable comments posted: 0
🧹 Nitpick comments (8)
airbyte_cdk/sources/declarative/extractors/record_selector.py (2)
17-17: Consider clarifying the import usage. You're importing `TypeTransformer` here, but it's not referenced until much later. Would consolidating related imports help readability and keep them closer to the usage site, wdyt?
Line range hint 2012-2015: Add a safety check for unsupported modes. In this conditional expression, we assume `model.schema_normalization` must be either `SchemaNormalizationModel` or a custom type. Would it help to explicitly handle unknown modes (e.g., raising a descriptive exception) to avoid silent misconfigurations, wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
Line range hint 187-189: Confirm the naming alignment. We're aliasing `CustomSchemaNormalization` to `CustomSchemaNormalizationModel` here. Would it be clearer if we kept consistent naming across all imports, e.g., dropping "Model" to preserve brevity and clarity, wdyt?
1532-1536: Question about the default. The `Field` default for `schema_normalization` is set to `SchemaNormalization.None_`. Did you intend to freely toggle between `None` and a custom transformer? If so, do you think an explicit `None` might be clearer, wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
500-500: Request more clarity around custom component creation. You're adding `CustomSchemaNormalizationModel` to `create_custom_component`. Could we ensure it enforces an interface like `TypeTransformer`? This might reduce future confusion, wdyt?
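One way to enforce an interface when creating a custom component is to check the loaded class against an abstract base before instantiating it. The sketch below is hypothetical: `load_component_class` is not a CDK function, and it relies only on the standard library's `importlib.import_module` and `issubclass`.

```python
import importlib
from typing import Any, Type


def load_component_class(class_name: str, expected_base: type) -> Type[Any]:
    """Import ``package.module.ClassName`` and verify it subclasses ``expected_base``."""
    module_name, _, attr = class_name.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a fully qualified class name, got {class_name!r}")
    cls = getattr(importlib.import_module(module_name), attr)
    # Reject anything that is not a class implementing the expected interface.
    if not (isinstance(cls, type) and issubclass(cls, expected_base)):
        raise TypeError(f"{class_name} must be a subclass of {expected_base.__name__}")
    return cls
```

With `expected_base` set to the transformer base class, a misconfigured `class_name` would fail when the component is built rather than partway through a sync.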
2012-2015: Evaluate future extension for merging transformations. Currently, either a `TypeTransformer` or a `CustomSchemaNormalizationModel` is chosen. One day we might combine standard transformations with custom transformations. Is that something you'd consider supporting in your logic, wdyt?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
670-691: Alphabetical order for consistency. We introduced `CustomSchemaNormalization` near line 670. Could we keep alphabetical ordering (like other custom components) to make searching simpler, wdyt?
2580-2584: Improve clarity around “anyOf” usage. We allow either a standard `SchemaNormalization` or `CustomSchemaNormalization` in `RecordSelector`. Would clarifying in the schema that “None” is the baseline default help reduce confusion for integrators, wdyt?
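To make the two `anyOf` branches concrete, here is a hypothetical resolver. It assumes the enum values are `None` and `Default` and that the custom branch is an object with `type: CustomSchemaNormalization` plus a `class_name`; check these against the YAML before relying on them. It also shows why a generated enum member is spelled `None_`: `None` is a Python keyword and cannot be an attribute name, while the serialized value stays `"None"`.

```python
from enum import Enum
from typing import Any, Dict, Union


class SchemaNormalization(Enum):
    # ``None`` is a reserved keyword, so the member name needs a trailing underscore.
    None_ = "None"
    Default = "Default"


def resolve_schema_normalization(
    value: Any = "None",
) -> Union[SchemaNormalization, Dict[str, Any]]:
    """Resolve a raw manifest value against the two anyOf branches (hypothetical helper)."""
    if isinstance(value, str):
        # Enum branch: raises ValueError for strings outside the enum.
        return SchemaNormalization(value)
    if isinstance(value, dict) and value.get("type") == "CustomSchemaNormalization":
        # Custom branch: a class_name is required to locate the transformer.
        if "class_name" not in value:
            raise ValueError("CustomSchemaNormalization requires class_name")
        return value
    raise ValueError(f"Unsupported schema_normalization: {value!r}")
```

The `"None"` default mirrors the baseline the comment suggests documenting.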
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
- airbyte_cdk/sources/declarative/extractors/record_selector.py (1 hunks)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
Line range hint 100-102: Validate usage of `CustomStateMigration`. Would it be useful to verify that `CustomStateMigration` indeed inherits from a recognized migration interface or base class to prevent runtime errors, wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
447-451: Suggest verifying mapping coverage. You've reintroduced `SCHEMA_TRANSFORMER_TYPE_MAPPING` for `SchemaNormalizationModel`. Would it be safer to confirm all possible enum values in `SchemaNormalizationModel` are covered here to avoid mismatches, wdyt?
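A cheap way to confirm the mapping covers every enum value is a set difference, for example inside a unit test. The enum and mapping below are hypothetical stand-ins for `SchemaNormalizationModel` and `SCHEMA_TRANSFORMER_TYPE_MAPPING`; the real mapping's values are not shown in this thread, so they are simplified to strings.

```python
from enum import Enum
from typing import Any, Mapping, Set, Type


class SchemaNormalization(Enum):
    # Hypothetical stand-in for SchemaNormalizationModel.
    None_ = "None"
    Default = "Default"


# Hypothetical stand-in for SCHEMA_TRANSFORMER_TYPE_MAPPING (values simplified to strings).
MAPPING = {
    SchemaNormalization.None_: "no normalization",
    SchemaNormalization.Default: "default normalization",
}


def missing_mappings(enum_cls: Type[Enum], mapping: Mapping[Any, Any]) -> Set[Enum]:
    """Return enum members that have no entry in ``mapping``."""
    return set(enum_cls) - set(mapping)


# A test like this fails as soon as a new enum value is added without a mapping entry.
assert not missing_mappings(SchemaNormalization, MAPPING), "mapping out of sync with enum"
```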
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/extractors/type_transformer.py (2)
10-11: Using `@dataclass` on an abstract class. Seems elegant, but are you sure you need dataclass features for an ABC that currently has no fields? If a future extension is planned, this is fine; otherwise a simple ABC might suffice. wdyt?
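Decorating an ABC with `@dataclass` does not weaken the abstract-method check, and it leaves room for subclasses to declare configuration fields later. A minimal standalone illustration; `BaseTransformer` and `StringifyFieldsTransformer` are hypothetical names, not the CDK's classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Tuple


@dataclass
class BaseTransformer(ABC):
    """Hypothetical abstract base: no fields yet, subclasses must implement ``transform``."""

    @abstractmethod
    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        ...


@dataclass
class StringifyFieldsTransformer(BaseTransformer):
    # Because the base is a dataclass, subclasses can declare config fields later on.
    fields: Tuple[str, ...] = ()

    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        # Mutates ``record`` in place, matching the "returns None" contract.
        for name in self.fields:
            if record.get(name) is not None:
                record[name] = str(record[name])
```

Instantiating `BaseTransformer()` still raises `TypeError`. Dropping `@dataclass` from the base would also work for this subclass; keeping it mainly matters if the base itself gains fields, since only dataclass fields are merged into the generated `__init__`.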
35-55: Potential return of the transformed record. Currently, the method does not return a record but modifies it in place. Would returning a new record be clearer, or is in-place mutation the intended design? wdyt?
airbyte_cdk/sources/declarative/extractors/record_selector.py (1)
13-13: Importing `AbstractTypeTransformer`. Great to see the usage of `AbstractTypeTransformer`. Any thoughts on adding a brief mention in the class docstring to clarify that both `TypeTransformer` and `AbstractTypeTransformer` can be provided? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/extractors/__init__.py (1 hunks)
- airbyte_cdk/sources/declarative/extractors/record_selector.py (2 hunks)
- airbyte_cdk/sources/declarative/extractors/type_transformer.py (1 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/extractors/__init__.py (2)
12-12: Good addition of the `AbstractTypeTransformer` import. Nice step to unify type transformation strategies. Would it be helpful to add a short comment here indicating its primary usage? wdyt?
15-15: Exporting `AbstractTypeTransformer` in `__all__`. This makes it publicly available, which is great. Might be worth ensuring that external users are guided toward this new abstraction. wdyt?
airbyte_cdk/sources/declarative/extractors/record_selector.py (1)
37-37: Union type for `schema_normalization`. Allowing both `TypeTransformer` and `AbstractTypeTransformer` is flexible. Are there any potential pitfalls with inconsistent method signatures between them? Maybe reinforcing type hints or documentation could help. wdyt?
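One way to guard against inconsistent signatures is to type `schema_normalization` against a shared structural interface. This is a hypothetical sketch using `typing.Protocol`, not the CDK's actual annotation, and it assumes both transformers expose `transform(record, schema) -> None`. Note that `runtime_checkable` only checks that a `transform` attribute exists, not its signature, so a static type checker does the real enforcement.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping, Protocol, runtime_checkable


@runtime_checkable
class SupportsTransform(Protocol):
    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        ...


class LegacyTransformer:
    # Hypothetical stand-in for the existing TypeTransformer.
    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        record.setdefault("normalized_by", "legacy")


class CustomTransformer:
    # Hypothetical stand-in for a user-provided transformer.
    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        record.setdefault("normalized_by", "custom")


def normalize(
    records: Iterable[Dict[str, Any]],
    normalizer: SupportsTransform,
    schema: Mapping[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Apply the normalizer in place to each record, as a record selector might."""
    for record in records:
        normalizer.transform(record, schema)
        yield record
```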
Sorry about the late approval and review. Just a few small things, but nothing to block on.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/extractors/type_transformer.py (3)
2-2: Update the copyright year. Hey! I noticed the copyright year is set to 2025. Should we update it to 2024 since that's when this code was written? wdyt? 🤔
- # Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+ # Copyright (c) 2024 Airbyte, Inc., all rights reserved.
12-33: Enhance the docstring with examples. The docstring is great! Would you mind if we added a concrete example to make it even more helpful for developers? Also, since we're using `@dataclass` but not defining any fields, maybe we could clarify why? Here's what I'm thinking:
"""
Abstract base class for implementing type transformation logic.

This class provides a blueprint for defining custom transformations on data records based on a provided schema. Implementing classes must override the `transform` method to specify the transformation logic.

Attributes:
    None explicitly defined, as this is a dataclass intended to be subclassed.

Methods:
    transform(record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        Abstract method that must be implemented by subclasses. It performs a transformation on a given data record based on the provided schema.

Usage:
    To use this class, create a subclass that implements the `transform` method with the desired transformation logic.
+
+ Example:
+     ```python
+     @dataclass
+     class MyTransformer(TypeTransformer):
+         def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
+             # Uppercase values of fields the JSON schema declares as strings
+             properties = schema.get("properties", {})
+             for field, value in record.items():
+                 if properties.get(field, {}).get("type") == "string" and isinstance(value, str):
+                     record[field] = value.upper()
+     ```
+
+ Note:
+     This class is marked as a dataclass to maintain consistency with other
+     components in the declarative framework, allowing for future extension
+     with configuration fields if needed.
"""
What do you think about these additions? 🤔
41-55: Clarify in-place modification in docstring. The method docstring looks good! Would it be helpful to explicitly mention that the transformation happens in-place on the record? Something like:
"""
- Perform a transformation on a data record based on a given schema.
+ Perform an in-place transformation on a data record based on a given schema.

Args:
    record (Dict[str, Any]): The data record to be transformed.
    schema (Mapping[str, Any]): The schema that dictates how the record should be transformed.

Returns:
-   None
+   None: The record is modified in-place.

Raises:
    NotImplementedError: If the method is not implemented by a subclass.
"""
What are your thoughts on making this more explicit? 🤔
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/extractors/__init__.py (1 hunks)
- airbyte_cdk/sources/declarative/extractors/record_selector.py (2 hunks)
- airbyte_cdk/sources/declarative/extractors/type_transformer.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/extractors/__init__.py
- airbyte_cdk/sources/declarative/extractors/record_selector.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/extractors/type_transformer.py (1)
10-55: Solid implementation! 👍 Really nice work on this abstract base class! The implementation is clean, well-documented, and provides a solid foundation for custom type transformers. The use of dataclass, abstract method, and type hints makes it very developer-friendly.
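To show the kind of subclass this base class is meant to support, here is a runnable variant of the uppercase example from the docstring suggestion, under two assumptions: the schema is a JSON schema whose fields live under `properties`, and a field's `type` may be a string or a list such as `["null", "string"]`. The base class is a local stand-in for the PR's abstract `TypeTransformer`, defined inline so the snippet runs on its own; `UppercaseStringsTransformer` is a hypothetical name.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass
class TypeTransformer(ABC):
    # Local stand-in for the PR's abstract base class, so this snippet runs on its own.
    @abstractmethod
    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        ...


@dataclass
class UppercaseStringsTransformer(TypeTransformer):
    """Uppercase values of fields the JSON schema declares as strings (in place)."""

    def transform(self, record: Dict[str, Any], schema: Mapping[str, Any]) -> None:
        properties = schema.get("properties", {})
        # Iterate over a snapshot so updating values never interferes with iteration.
        for field, value in list(record.items()):
            declared = properties.get(field, {}).get("type")
            types = declared if isinstance(declared, list) else [declared]
            # Skip nulls and non-string values even when the schema says "string".
            if "string" in types and isinstance(value, str):
                record[field] = value.upper()
```

Fields missing from `properties` are left untouched, which keeps the transformer conservative when the schema is incomplete.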
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
688-688: Consider using a more generic example for the class_name? The current example is specific to Amazon Seller Partner. For better documentation, we could use a more generic example like `source_<name>.components.CustomTypeTransformer`, wdyt?
- - "source_amazon_seller_partner.components.LedgerDetailedViewReportsTypeTransformer"
+ - "source_<name>.components.CustomTypeTransformer"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2625-2629: LGTM! Clean integration with existing schema normalization. The schema changes for `RecordSelector` are well-structured, maintaining backward compatibility while adding support for custom normalization.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
100-102: LGTM! Well-organized imports. The new imports are properly organized and follow the existing pattern.
Also applies to: 187-189, 316-318
453-456: LGTM! Clear and concise mapping. The schema transformer mapping is straightforward and handles both normalization cases appropriately.
2031-2034: LGTM! Implementation aligns with the discussed requirements. The schema normalization handling correctly implements both standard and custom normalization cases, following the pattern discussed in the previous reviews.
What
Add `CustomSchemaNormalization` to the declarative schema.

Summary by CodeRabbit
New Features
Improvements
- Extended `RecordSelector` to support custom normalization approaches.
Technical Updates
- Made `TypeTransformer` available for external use in the module interface.