-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: (CDK) (ConnectorBuilder) - Add auxiliary requests
to slice; support TestRead
for AsyncRetriever (part 1/2)
#355
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis pull request reintroduces and updates classes in the connector builder models by adding new attributes to auxiliary request handling. Additionally, it enhances asynchronous auxiliary request processing across test reader files, updates type aliases and constants, refactors the asynchronous retriever creation logic, and improves HTTP logging by introducing an explicit type parameter. Unit tests have been updated accordingly to reflect these new output structures. Changes
Sequence Diagram(s)sequenceDiagram
participant M as Message
participant G as Message Grouper
participant A as is_async_auxiliary_request
participant H as handle_current_slice
M->>G: Pass incoming JSON message
G->>A: Check if message is async auxiliary request
A-->>G: Return true/false
alt Asynchronous Request
G->>G: Append to slice_auxiliary_requests
else Non-Asynchronous
G->>G: Yield log message immediately
end
G->>H: Process current slice with auxiliary_requests
H-->>G: Return completed slice
sequenceDiagram
participant C as Client
participant F as ModelToComponentFactory
participant D as _get_download_retriever
C->>F: Invoke create_async_retriever(model, config, ...)
F->>D: Determine download retriever type based on flags
alt Test Read Enabled
D-->>F: Return SimpleRetrieverTestReadDecorator
else
D-->>F: Return SimpleRetriever
end
F->>C: Return AsyncRetriever instance
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
airbyte_cdk/connector_builder/models.py (1)
48-53
: Consider adding type hints for the auxiliary requests list elements, wdyt?The
auxiliary_requests
field could benefit from explicit type hints for better code clarity:- auxiliary_requests: Optional[List[AuxiliaryRequest]] = None + auxiliary_requests: Optional[List["AuxiliaryRequest"]] = Noneairbyte_cdk/sources/http_logger.py (1)
20-26
: Consider using a type enum for request types, wdyt?To ensure type safety and prevent typos, we could use an enum for the request types:
from enum import Enum class RequestType(Enum): HTTP = "HTTP" ASYNC_CREATE = "ASYNC_CREATE" ASYNC_POLL = "ASYNC_POLL" ASYNC_ABORT = "ASYNC_ABORT" ASYNC_DELETE = "ASYNC_DELETE"airbyte_cdk/connector_builder/test_reader/types.py (1)
78-83
: Consider converting the request types list to an enum, wdyt?Converting
ASYNC_AUXILIARY_REQUEST_TYPES
to an enum would provide better type safety and IDE support:-ASYNC_AUXILIARY_REQUEST_TYPES = [ - "ASYNC_CREATE", - "ASYNC_POLL", - "ASYNC_ABORT", - "ASYNC_DELETE", -] +from enum import Enum + +class AsyncAuxiliaryRequestType(Enum): + CREATE = "ASYNC_CREATE" + POLL = "ASYNC_POLL" + ABORT = "ASYNC_ABORT" + DELETE = "ASYNC_DELETE" + +ASYNC_AUXILIARY_REQUEST_TYPES = [t.value for t in AsyncAuxiliaryRequestType]airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)
135-142
: Consider adding error handling for invalid request types, wdyt?The auxiliary request handling could benefit from error handling:
if auxiliary_request: + if not isinstance(auxiliary_request.type, str): + raise ValueError(f"Invalid auxiliary request type: {auxiliary_request.type}") if is_async_auxiliary_request(auxiliary_request): slice_auxiliary_requests.append(auxiliary_request) else: yield auxiliary_requestairbyte_cdk/connector_builder/test_reader/helpers.py (1)
417-440
: Consider adding validation for auxiliary requests.The function accepts auxiliary requests without validation. Should we add type checking to ensure the list contains only
AuxiliaryRequest
objects? wdyt?def handle_current_slice( current_slice_pages: List[StreamReadPages], current_slice_descriptor: Optional[Dict[str, Any]] = None, latest_state_message: Optional[Dict[str, Any]] = None, auxiliary_requests: Optional[List[AuxiliaryRequest]] = None, ) -> StreamReadSlices: + if auxiliary_requests is not None: + for request in auxiliary_requests: + if not isinstance(request, AuxiliaryRequest): + raise TypeError(f"Expected AuxiliaryRequest, got {type(request)}") return StreamReadSlices( pages=current_slice_pages, slice_descriptor=current_slice_descriptor, state=[latest_state_message] if latest_state_message else [], auxiliary_requests=auxiliary_requests if auxiliary_requests else [], )airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
630-670
: Would you consider adding type hints to improve maintainability?The extracted function looks good! For better maintainability and IDE support, we could add type hints. Here's a suggestion, wdyt?
- def _get_download_retriever() -> SimpleRetrieverTestReadDecorator | SimpleRetriever: + def _get_download_retriever( + download_requester: HttpRequester, + download_extractor: DpathExtractor | ResponseToFileExtractor, + transformations: List[RecordTransformation], + decoder: Decoder, + job_download_components_name: str, + ) -> SimpleRetrieverTestReadDecorator | SimpleRetriever:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
airbyte_cdk/connector_builder/models.py
(1 hunks)airbyte_cdk/connector_builder/test_reader/helpers.py
(9 hunks)airbyte_cdk/connector_builder/test_reader/message_grouper.py
(6 hunks)airbyte_cdk/connector_builder/test_reader/types.py
(1 hunks)airbyte_cdk/sources/declarative/auth/token_provider.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(2 hunks)airbyte_cdk/sources/http_logger.py
(1 hunks)airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
(1 hunks)unit_tests/connector_builder/test_connector_builder_handler.py
(1 hunks)unit_tests/sources/test_http_logger.py
(7 hunks)
✅ Files skipped from review due to trivial changes (2)
- airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
- airbyte_cdk/sources/declarative/auth/token_provider.py
🔇 Additional comments (9)
airbyte_cdk/connector_builder/models.py (1)
33-38
: LGTM! The addition of thetype
field enhances request tracking.The new
type
field inAuxiliaryRequest
will help distinguish between different types of auxiliary requests, particularly for async operations. This aligns well with the PR objectives.airbyte_cdk/sources/http_logger.py (1)
12-19
: LGTM! The optional type parameter maintains backward compatibility.The addition of the optional
type
parameter with a default value ensures existing code continues to work while allowing for more detailed logging.airbyte_cdk/connector_builder/test_reader/types.py (1)
70-76
: LGTM! The separation of message types improves type safety.The clear separation between
AuxiliaryRequest
andAirbyteLogMessage
in the tuple type makes the code more type-safe and easier to understand.airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)
94-94
: LGTM! The slice_auxiliary_requests list is properly initialized.The list initialization is placed correctly at the start of message processing.
unit_tests/sources/test_http_logger.py (1)
68-68
: LGTM! Consistent test data structure updates.The addition of
"type": "HTTP"
to the expected output structure is consistent across all test cases and aligns with the PR objectives.Also applies to: 89-89, 114-114, 135-135, 160-160, 189-189, 217-217
airbyte_cdk/connector_builder/test_reader/helpers.py (2)
334-335
: LGTM! Clear and focused async check function.The function provides a simple and effective way to check if an auxiliary request is asynchronous by comparing against predefined types.
603-689
: LGTM! Well-structured helper functions with proper validation.The new helper functions are well-organized, properly validate their inputs, and provide clear error messages. The type hints and docstrings make the code self-documenting.
unit_tests/connector_builder/test_connector_builder_handler.py (1)
540-540
: LGTM! Consistent test data structure update.The addition of
auxiliary_requests
field to the expected output structure aligns with the changes in other files.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2616-2794
: Nice refactoring of the async retriever creation!The extraction of
_get_download_retriever
improves code organization and readability while maintaining the existing functionality. This change aligns well with the PR objectives to support auxiliary requests in the connector builder.
auxiliary requests
to slice; support TestRead
for AsyncRetrieverauxiliary requests
to slice; support TestRead
for AsyncRetriever (part 1/2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 question! but nothing blocking. LGTM!
@@ -396,6 +396,7 @@ def _log_response(self, response: requests.Response) -> None: | |||
"Obtains access token", | |||
self._NO_STREAM_NAME, | |||
is_auxiliary=True, | |||
type="AUTH", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Given that we have separate types for the 3 async
requests. Should we have separate types for the different oauth requests? get token vs refresh token?
…el-auxiliary-requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
2632-2641
: Consider adding a docstring for_get_download_retriever()
?This helper function nicely encapsulates the download retriever creation logic, but it lacks a brief summary of its purpose. Adding a short docstring might improve clarity for future contributors. wdyt?
2662-2670
: Preserve or pass through aprimary_key
for the job download retriever?Here, we always set
primary_key=None
for the download retriever. If we need consistent record identification across creation, polling, and download stages, preserving the parent stream’sprimary_key
might be useful. Would you like to allow passing in a configurableprimary_key
instead of hardcodingNone
? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2649-2661
: Confirm behavior when_limit_slices_fetched
is zero or negative?We see
maximum_number_of_slices = self._limit_slices_fetched or 5
. If_limit_slices_fetched
is zero or negative, the function defaults to 5, which might be intentional or might surprise users. Would you like to validate_limit_slices_fetched
or update the default logic to handle edge cases differently? wdyt?
What
Resolving:
How
This PR introduces several enhancements and refactorings:
New Features:
auxiliary_requests
toStreamReadSlices
to include auxiliary requests along with pages and states.Refactor
Details
User Impact
No impact is expected, this is not a Breaking Change
Loom Demo
Before:
auxiliary_requests
under theslices
After:
auxiliary_requests
are now visible under theslices
for each slice processed.pages
hold the result of the processed result dataurl
forCOMPLETED
async job.Summary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor