fix: (CDK) (ConnectorBuilder) - Add auxiliary requests to slice; support TestRead for AsyncRetriever (part 1/2) #355

Open
wants to merge 3 commits into
base: main

Contributor

@bazarnov bazarnov commented Feb 20, 2025

What

Resolving:

How

This PR introduces several enhancements and refactorings:

  • New Features:

    • Improved handling of async requests with enriched metadata in stream slices.
    • Added auxiliary_requests to StreamReadSlices to include auxiliary requests along with pages and states.
    • Upgraded logging for HTTP and authentication events, now clearly identifying request types.
    • Introduced new utility functions for extracting and processing message components.
  • Refactor:

    • Message Handling and Grouping: Streamlined handling and grouping of auxiliary and log messages for more consistent outputs.
    • Data Structure Refinement: Refined data structures to better support enhanced message processing.
  • Details:

    • Modified existing functions to support the new auxiliary request handling.
    • Updated test cases to cover the new functionalities and changes.

User Impact

No impact is expected; this is not a breaking change.

Loom Demo

Before:

  • no auxiliary_requests under the slices
{
    "logs": [],
    "slices": [
        {
            "pages": [
                {
                    "records": [
                        {"example_record_csv": "value"},
                        {"more_record_examples_from_csv": "more_values"}
                    ]
                }
            ],
            "slice_descriptor": {},
            "state": []
        }
    ],
    "test_read_limit_reached": false,
    "auxiliary_requests": [],
    "inferred_schema": {},
    "inferred_datetime_formats": {}
}

After:

  • auxiliary_requests are now listed under each processed slice.
  • the pages hold the records downloaded from the result data URL of the COMPLETED async job.
{
    "logs": [],
    "slices": [
        {
            "pages": [
                {
                    "records": [
                        {"example_record_csv": "value"},
                        {"more_record_examples_from_csv": "more_values"}
                    ],
                    "request": {
                        "url": "https://some_link_to_download_the_file",
                        "headers": {
                            "User-Agent": "python-requests/2.32.3",
                            "Accept-Encoding": "gzip, deflate",
                            "Accept": "*/*",
                            "Connection": "keep-alive"
                        },
                        "http_method": "GET"
                    },
                    "response": {
                        "status": 200,
                        "body": "data_goes_here!",
                        "headers": {
                            "Content-Disposition": "attachment;filename=20207253_bb2314f1-b9ab-4876-ac48-366ae706a31e.csv",
                            "Content-Encoding": "x-gzip",
                            "Accept-Ranges": "bytes",
                            "Content-Type": "binary/octet-stream",
                            "Content-Length": "1155",
                            "Server": "AmazonS3"
                        }
                    }
                }
            ],
            "slice_descriptor": {},
            "state": [],
            "auxiliary_requests": [
                {
                    "description": "Create the server-side async job.",
                    "request": {
                        "url": "https://api.sendgrid.com/v3/marketing/contacts/exports",
                        "headers": {
                            "User-Agent": "python-requests/2.32.3",
                            "Accept-Encoding": "gzip, deflate",
                            "Accept": "*/*",
                            "Connection": "keep-alive",
                            "Authorization": "Bearer ****",
                            "Content-Length": "0"
                        },
                        "http_method": "POST"
                    },
                    "response": {
                        "status": 202,
                        "body": "{\"id\":\"bb2314f1-b9ab-4876-ac48-366ae706a31e\",\"_metadata\":{\"self\":\"https://api.sendgrid.com/v3/mc/contacts/exports/bb2314f1-b9ab-4876-ac48-366ae706a31e\"}}\n",
                        "headers": {
                            "Server": "nginx",
                            "Date": "Thu, 20 Feb 2025 14:04:42 GMT",
                            "Content-Type": "application/json",
                            "Content-Length": "154",
                            "Connection": "keep-alive",
                            "x-amzn-requestid": "23799445-d20b-4de6-86b2-249d06b9581c",
                            "x-amzn-remapped-x-amzn-requestid": "100cea76-3360-41f1-a135-2ce3b620c19e",
                            "access-control-allow-origin": "*",
                            "access-control-allow-headers": "AUTHORIZATION, Content-Type, On-behalf-of, x-sg-elas-acl, X-Recaptcha, X-Request-Source, x-sendgrid-mc-version",
                            "x-amzn-remapped-content-length": "154",
                            "x-sendgrid-pod-id": "0",
                            "x-amz-apigw-id": "GSVzLETMIAMEq_A=",
                            "access-control-allow-methods": "POST,OPTIONS,GET,OPTIONS",
                            "access-control-expose-headers": "Link, Location",
                            "x-amzn-trace-id": "Root=1-67b7367a-106f6f672455aaf1609e82ae;Parent=628aedad0e2144f7;Sampled=0;Lineage=1:38c8337a:0",
                            "x-amzn-remapped-date": "Thu, 20 Feb 2025 14:04:42 GMT",
                            "x-sendgrid-podrouter-region": "us-east-1",
                            "x-envoy-upstream-service-time": "404",
                            "strict-transport-security": "max-age=31536000; includeSubDomains",
                            "content-security-policy": "frame-ancestors 'none'",
                            "cache-control": "no-store",
                            "referrer-policy": "strict-origin-when-cross-origin",
                            "x-content-type-options": "nosniff",
                            "x-ratelimit-remaining": "9",
                            "x-ratelimit-reset": "18",
                            "x-ratelimit-limit": "10"
                        }
                    },
                    "title": "Async Job -- Create",
                    "type": "ASYNC_CREATE"
                },
                {
                    "description": "Poll the status of the server-side async job.",
                    "request": {
                        "url": "https://api.sendgrid.com/v3/marketing/contacts/exports/bb2314f1-b9ab-4876-ac48-366ae706a31e",
                        "headers": {
                            "User-Agent": "python-requests/2.32.3",
                            "Accept-Encoding": "gzip, deflate",
                            "Accept": "*/*",
                            "Connection": "keep-alive",
                            "Authorization": "Bearer ****"
                        },
                        "http_method": "GET"
                    },
                    "response": {
                        "status": 200,
                        "body": "{\"id\":\"bb2314f1-b9ab-4876-ac48-366ae706a31e\",\"status\":\"pending\",\"created_at\":\"2025-02-20T14:04:42Z\",\"updated_at\":\"2025-02-20T14:04:42Z\",\"completed_at\":\"1970-01-01T00:00:00Z\",\"expires_at\":\"2025-02-21T02:04:42Z\",\"urls\":[],\"user_id\":20207253,\"export_type\":\"contacts_export\",\"segments\":null,\"lists\":null,\"_metadata\":{\"self\":\"https://api.sendgrid.com/v3/mc/contacts/export/bb2314f1-b9ab-4876-ac48-366ae706a31e\"}}\n",
                        "headers": {
                            "Server": "nginx",
                            "Date": "Thu, 20 Feb 2025 14:04:43 GMT",
                            "Content-Type": "application/json",
                            "Content-Length": "408",
                            "Connection": "keep-alive",
                            "x-amzn-requestid": "93aefdb7-0cf4-469c-a774-ca61613fd739",
                            "x-amzn-remapped-x-amzn-requestid": "4a2ee46d-a952-4d68-a393-e61157a1bd86",
                            "access-control-allow-origin": "*",
                            "access-control-allow-headers": "AUTHORIZATION, Content-Type, On-behalf-of, x-sg-elas-acl, X-Recaptcha, X-Request-Source, x-sendgrid-mc-version",
                            "x-amzn-remapped-content-length": "408",
                            "x-sendgrid-pod-id": "0",
                            "x-amz-apigw-id": "GSVzTFRmoAMEgWQ=",
                            "access-control-allow-methods": "GET,OPTIONS,DELETE,OPTIONS",
                            "access-control-expose-headers": "Link, Location",
                            "x-amzn-trace-id": "Root=1-67b7367b-1700fd882e97ca4d02ec0185;Parent=5955e8ff7f1beb9a;Sampled=0;Lineage=1:38c8337a:0",
                            "x-amzn-remapped-date": "Thu, 20 Feb 2025 14:04:43 GMT",
                            "x-sendgrid-podrouter-region": "us-east-1",
                            "x-envoy-upstream-service-time": "261",
                            "strict-transport-security": "max-age=31536000; includeSubDomains",
                            "content-security-policy": "frame-ancestors 'none'",
                            "cache-control": "no-store",
                            "referrer-policy": "strict-origin-when-cross-origin",
                            "x-content-type-options": "nosniff",
                            "x-ratelimit-remaining": "599",
                            "x-ratelimit-reset": "17",
                            "x-ratelimit-limit": "600",
                            "Powered-By": "SGGateway"
                        }
                    },
                    "title": "Async Job -- Polling",
                    "type": "ASYNC_POLL"
                },
                {
                    "description": "Poll the status of the server-side async job.",
                    "request": {
                        "url": "https://api.sendgrid.com/v3/marketing/contacts/exports/bb2314f1-b9ab-4876-ac48-366ae706a31e",
                        "headers": {
                            "User-Agent": "python-requests/2.32.3",
                            "Accept-Encoding": "gzip, deflate",
                            "Accept": "*/*",
                            "Connection": "keep-alive",
                            "Authorization": "Bearer ****"
                        },
                        "http_method": "GET"
                    },
                    "response": {
                        "status": 200,
                        "body": "{\"id\":\"bb2314f1-b9ab-4876-ac48-366ae706a31e\",\"status\":\"ready\",\"contact_count\":24,\"created_at\":\"2025-02-20T14:04:42Z\",\"updated_at\":\"2025-02-20T14:04:44Z\",\"completed_at\":\"2025-02-20T14:04:44Z\",\"expires_at\":\"2025-02-21T02:04:42Z\",\"urls\":[\"https://mcd-contacts-export-productionprod.s3.us-west-2.amazonaws.com/20207253_bb2314f1-b9ab-4876-ac48-366ae706a31e.csv.gzip?X-Amz-Algorithm=AWS4-HMAC-SHA256\\u0026X-Amz-Credential=ASIAVCSGC5XH7P4VC2UA%2F20250220%2Fus-west-2%2Fs3%2Faws4_request\\u0026X-Amz-Date=20250220T140448Z\\u0026X-Amz-Expires=300\\u0026X-Amz-Security-Token=IQoJb3JpZ2luX2VjEJb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCIHicrFFd3AVXCGe%2B28ciRyjB6bzMkyKx39KXt8HYV04kAiAeH%2Bvs1CkOSvJvdTUFVfKhKZV7OfeSiFnr78GxxVx3GyqWAwi%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAMaDDM0OTExMzA4NTM5MSIM3t%2BBytUYYnGeQC6CKuoC3oRJNrlsHOY5LHKS%2FMtBGxWjC1M57NDeJILY0wg5%2BWo7TbKkul0qbrR3mGI5m3PHjtDRG3rMr0w7KKhqYLc5VBJ7l4YxDxBFo8XqMzJ2TXSXE%2FxQdxSDx5d0jbId0qMILqlnjfWvokHD0YSWaOZcPC3hWhJhRMGkJA5p%2F2ZErgHpbFv58DuLsssFTxCVHH2%2BhIKrgBXBySLIjkA3ykBHSzkRJ2LAlz13AWAeOZzR7tBHwdMYZBNNY7n49CbLIxUZxqSY4nVZ%2BkTPS7MNEmj%2FNYBiO5LwXiSSpZHuu7Squu4g13a%2FolJY8xoADYxMzh2jadpiopOz1qcTrsQ1TzU6AMyW6M3TXMWZ92cKaVrBJJqciqbCtSYWg6rYuo5Zgo26jNirRuNOsy%2FfcQN%2FENM9zxMywB8TKXlyYx9LJbTImxdTEE4gcAPRzjtg3zuodTNIDElqI46RR432V8FrhKxtgkEhHFddYY6cx30wq%2BPcvQY6ngFneLxr5sTHzOsj53Mu9cddLjdAnRHoR1tZddoSFxQMpmOW7ewu3mL9SVq7Mci%2FyDmV9LSuIAaxdZqEt6hc6mCuoeEtxqP3aFy4il46ETJjtpc2YrNRVyM8jVaGjlqDd1C%2FLbv00Lqta9ka2WhtMle0HmNCVXgLYzUb4evrZVUfuJDQSxxD4oL%2BrNB8pS4k7%2BO4P9Smf6Rl%2Bqo%2FFO%2FCGA%3D%3D\\u0026X-Amz-SignedHeaders=host\\u0026response-content-disposition=attachment%3Bfilename%3D20207253_bb2314f1-b9ab-4876-ac48-366ae706a31e.csv\\u0026response-content-encoding=x-gzip\\u0026X-Amz-Signature=b7c4ebb020d3edd66ea863c0da34074f8ecddcc409afaa09d346ea86f1cef0dd\"],\"user_id\":20207253,\"export_type\":\"contacts_export\",\"segments\":null,\"lists\":null,\"_metadata\
":{\"self\":\"https://api.sendgrid.com/v3/mc/contacts/export/bb2314f1-b9ab-4876-ac48-366ae706a31e\"}}\n",
                        "headers": {
                            "Server": "nginx",
                            "Date": "Thu, 20 Feb 2025 14:04:48 GMT",
                            "Content-Type": "application/json",
                            "Transfer-Encoding": "chunked",
                            "Connection": "keep-alive",
                            "x-amzn-requestid": "0ed22ae5-3206-4112-a5ad-ed4adc5ac161",
                            "x-amzn-remapped-x-amzn-requestid": "b04913cc-91b2-4a3e-b902-d8de1e55b212",
                            "access-control-allow-origin": "*",
                            "access-control-allow-headers": "AUTHORIZATION, Content-Type, On-behalf-of, x-sg-elas-acl, X-Recaptcha, X-Request-Source, x-sendgrid-mc-version",
                            "x-amzn-remapped-content-length": "2015",
                            "x-sendgrid-pod-id": "0",
                            "x-amz-apigw-id": "GSV0JEHioAMEDSA=",
                            "access-control-allow-methods": "GET,OPTIONS,DELETE,OPTIONS",
                            "access-control-expose-headers": "Link, Location",
                            "x-amzn-trace-id": "Root=1-67b73680-5dab1b7024963bed5e464722;Parent=6f8ca9ea28e8ca22;Sampled=0;Lineage=1:38c8337a:0",
                            "x-amzn-remapped-date": "Thu, 20 Feb 2025 14:04:48 GMT",
                            "x-sendgrid-podrouter-region": "us-east-1",
                            "x-envoy-upstream-service-time": "237",
                            "strict-transport-security": "max-age=31536000; includeSubDomains",
                            "content-security-policy": "frame-ancestors 'none'",
                            "cache-control": "no-store",
                            "referrer-policy": "strict-origin-when-cross-origin",
                            "x-content-type-options": "nosniff",
                            "x-ratelimit-remaining": "599",
                            "x-ratelimit-reset": "12",
                            "x-ratelimit-limit": "600",
                            "Powered-By": "SGGateway",
                            "Content-Encoding": "gzip"
                        }
                    },
                    "title": "Async Job -- Polling",
                    "type": "ASYNC_POLL"
                }
            ]
        }
    ],
    "test_read_limit_reached": false,
    "auxiliary_requests": [],
    "inferred_schema": {},
    "inferred_datetime_formats": {}
}
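
To make the new shape easier to consume, here is a minimal sketch of walking a TestRead output and listing the auxiliary requests attached to each slice. The helper name `summarize_auxiliary_requests` and the sample URLs are assumptions for illustration only; the key names (`slices`, `auxiliary_requests`, `type`, `request`, `response`) are taken from the output shown above.

```python
from typing import Any, Dict, List


def summarize_auxiliary_requests(stream_read: Dict[str, Any]) -> List[str]:
    """Return one 'slice_index: TYPE STATUS URL' line per slice-level auxiliary request.

    Hypothetical helper, not part of the CDK.
    """
    lines: List[str] = []
    for index, slice_ in enumerate(stream_read.get("slices", [])):
        # Output produced before this change has no `auxiliary_requests` key
        # under the slice, so fall back to an empty list.
        for aux in slice_.get("auxiliary_requests") or []:
            lines.append(
                f"{index}: {aux['type']} {aux['response']['status']} {aux['request']['url']}"
            )
    return lines


# Trimmed-down sample with made-up URLs, following the "After" structure.
sample = {
    "slices": [
        {
            "pages": [],
            "slice_descriptor": {},
            "state": [],
            "auxiliary_requests": [
                {
                    "type": "ASYNC_CREATE",
                    "request": {"url": "https://example.com/exports", "http_method": "POST"},
                    "response": {"status": 202},
                },
                {
                    "type": "ASYNC_POLL",
                    "request": {"url": "https://example.com/exports/1", "http_method": "GET"},
                    "response": {"status": 200},
                },
            ],
        }
    ]
}

for line in summarize_auxiliary_requests(sample):
    print(line)
```

The top-level `auxiliary_requests` list is left alone here; the sketch only reads the per-slice list that this PR adds.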

Summary by CodeRabbit


  • New Features

    • Improved asynchronous request processing with enriched metadata in stream slices.
    • Upgraded logging for HTTP and authentication events, now clearly identifying request types.
  • Refactor

    • Streamlined handling and grouping of auxiliary and log messages for more consistent outputs.
    • Refined data structures to better support enhanced message processing.
    • Enhanced code organization and readability in the retrieval processes.

@bazarnov bazarnov self-assigned this Feb 20, 2025
@github-actions github-actions bot added the bug and security labels Feb 20, 2025
@bazarnov bazarnov marked this pull request as ready for review February 20, 2025 14:37
Contributor

coderabbitai bot commented Feb 20, 2025

📝 Walkthrough


This pull request reintroduces and updates classes in the connector builder models by adding new attributes to auxiliary request handling. Additionally, it enhances asynchronous auxiliary request processing across test reader files, updates type aliases and constants, refactors the asynchronous retriever creation logic, and improves HTTP logging by introducing an explicit type parameter. Unit tests have been updated accordingly to reflect these new output structures.

Changes

| File(s) | Change Summary |
|---|---|
| airbyte_cdk/.../connector_builder/models.py | Re-added StreamReadPages and StreamReadSlices classes; updated StreamReadSlices with an optional auxiliary_requests attribute; modified AuxiliaryRequest to include a new required type attribute. |
| airbyte_cdk/.../test_reader/{helpers.py, message_grouper.py, types.py} | Introduced is_async_auxiliary_request; updated handle_current_slice to accept auxiliary requests; refactored handle_auxiliary_request using new helper functions; adjusted message grouping logic; modified LOG_MESSAGES_OUTPUT_TYPE alias and added ASYNC_AUXILIARY_REQUEST_TYPES constant. |
| airbyte_cdk/.../{auth/token_provider.py, http_logger.py, streams/http/requests_native_auth/abstract_oauth.py} | Enhanced logging by adding a type parameter (defaulting to "AUTH" in auth modules and "HTTP" in HTTP logging), improving log categorization. |
| airbyte_cdk/.../parsers/model_to_component_factory.py | Refactored create_async_retriever by introducing the inner function _get_download_retriever to encapsulate download retriever creation; updated the method signature to return an AsyncRetriever. |
| unit_tests/.../{test_connector_builder_handler.py, test_http_logger.py} | Updated expected test outputs to explicitly include the auxiliary_requests field in slices and a new "type": "HTTP" key in HTTP message logs. |
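
As a rough illustration of the model change summarized above, the following sketch shows an AuxiliaryRequest with a required type field and a StreamReadSlices with an optional auxiliary_requests field. The field names follow the JSON output in the PR description; the field types are simplified assumptions and are not copied from airbyte_cdk/connector_builder/models.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AuxiliaryRequest:
    title: str
    type: str  # required per the summary, e.g. "ASYNC_CREATE", "HTTP", "AUTH"
    description: str
    request: Dict[str, Any]   # simplified; assumed shape
    response: Dict[str, Any]  # simplified; assumed shape


@dataclass
class StreamReadSlices:
    pages: List[Dict[str, Any]]  # simplified; the PR uses StreamReadPages
    slice_descriptor: Optional[Dict[str, Any]]
    state: Optional[List[Dict[str, Any]]] = None
    # Optional, so callers that never pass it keep working.
    auxiliary_requests: Optional[List[AuxiliaryRequest]] = None


create = AuxiliaryRequest(
    title="Async Job -- Create",
    type="ASYNC_CREATE",
    description="Create the server-side async job.",
    request={"http_method": "POST"},
    response={"status": 202},
)
old_style = StreamReadSlices(pages=[], slice_descriptor={})
new_style = StreamReadSlices(pages=[], slice_descriptor={}, auxiliary_requests=[create])
```

Making type required on AuxiliaryRequest means every producer of auxiliary requests has to state a category, while the optional field on StreamReadSlices keeps existing constructors valid.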

Sequence Diagram(s)

sequenceDiagram
    participant M as Message
    participant G as Message Grouper
    participant A as is_async_auxiliary_request
    participant H as handle_current_slice

    M->>G: Pass incoming JSON message
    G->>A: Check if message is async auxiliary request
    A-->>G: Return true/false
    alt Asynchronous Request
      G->>G: Append to slice_auxiliary_requests
    else Non-Asynchronous
      G->>G: Yield log message immediately
    end
    G->>H: Process current slice with auxiliary_requests
    H-->>G: Return completed slice
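
The grouping flow in the diagram above can be sketched roughly as follows. Only `is_async_auxiliary_request`, `slice_auxiliary_requests`, and the four ASYNC_AUXILIARY_REQUEST_TYPES values come from the PR; the `Aux` and `SliceDone` classes and the `group` function are hypothetical stand-ins, and the real message grouper handles many more message kinds.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Union

ASYNC_AUXILIARY_REQUEST_TYPES = ["ASYNC_CREATE", "ASYNC_POLL", "ASYNC_ABORT", "ASYNC_DELETE"]


@dataclass
class Aux:
    """Hypothetical stand-in for AuxiliaryRequest; only `type` matters here."""
    type: str


@dataclass
class SliceDone:
    """Hypothetical marker meaning 'the current slice is finished'."""


def is_async_auxiliary_request(aux: Aux) -> bool:
    return aux.type in ASYNC_AUXILIARY_REQUEST_TYPES


def group(messages: Iterable[Union[Aux, SliceDone]]) -> Iterator[Union[Aux, List[Aux]]]:
    slice_auxiliary_requests: List[Aux] = []
    for message in messages:
        if isinstance(message, SliceDone):
            # Close the slice: hand over everything buffered for it, then reset.
            yield slice_auxiliary_requests
            slice_auxiliary_requests = []
        elif is_async_auxiliary_request(message):
            # Async job requests are held back and attached to the slice.
            slice_auxiliary_requests.append(message)
        else:
            # Any other auxiliary request (e.g. AUTH) is emitted immediately.
            yield message


stream = [Aux("ASYNC_CREATE"), Aux("AUTH"), Aux("ASYNC_POLL"), SliceDone()]
grouped = list(group(stream))
```

In this sketch the AUTH request comes out first and the two async requests arrive together with the slice, which mirrors the two branches of the `alt` block.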
sequenceDiagram
    participant C as Client
    participant F as ModelToComponentFactory
    participant D as _get_download_retriever

    C->>F: Invoke create_async_retriever(model, config, ...)
    F->>D: Determine download retriever type based on flags
    alt Test Read Enabled
      D-->>F: Return SimpleRetrieverTestReadDecorator
    else
      D-->>F: Return SimpleRetriever
    end
    F->>C: Return AsyncRetriever instance
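
A minimal sketch of the selection step in the second diagram, assuming a boolean test-read flag and an optional slice limit. Both classes below are hypothetical stand-ins that only borrow the names from the diagram, and the function and its parameters are illustrative; the real `_get_download_retriever` is an inner function of `create_async_retriever` and builds fully configured retrievers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleRetriever:
    """Hypothetical stand-in for the CDK class of the same name."""
    name: str


@dataclass
class SimpleRetrieverTestReadDecorator(SimpleRetriever):
    """Hypothetical stand-in: a retriever that caps how many slices it reads."""
    maximum_number_of_slices: int = 5


def get_download_retriever(
    name: str, test_read_enabled: bool, limit_slices_fetched: Optional[int]
) -> SimpleRetriever:
    if test_read_enabled:
        # Test reads get the capped variant so the Builder never pulls a full export.
        return SimpleRetrieverTestReadDecorator(
            name=name, maximum_number_of_slices=limit_slices_fetched or 5
        )
    return SimpleRetriever(name=name)


regular = get_download_retriever("download", test_read_enabled=False, limit_slices_fetched=None)
capped = get_download_retriever("download", test_read_enabled=True, limit_slices_fetched=2)
```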

Suggested labels

enhancement

Suggested reviewers

  • lazebnyi
  • maxi297

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
airbyte_cdk/connector_builder/models.py (1)

48-53: Consider adding type hints for the auxiliary requests list elements, wdyt?

The auxiliary_requests field could benefit from explicit type hints for better code clarity:

-    auxiliary_requests: Optional[List[AuxiliaryRequest]] = None
+    auxiliary_requests: Optional[List["AuxiliaryRequest"]] = None
airbyte_cdk/sources/http_logger.py (1)

20-26: Consider using a type enum for request types, wdyt?

To ensure type safety and prevent typos, we could use an enum for the request types:

from enum import Enum

class RequestType(Enum):
    HTTP = "HTTP"
    ASYNC_CREATE = "ASYNC_CREATE"
    ASYNC_POLL = "ASYNC_POLL"
    ASYNC_ABORT = "ASYNC_ABORT"
    ASYNC_DELETE = "ASYNC_DELETE"
airbyte_cdk/connector_builder/test_reader/types.py (1)

78-83: Consider converting the request types list to an enum, wdyt?

Converting ASYNC_AUXILIARY_REQUEST_TYPES to an enum would provide better type safety and IDE support:

-ASYNC_AUXILIARY_REQUEST_TYPES = [
-    "ASYNC_CREATE",
-    "ASYNC_POLL",
-    "ASYNC_ABORT",
-    "ASYNC_DELETE",
-]
+from enum import Enum
+
+class AsyncAuxiliaryRequestType(Enum):
+    CREATE = "ASYNC_CREATE"
+    POLL = "ASYNC_POLL"
+    ABORT = "ASYNC_ABORT"
+    DELETE = "ASYNC_DELETE"
+
+ASYNC_AUXILIARY_REQUEST_TYPES = [t.value for t in AsyncAuxiliaryRequestType]
airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)

135-142: Consider adding error handling for invalid request types, wdyt?

The auxiliary request handling could benefit from error handling:

     if auxiliary_request:
+        if not isinstance(auxiliary_request.type, str):
+            raise ValueError(f"Invalid auxiliary request type: {auxiliary_request.type}")
         if is_async_auxiliary_request(auxiliary_request):
             slice_auxiliary_requests.append(auxiliary_request)
         else:
             yield auxiliary_request
airbyte_cdk/connector_builder/test_reader/helpers.py (1)

417-440: Consider adding validation for auxiliary requests.

The function accepts auxiliary requests without validation. Should we add type checking to ensure the list contains only AuxiliaryRequest objects? wdyt?

 def handle_current_slice(
     current_slice_pages: List[StreamReadPages],
     current_slice_descriptor: Optional[Dict[str, Any]] = None,
     latest_state_message: Optional[Dict[str, Any]] = None,
     auxiliary_requests: Optional[List[AuxiliaryRequest]] = None,
 ) -> StreamReadSlices:
+    if auxiliary_requests is not None:
+        for request in auxiliary_requests:
+            if not isinstance(request, AuxiliaryRequest):
+                raise TypeError(f"Expected AuxiliaryRequest, got {type(request)}")
     return StreamReadSlices(
         pages=current_slice_pages,
         slice_descriptor=current_slice_descriptor,
         state=[latest_state_message] if latest_state_message else [],
         auxiliary_requests=auxiliary_requests if auxiliary_requests else [],
     )
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

630-670: Would you consider adding type hints to improve maintainability?

The extracted function looks good! For better maintainability and IDE support, we could add type hints. Here's a suggestion, wdyt?

-    def _get_download_retriever() -> SimpleRetrieverTestReadDecorator | SimpleRetriever:
+    def _get_download_retriever(
+        download_requester: HttpRequester,
+        download_extractor: DpathExtractor | ResponseToFileExtractor,
+        transformations: List[RecordTransformation],
+        decoder: Decoder,
+        job_download_components_name: str,
+    ) -> SimpleRetrieverTestReadDecorator | SimpleRetriever:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 978be1b and be03fe5.

📒 Files selected for processing (10)
  • airbyte_cdk/connector_builder/models.py (1 hunks)
  • airbyte_cdk/connector_builder/test_reader/helpers.py (9 hunks)
  • airbyte_cdk/connector_builder/test_reader/message_grouper.py (6 hunks)
  • airbyte_cdk/connector_builder/test_reader/types.py (1 hunks)
  • airbyte_cdk/sources/declarative/auth/token_provider.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2 hunks)
  • airbyte_cdk/sources/http_logger.py (1 hunks)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (1 hunks)
  • unit_tests/sources/test_http_logger.py (7 hunks)
✅ Files skipped from review due to trivial changes (2)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
  • airbyte_cdk/sources/declarative/auth/token_provider.py
🔇 Additional comments (9)
airbyte_cdk/connector_builder/models.py (1)

33-38: LGTM! The addition of the type field enhances request tracking.

The new type field in AuxiliaryRequest will help distinguish between different types of auxiliary requests, particularly for async operations. This aligns well with the PR objectives.

airbyte_cdk/sources/http_logger.py (1)

12-19: LGTM! The optional type parameter maintains backward compatibility.

The addition of the optional type parameter with a default value ensures existing code continues to work while allowing for more detailed logging.
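
To illustrate why an optional parameter with a default keeps existing callers working, here is a small sketch. The function `build_http_log_entry` and the flat dictionary layout are hypothetical and do not reproduce the real signature or output of the code in airbyte_cdk/sources/http_logger.py; only the idea of a `type` argument that falls back to "HTTP" comes from the PR.

```python
from typing import Any, Dict, Optional


def build_http_log_entry(
    url: str,
    http_method: str,
    status: int,
    is_auxiliary: Optional[bool] = None,
    type: Optional[str] = None,  # name mirrors the PR; shadows the builtin inside this function
) -> Dict[str, Any]:
    """Hypothetical log formatter: callers that omit `type` get "HTTP"."""
    return {
        "url": url,
        "http_method": http_method,
        "status": status,
        "is_auxiliary": is_auxiliary,
        "type": type or "HTTP",
    }


# An existing call site that knows nothing about `type` still works.
plain = build_http_log_entry("https://example.com/items", "GET", 200)

# An auth module can label its request explicitly.
auth = build_http_log_entry(
    "https://example.com/oauth/token", "POST", 200, is_auxiliary=True, type="AUTH"
)
```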

airbyte_cdk/connector_builder/test_reader/types.py (1)

70-76: LGTM! The separation of message types improves type safety.

The clear separation between AuxiliaryRequest and AirbyteLogMessage in the tuple type makes the code more type-safe and easier to understand.

airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)

94-94: LGTM! The slice_auxiliary_requests list is properly initialized.

The list initialization is placed correctly at the start of message processing.

unit_tests/sources/test_http_logger.py (1)

68-68: LGTM! Consistent test data structure updates.

The addition of "type": "HTTP" to the expected output structure is consistent across all test cases and aligns with the PR objectives.

Also applies to: 89-89, 114-114, 135-135, 160-160, 189-189, 217-217

airbyte_cdk/connector_builder/test_reader/helpers.py (2)

334-335: LGTM! Clear and focused async check function.

The function provides a simple and effective way to check if an auxiliary request is asynchronous by comparing against predefined types.


603-689: LGTM! Well-structured helper functions with proper validation.

The new helper functions are well-organized, properly validate their inputs, and provide clear error messages. The type hints and docstrings make the code self-documenting.

unit_tests/connector_builder/test_connector_builder_handler.py (1)

540-540: LGTM! Consistent test data structure update.

The addition of auxiliary_requests field to the expected output structure aligns with the changes in other files.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2616-2794: Nice refactoring of the async retriever creation!

The extraction of _get_download_retriever improves code organization and readability while maintaining the existing functionality. This change aligns well with the PR objectives to support auxiliary requests in the connector builder.

@bazarnov bazarnov changed the title from "fix: (CDK) (ConnectorBuilder) - Add auxiliary requests to slice; support TestRead for AsyncRetriever" to "fix: (CDK) (ConnectorBuilder) - Add auxiliary requests to slice; support TestRead for AsyncRetriever (part 1/2)" Feb 20, 2025
Contributor

@bnchrch bnchrch left a comment


1 question, but nothing blocking. LGTM!

@@ -396,6 +396,7 @@ def _log_response(self, response: requests.Response) -> None:
"Obtains access token",
self._NO_STREAM_NAME,
is_auxiliary=True,
type="AUTH",

❓ Given that we have separate types for the 3 async requests, should we have separate types for the different OAuth requests (get token vs. refresh token)?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

2632-2641: Consider adding a docstring for _get_download_retriever()?

This helper function nicely encapsulates the download retriever creation logic, but it lacks a brief summary of its purpose. Adding a short docstring might improve clarity for future contributors. wdyt?


2662-2670: Preserve or pass through a primary_key for the job download retriever?

Here, we always set primary_key=None for the download retriever. If we need consistent record identification across creation, polling, and download stages, preserving the parent stream’s primary_key might be useful. Would you like to allow passing in a configurable primary_key instead of hardcoding None? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be03fe5 and 1aa1078.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2649-2661: Confirm behavior when _limit_slices_fetched is zero or negative?

We see maximum_number_of_slices = self._limit_slices_fetched or 5. If _limit_slices_fetched is zero (or None), the expression falls back to 5; a negative value is truthy in Python, so it would pass through unchanged. Either outcome might be intentional or might surprise users. Would you like to validate _limit_slices_fetched or update the default logic to handle edge cases differently? wdyt?
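
The edge cases are easy to check in isolation, because Python's `or` returns its first truthy operand. Note that a negative integer is truthy, so it is not replaced by the default. The two functions below are hypothetical; `slices_validated` is just one possible stricter alternative, not something proposed in the PR.

```python
from typing import Optional


def slices_with_or(limit: Optional[int]) -> int:
    """Mirrors the `limit or 5` expression quoted in the comment above."""
    return limit or 5


def slices_validated(limit: Optional[int], default: int = 5) -> int:
    """Hypothetical stricter variant: only None means 'use the default'."""
    if limit is None:
        return default
    if limit <= 0:
        raise ValueError(f"slice limit must be positive, got {limit}")
    return limit


print(slices_with_or(None))  # 5
print(slices_with_or(0))     # 5, because 0 is falsy
print(slices_with_or(-1))    # -1, because negative numbers are truthy
print(slices_with_or(3))     # 3
```

With the validated variant, `0` and `-1` raise a ValueError instead of silently becoming 5 or slipping through.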
