[Fix] Rewind seekable streams before retrying #821
Conversation
I'd personally prefer if we were rewinding the stream only when we need to. @pietern WDYT?
@ksafonov-db I doubt it matters much. Regardless of the outcome of the operation, the caller will have to seek to a known location anyway if they want to keep the handle open in the first place. The performance impact on failure is negligible because seek only updates the location offset in the file handle and doesn't trigger I/O directly (not counting readahead that the OS might do, but we can ignore that here). I'll leave it to you to figure out if there is a case for pushing this down into a "pre-run" callback from a structural point of view.
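The approach discussed above can be sketched roughly as follows. This is a hypothetical illustration, not the SDK's actual code: the function name `prepare_for_retry` and the seek-to-zero policy are assumptions (the real implementation may record and restore the stream's initial offset instead).

```python
import io

def prepare_for_retry(body):
    """Return True if `body` is safe to resend on a retry attempt.

    Hypothetical sketch: buffered payloads can always be resent; seekable
    streams are rewound first; non-seekable streams must not be retried,
    since the bytes consumed by the failed attempt are gone.
    """
    if body is None or isinstance(body, (bytes, str)):
        return True  # fully buffered payloads can always be resent
    if hasattr(body, "seekable") and body.seekable():
        body.seek(0)  # rewind before resending (assumes start offset 0)
        return True
    return False  # non-seekable stream: retrying would drop consumed bytes

# Simulate a partially consumed first attempt, then a retry.
stream = io.BytesIO(b"payload")
stream.read(3)
assert prepare_for_retry(stream)
assert stream.read() == b"payload"  # the retry sees the full body again
```

Because `seek` on a regular file handle only moves the offset without issuing I/O, rewinding unconditionally (rather than only on failure) carries essentially no cost, which matches the reasoning in the comment above.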
If integration tests don't run automatically, an authorized user can trigger them manually by following the instructions below.

Checks will be approved automatically on success.

Test Details: go/deco-tests/11844787622
Confirmed with @ksafonov-db offline that he is good with the current state of the PR |
### New Features and Improvements

* Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

* Rewind seekable streams before retrying ([#821](#821)).

### Internal Changes

* Reformat SDK with YAPF 0.43. ([#822](#822)).
* Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).
* Update PR template ([#814](#814)).

### API Changes:

* Added `databricks.sdk.service.apps`, `databricks.sdk.service.billing`, `databricks.sdk.service.catalog`, `databricks.sdk.service.compute`, `databricks.sdk.service.dashboards`, `databricks.sdk.service.files`, `databricks.sdk.service.iam`, `databricks.sdk.service.jobs`, `databricks.sdk.service.marketplace`, `databricks.sdk.service.ml`, `databricks.sdk.service.oauth2`, `databricks.sdk.service.pipelines`, `databricks.sdk.service.provisioning`, `databricks.sdk.service.serving`, `databricks.sdk.service.settings`, `databricks.sdk.service.sharing`, `databricks.sdk.service.sql`, `databricks.sdk.service.vectorsearch` and `databricks.sdk.service.workspace` packages.

OpenAPI SHA: 2035bf5234753adfd080a79bff325dd4a5b90bc2, Date: 2024-11-15
### New Features and Improvements

* Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

* Rewind seekable streams before retrying ([#821](#821)).
* Properly serialize nested data classes.

### Internal Changes

* Reformat SDK with YAPF 0.43. ([#822](#822)).
* Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).

### API Changes:

* Added `service_principal_client_id` field for `databricks.sdk.service.apps.App`.
* Added `azure_service_principal`, `gcp_service_account_key` and `read_only` fields for `databricks.sdk.service.catalog.CreateCredentialRequest`.
* Added `azure_service_principal`, `read_only` and `used_for_managed_storage` fields for `databricks.sdk.service.catalog.CredentialInfo`.
* Added `omit_username` field for `databricks.sdk.service.catalog.ListTablesRequest`.
* Added `azure_service_principal` and `read_only` fields for `databricks.sdk.service.catalog.UpdateCredentialRequest`.
* Added `external_location_name`, `read_only` and `url` fields for `databricks.sdk.service.catalog.ValidateCredentialRequest`.
* Added `is_dir` field for `databricks.sdk.service.catalog.ValidateCredentialResponse`.
* Added `only` field for `databricks.sdk.service.jobs.RunNow`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.CreatePipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.EditPipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.PipelineSpec`.
* Added `private_access_settings_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`.
* Changed `create_credential()` and `generate_temporary_service_credential()` methods for [w.credentials](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/credentials.html) workspace-level service with new required argument order.
* Changed `access_connector_id` field for `databricks.sdk.service.catalog.AzureManagedIdentity` to be required.
* Changed `name` field for `databricks.sdk.service.catalog.CreateCredentialRequest` to be required.
* Changed `credential_name` field for `databricks.sdk.service.catalog.GenerateTemporaryServiceCredentialRequest` to be required.

OpenAPI SHA: f2385add116e3716c8a90a0b68e204deb40f996c, Date: 2024-11-15
What changes are proposed in this pull request?
This PR adapts the retry mechanism of `BaseClient` to only retry if (i) the request body is not a stream, or (ii) the stream is seekable and can be reset to its initial position. This fixes a bug that caused retries to ignore the parts of the request body that had already been consumed in previous attempts.

How is this tested?
Added unit tests to verify that (i) non-seekable streams are not retried, and (ii) seekable streams are properly reset before retrying.
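The distinction those tests rely on can be illustrated with a small sketch. The `NonSeekableStream` helper below is hypothetical (the SDK's test suite may use a different construction); it wraps an in-memory buffer but reports itself as non-seekable, which is the case where a retry must be refused.

```python
import io

class NonSeekableStream(io.RawIOBase):
    """Hypothetical test double: readable but not seekable, like a socket."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)

    def seekable(self) -> bool:
        return False  # consumed bytes cannot be recovered, so no retry

# A non-seekable body: once bytes are read, a retry would resend a hole.
ns = NonSeekableStream(b"data")
assert not ns.seekable()

# A seekable body: rewinding restores the full payload for the next attempt.
s = io.BytesIO(b"data")
s.read(2)
s.seek(0)
assert s.read() == b"data"
```

Checking `seekable()` rather than the concrete stream type keeps the retry logic working for any file-like object that follows the `io` protocol.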