
[Fix] Rewind seekable streams before retrying #821

Merged: 12 commits into main from renaud.hartert/stream-reset on Nov 15, 2024

Conversation

@renaudhartert-db (Contributor) commented Nov 13, 2024

What changes are proposed in this pull request?

This PR adapts the retry mechanism of `BaseClient` to only retry if (i) the request body is not a stream, or (ii) the stream is seekable and can be reset to its initial position. This fixes a bug that caused retries to skip the parts of the request that had already been consumed in previous attempts.
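A minimal sketch of the idea, not the actual `BaseClient` code (the helper names below are hypothetical): record the stream's position before the first attempt, treat the request as retryable only when the body is not a stream or is a seekable stream, and rewind the stream before every retry.

```python
import io


def _is_retry_safe(data) -> bool:
    """Hypothetical helper: a body is safe to resend if it is not a stream,
    or if it is a stream that supports seeking back to where it started."""
    if data is None or isinstance(data, (str, bytes)):
        return True
    if isinstance(data, io.IOBase):
        return data.seekable()
    return False


def _rewind(data, initial_position: int) -> None:
    """Reset a seekable stream to the position recorded before the first attempt."""
    if isinstance(data, io.IOBase) and data.seekable():
        data.seek(initial_position)
```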

How is this tested?

Added unit tests to verify that (i) non-seekable streams are not retried, and (ii) seekable streams are properly reset before retrying.
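The actual tests live in tests/test_base_client.py and are not shown inline here; as a rough, illustrative sketch (names and structure are mine, not the PR's), such checks can boil down to the following:

```python
import io


class NonSeekableStream(io.BytesIO):
    """Test double: behaves like a stream but reports itself as non-seekable."""

    def seekable(self) -> bool:
        return False


def test_rewind_restores_initial_position():
    # Simulate a first attempt that consumes the stream, then the rewind a
    # retry is expected to perform before resending the body.
    stream = io.BytesIO(b"payload")
    start = stream.tell()
    stream.read()                      # first attempt drains the stream
    assert stream.tell() != start
    stream.seek(start)                 # what the retry path must do
    assert stream.read() == b"payload"


def test_non_seekable_stream_cannot_be_replayed():
    # A non-seekable body cannot be rewound, so the client should give up
    # rather than retry with a partially consumed stream.
    assert NonSeekableStream(b"payload").seekable() is False
```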

@ksafonov-db (Contributor) left a comment
I'd personally prefer if we were rewinding the stream only when we need to. @pietern WDYT?

@pietern (Contributor) commented Nov 14, 2024

@ksafonov-db I doubt it matters much. Regardless of the outcome of the operation, the caller will have to seek to a known location anyway if they want to keep the handle open in the first place.

Performance impact on failure is negligible because seek only updates the location offset in the file handle and doesn't trigger I/O directly (not counting readahead that the OS might do, but we can ignore that here).

I'll leave it to you to figure out whether there is a case for pushing this down into a "pre-run" callback from a structural point of view.
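To illustrate the point about seek being cheap, a standalone example (using a temporary file rather than the SDK):

```python
import tempfile

# seek() and tell() only adjust the offset stored in the file handle; no
# payload bytes are read or written, so rewinding unconditionally is cheap.
with tempfile.TemporaryFile() as f:
    f.write(b"request body")
    f.seek(0)                 # position the handle where the upload starts
    start = f.tell()
    _ = f.read()              # a failed attempt leaves the offset at EOF
    f.seek(start)             # rewinding is just an O(1) offset update
    assert f.tell() == start
```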


If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 821
  • Commit SHA: fd893eb99b32560b522d8bd0f8dc411944957a9a

Checks will be approved automatically on success.

@eng-dev-ecosystem-bot (Collaborator) commented:
Test Details: go/deco-tests/11844787622

@renaudhartert-db (Contributor, Author) commented:

Confirmed with @ksafonov-db offline that he is good with the current state of the PR.

@renaudhartert-db added this pull request to the merge queue Nov 15, 2024
Merged via the queue into main with commit e8b7916 Nov 15, 2024
19 checks passed
@renaudhartert-db deleted the renaud.hartert/stream-reset branch Nov 15, 2024 13:04
renaudhartert-db added a commit that referenced this pull request Nov 18, 2024
### New Features and Improvements

 * Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

 * Rewind seekable streams before retrying ([#821](#821)).

### Internal Changes

 * Reformat SDK with YAPF 0.43. ([#822](#822)).
 * Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).
 * Update PR template ([#814](#814)).

### API Changes:

 * Added `databricks.sdk.service.apps`, `databricks.sdk.service.billing`, `databricks.sdk.service.catalog`, `databricks.sdk.service.compute`, `databricks.sdk.service.dashboards`, `databricks.sdk.service.files`, `databricks.sdk.service.iam`, `databricks.sdk.service.jobs`, `databricks.sdk.service.marketplace`, `databricks.sdk.service.ml`, `databricks.sdk.service.oauth2`, `databricks.sdk.service.pipelines`, `databricks.sdk.service.provisioning`, `databricks.sdk.service.serving`, `databricks.sdk.service.settings`, `databricks.sdk.service.sharing`, `databricks.sdk.service.sql`, `databricks.sdk.service.vectorsearch` and `databricks.sdk.service.workspace` packages.

OpenAPI SHA: 2035bf5234753adfd080a79bff325dd4a5b90bc2, Date: 2024-11-15
github-merge-queue bot pushed a commit that referenced this pull request Nov 18, 2024
### New Features and Improvements

* Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

* Rewind seekable streams before retrying ([#821](#821)).
* Properly serialize nested data classes.

### Internal Changes

* Reformat SDK with YAPF 0.43. ([#822](#822)).
* Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).

### API Changes:

* Added `service_principal_client_id` field for `databricks.sdk.service.apps.App`.
* Added `azure_service_principal`, `gcp_service_account_key` and `read_only` fields for `databricks.sdk.service.catalog.CreateCredentialRequest`.
* Added `azure_service_principal`, `read_only` and `used_for_managed_storage` fields for `databricks.sdk.service.catalog.CredentialInfo`.
* Added `omit_username` field for `databricks.sdk.service.catalog.ListTablesRequest`.
* Added `azure_service_principal` and `read_only` fields for `databricks.sdk.service.catalog.UpdateCredentialRequest`.
* Added `external_location_name`, `read_only` and `url` fields for `databricks.sdk.service.catalog.ValidateCredentialRequest`.
* Added `is_dir` field for `databricks.sdk.service.catalog.ValidateCredentialResponse`.
* Added `only` field for `databricks.sdk.service.jobs.RunNow`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.CreatePipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.EditPipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.PipelineSpec`.
* Added `private_access_settings_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`.
* Changed `create_credential()` and `generate_temporary_service_credential()` methods for [w.credentials](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/credentials.html) workspace-level service with new required argument order.
* Changed `access_connector_id` field for `databricks.sdk.service.catalog.AzureManagedIdentity` to be required.
* Changed `name` field for `databricks.sdk.service.catalog.CreateCredentialRequest` to be required.
* Changed `credential_name` field for `databricks.sdk.service.catalog.GenerateTemporaryServiceCredentialRequest` to be required.

OpenAPI SHA: f2385add116e3716c8a90a0b68e204deb40f996c, Date: 2024-11-15