
[Fix] Rewind seekable streams before retrying #821

Merged: 12 commits into main from renaud.hartert/stream-reset on Nov 15, 2024

Conversation

@renaudhartert-db (Contributor) commented Nov 13, 2024

What changes are proposed in this pull request?

This PR adapts the retry mechanism of `BaseClient` to only retry if (i) the request body is not a stream, or (ii) the stream is seekable and can be reset to its initial position. This fixes a bug that caused retries to skip the parts of the request that had already been consumed in previous attempts.
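A minimal sketch of the idea, not the actual `BaseClient` code (the helper names below are hypothetical): record the stream's position before the first attempt, treat the request as retryable only when the body is not a stream or is a seekable stream, and rewind the stream before every retry.

```python
import io


def _is_retry_safe(data) -> bool:
    """Hypothetical helper: a body is safe to resend if it is not a stream,
    or if it is a stream that supports seeking back to where it started."""
    if data is None or isinstance(data, (str, bytes)):
        return True
    if isinstance(data, io.IOBase):
        return data.seekable()
    return False


def _rewind(data, initial_position: int) -> None:
    """Reset a seekable stream to the position recorded before the first attempt."""
    if isinstance(data, io.IOBase) and data.seekable():
        data.seek(initial_position)
```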

How is this tested?

Added unit tests to verify that (i) non-seekable streams are not retried, and (ii) seekable streams are properly reset before retrying.
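The actual tests live in tests/test_base_client.py and are not shown inline here; as a rough, illustrative sketch (names and structure are mine, not the PR's), such checks can boil down to the following:

```python
import io


class NonSeekableStream(io.BytesIO):
    """Test double: behaves like a stream but reports itself as non-seekable."""

    def seekable(self) -> bool:
        return False


def test_rewind_restores_initial_position():
    # Simulate a first attempt that consumes the stream, then the rewind a
    # retry is expected to perform before resending the body.
    stream = io.BytesIO(b"payload")
    start = stream.tell()
    stream.read()                      # first attempt drains the stream
    assert stream.tell() != start
    stream.seek(start)                 # what the retry path must do
    assert stream.read() == b"payload"


def test_non_seekable_stream_cannot_be_replayed():
    # A non-seekable body cannot be rewound, so the client should give up
    # rather than retry with a partially consumed stream.
    assert NonSeekableStream(b"payload").seekable() is False
```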

@ksafonov-db (Contributor) left a comment
I'd personally prefer if we were rewinding the stream only when we need to. @pietern WDYT?

@pietern (Contributor) commented Nov 14, 2024

@ksafonov-db I doubt it matters much. Regardless of the outcome of the operation, the caller will have to seek to a known location anyway if they want to keep the handle open in the first place.

Performance impact on failure is negligible because seek only updates the location offset in the file handle and doesn't trigger I/O directly (not counting readahead that the OS might do, but we can ignore that here).

I'll leave it to you to figure out whether there is a case for pushing this down into a "pre-run" callback from a structural point of view.
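To illustrate the point about seek being cheap, a standalone example (using a temporary file rather than the SDK):

```python
import tempfile

# seek() and tell() only adjust the offset stored in the file handle; no
# payload bytes are read or written, so rewinding unconditionally is cheap.
with tempfile.TemporaryFile() as f:
    f.write(b"request body")
    f.seek(0)                 # position the handle where the upload starts
    start = f.tell()
    _ = f.read()              # a failed attempt leaves the offset at EOF
    f.seek(start)             # rewinding is just an O(1) offset update
    assert f.tell() == start
```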


If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 821
  • Commit SHA: fd893eb99b32560b522d8bd0f8dc411944957a9a

Checks will be approved automatically on success.

@eng-dev-ecosystem-bot (Collaborator) commented:
Test Details: go/deco-tests/11844787622

@renaudhartert-db (Contributor, Author) commented:

Confirmed with @ksafonov-db offline that he is good with the current state of the PR.

@renaudhartert-db added this pull request to the merge queue Nov 15, 2024
Merged via the queue into main with commit e8b7916 Nov 15, 2024
19 checks passed
@renaudhartert-db deleted the renaud.hartert/stream-reset branch Nov 15, 2024 13:04
renaudhartert-db added a commit that referenced this pull request Nov 18, 2024
### New Features and Improvements

 * Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

 * Rewind seekable streams before retrying ([#821](#821)).

### Internal Changes

 * Reformat SDK with YAPF 0.43. ([#822](#822)).
 * Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).
 * Update PR template ([#814](#814)).

### API Changes:

 * Added `databricks.sdk.service.apps`, `databricks.sdk.service.billing`, `databricks.sdk.service.catalog`, `databricks.sdk.service.compute`, `databricks.sdk.service.dashboards`, `databricks.sdk.service.files`, `databricks.sdk.service.iam`, `databricks.sdk.service.jobs`, `databricks.sdk.service.marketplace`, `databricks.sdk.service.ml`, `databricks.sdk.service.oauth2`, `databricks.sdk.service.pipelines`, `databricks.sdk.service.provisioning`, `databricks.sdk.service.serving`, `databricks.sdk.service.settings`, `databricks.sdk.service.sharing`, `databricks.sdk.service.sql`, `databricks.sdk.service.vectorsearch` and `databricks.sdk.service.workspace` packages.

OpenAPI SHA: 2035bf5234753adfd080a79bff325dd4a5b90bc2, Date: 2024-11-15
github-merge-queue bot pushed a commit that referenced this pull request Nov 18, 2024
### New Features and Improvements

* Read streams by 1MB chunks by default. ([#817](#817)).

### Bug Fixes

* Rewind seekable streams before retrying ([#821](#821)).
* Properly serialize nested data classes.

### Internal Changes

* Reformat SDK with YAPF 0.43. ([#822](#822)).
* Update Jobs GetRun API to support paginated responses for jobs and ForEach tasks ([#819](#819)).

### API Changes:

* Added `service_principal_client_id` field for `databricks.sdk.service.apps.App`.
* Added `azure_service_principal`, `gcp_service_account_key` and `read_only` fields for `databricks.sdk.service.catalog.CreateCredentialRequest`.
* Added `azure_service_principal`, `read_only` and `used_for_managed_storage` fields for `databricks.sdk.service.catalog.CredentialInfo`.
* Added `omit_username` field for `databricks.sdk.service.catalog.ListTablesRequest`.
* Added `azure_service_principal` and `read_only` fields for `databricks.sdk.service.catalog.UpdateCredentialRequest`.
* Added `external_location_name`, `read_only` and `url` fields for `databricks.sdk.service.catalog.ValidateCredentialRequest`.
* Added `is_dir` field for `databricks.sdk.service.catalog.ValidateCredentialResponse`.
* Added `only` field for `databricks.sdk.service.jobs.RunNow`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.CreatePipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.EditPipeline`.
* Added `restart_window` field for `databricks.sdk.service.pipelines.PipelineSpec`.
* Added `private_access_settings_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`.
* Changed `create_credential()` and `generate_temporary_service_credential()` methods for [w.credentials](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/credentials.html) workspace-level service with new required argument order.
* Changed `access_connector_id` field for `databricks.sdk.service.catalog.AzureManagedIdentity` to be required.
* Changed `name` field for `databricks.sdk.service.catalog.CreateCredentialRequest` to be required.
* Changed `credential_name` field for `databricks.sdk.service.catalog.GenerateTemporaryServiceCredentialRequest` to be required.

OpenAPI SHA: f2385add116e3716c8a90a0b68e204deb40f996c, Date: 2024-11-15