Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always retry on HTTP 429 and 503, even if Retry-After header is not available #396

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

nfx
Copy link
Contributor

@nfx nfx commented Oct 11, 2023

Changes

This PR improves retry logic using more type-safe code to determine sleep periods between retries or falling back to the default sleep interval. Relies on concrete exception types introduced by #376

Tests

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

nfx added 6 commits October 3, 2023 14:03
…ists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others

Improve the ergonomics of SDK, where instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()` we could do `except NotFound: do_stuff()`. Additionally, it'll make it easier to read stack traces, as they will contain specific exception class name.

# First principles
- do not override `builtins.NotImplemented` for `NOT_IMPLEMENTED` error code
- assume that platform error_code/HTTP status code mapping is not perfect and in the state of transition
- we do represent reasonable subset of error codes as specific exceptions

## Open questions

### HTTP Status Codes vs Error Codes

1. Mix between status codes and error codes (preferred)
2. Rely only on HTTP status codes
3. Rely only on `error_code`'s

One example is `BAD_REQUEST` error code that maps onto HTTP 400 and to `except BadRequest as err` catch clause. But `MALFORMED_REQUEST`, `INVALID_STATE`, and `UNPARSEABLE_HTTP_ERROR` do also map to HTTP 400. So the proposal is to remap the MALFORMED_REQUEST to `BadRequest` exception.

Another corner-case is UC:
- 'METASTORE_DOES_NOT_EXIST': NotFound,
- 'DAC_DOES_NOT_EXIST': NotFound,
- 'CATALOG_DOES_NOT_EXIST': NotFound,
- 'SCHEMA_DOES_NOT_EXIST': NotFound,
- 'TABLE_DOES_NOT_EXIST': NotFound,
- 'SHARE_DOES_NOT_EXIST': NotFound,
- 'RECIPIENT_DOES_NOT_EXIST': NotFound,
- 'STORAGE_CREDENTIAL_DOES_NOT_EXIST': NotFound,
- 'EXTERNAL_LOCATION_DOES_NOT_EXIST': NotFound,
- 'PRINCIPAL_DOES_NOT_EXIST': NotFound,
- 'PROVIDER_DOES_NOT_EXIST': NotFound,

### Naming conflict resolution

We have three sources of naming:
- `error_code`
- HTTP Status
- Python builtin exceptions

We still have to define which name takes the precedence.
… available

This PR improves retry logic by using more type-safe code to determine sleep time periods between retries or falling back to the default sleep interval.
@nfx nfx added the chore anything that makes the code better label Oct 11, 2023
@nfx nfx requested a review from pietern October 11, 2023 10:16
github-merge-queue bot pushed a commit that referenced this pull request Oct 13, 2023
## Changes
Change HTTP client to always retry on 429 and 503, regardless of whether
Retry-After header is present. If absent, default to 1 second, as we do
today.

Subsumes #391 and #396 while we're still discussing error architecture
in the SDK.

## Tests
Unit test to ensure that 429 and 503 work with and without Retry-After
header.
Base automatically changed from ergonomics/errors to main November 13, 2023 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore anything that makes the code better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant