Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Handle non-JSON errors gracefully #741

Merged
merged 15 commits into from
Aug 30, 2024
Merged

Conversation

mgyucht
Copy link
Contributor

@mgyucht mgyucht commented Aug 29, 2024

Changes

Some errors returned by the platform are not serialized using JSON (see databricks/databricks-sdk-go#998 for an example). They are instead serialized in the form "<ERROR_CODE>: ". Today, the SDK cannot parse these error messages well, resulting in a poor user experience.

This PR adds support for parsing these error messages from the platform to the SDK. This should reduce bug reports for the SDK with respect to unexpected response parsing. This PR also refactors the error deserialization logic somewhat to make it more extensible in the future for other potential error formats that are not currently handled.

As a side-effect of this change, I've refactored the structure of the error handling in the Python SDK to more closely reflect how errors are handled in the Go SDK. This should make maintenance more straightforward in the future. It also introduces a new error message to the Python SDK to refer users to our issue tracker when the SDK receives an error response that it cannot parse, like what we do in the Go SDK.

Ports databricks/databricks-sdk-go#1031 to the Python SDK.

Deprecations

This PR deprecates several fields in the constructor for DatabricksError. Going forward, SCIM-specific and API 1.2-specific parameters should not be specified in the constructor; instead, they will be handled in error parsers.

Breaking Changes

The introduction of a different message for non-JSON responses may be a breaking change if users matched on the message structure used before.

Tests

Existing tests still pass, adding tests before merging this.

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@mgyucht mgyucht marked this pull request as draft August 29, 2024 15:21
@mgyucht mgyucht marked this pull request as ready for review August 30, 2024 08:49
@mgyucht mgyucht marked this pull request as draft August 30, 2024 08:50
@mgyucht mgyucht marked this pull request as ready for review August 30, 2024 09:54
Comment on lines +266 to +273
status_code = response.status_code
is_http_unauthorized_or_forbidden = status_code in (401, 403)
is_too_many_requests_or_unavailable = status_code in (429, 503)
if is_http_unauthorized_or_forbidden:
error.message = self._cfg.wrap_debug_info(error.message)
if is_too_many_requests_or_unavailable:
error.retry_after_secs = self._parse_retry_after(response)
raise error from None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These used to be added in _make_nicer_error. Now, _get_api_error is basically identical to GetApiError in the Go SDK. This divergent codepath remains thus in the ApiClient to prevent any breaking changes.

Comment on lines -283 to -304
@staticmethod
def _make_sense_from_html(txt: str) -> str:
matchers = [r'<pre>(.*)</pre>', r'<title>(.*)</title>']
for attempt in matchers:
expr = re.compile(attempt, re.MULTILINE)
match = expr.search(txt)
if not match:
continue
return match.group(1).strip()
return txt

def _make_nicer_error(self, *, response: requests.Response, **kwargs) -> DatabricksError:
status_code = response.status_code
message = kwargs.get('message', 'request failed')
is_http_unauthorized_or_forbidden = status_code in (401, 403)
is_too_many_requests_or_unavailable = status_code in (429, 503)
if is_http_unauthorized_or_forbidden:
message = self._cfg.wrap_debug_info(message)
if is_too_many_requests_or_unavailable:
kwargs['retry_after_secs'] = self._parse_retry_after(response)
kwargs['message'] = message
return error_mapper(response, kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are moved to databricks/sdk/error/parser.py, albeit in different form.

if not logger.isEnabledFor(logging.DEBUG):
return
request = response.request
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation and remaining functions are moved to databricks/sdk/logger/round_trip_logger.py.

Comment on lines +102 to +104
to_log = self._only_n_bytes(body, self._debug_truncate_bytes)
log_lines = [prefix + x.strip('\r') for x in to_log.split("\n")]
return '\n'.join(log_lines)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only change made in this implementation, so that we can see the actual, truncated response when it is not JSON.

Comment on lines -26 to -27
if _is_private_link_redirect(response):
return _get_private_link_validation_error(response.url)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to _get_api_error.

Comment on lines +52 to +65
# Handle API 1.2-style errors
if 'error' in resp:
error_args['message'] = resp['error']

# Handle SCIM Errors
detail = resp.get('detail')
status = resp.get('status')
scim_type = resp.get('scimType')
if detail:
# Handle SCIM error message details
# @see https://tools.ietf.org/html/rfc7644#section-3.7.3
error_args[
'message'] = f"{scim_type} {error_args.get('message', 'SCIM API Internal Error')}".strip(" ")
error_args['error_code'] = f"SCIM_{status}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved here from the DatabricksError constructor.

return None

error_args = {
'message': resp.get('message', 'request failed'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taken from _make_nicer_error().

Comment on lines +91 to +105
__HTML_ERROR_REGEXES = [re.compile(r'<pre>(.*)</pre>'), re.compile(r'<title>(.*)</title>'), ]

def parse_error(self, response: requests.Response, response_body: bytes) -> Optional[dict]:
payload_str = response_body.decode('utf-8')
for regex in self.__HTML_ERROR_REGEXES:
match = regex.search(payload_str)
if match:
message = match.group(1) if match.group(1) else response.reason
return {
'status': response.status_code,
'message': message,
'error_code': response.reason.upper().replace(' ', '_')
}
logging.debug('_HtmlErrorParser: no <pre> tag found in error response')
return None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is meant to replicate _make_sense_from_html.


@pytest.mark.parametrize(
'response, expected_error, expected_message', subclass_test_cases +
[(fake_response('GET', 400, ''), errors.BadRequest, 'Bad Request'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new case

Comment on lines 90 to 97
(fake_response('GET', 400, 'MALFORMED_REQUEST: vpc_endpoints malformed parameters: VPC Endpoint ... with use_case ... cannot be attached in ... list'), errors.BadRequest, 'vpc_endpoints malformed parameters: VPC Endpoint ... with use_case ... cannot be attached in ... list'),
(fake_response('GET', 400, '<pre>Worker environment not ready</pre>'), errors.BadRequest, 'Worker environment not ready'),
(fake_response('GET', 400, 'this is not a real response'), errors.BadRequest,
('unable to parse response. This is likely a bug in the Databricks SDK for Python or the underlying API. '
'Please report this issue with the following debugging information to the SDK issue tracker at '
'https://github.com/databricks/databricks-sdk-go/issues. Request log:```GET /api/2.0/service\n'
'< 400 Bad Request\n'
'< this is not a real response```')), ])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are new cases

Copy link
Contributor

@renaudhartert-db renaudhartert-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, left a couple of comments.

@@ -12,8 +10,8 @@
from .config import *
# To preserve backwards compatibility (as these definitions were previously in this module)
from .credentials_provider import *
from .errors import DatabricksError, error_mapper
from .errors.private_link import _is_private_link_redirect
from .errors import DatabricksError, _get_api_error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove the _ to indicate that the function is not internal to the errors module?

See Pep 8:

_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose names start with an underscore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

What do you think is a good way to indicate that the function is internal to the SDK but not to the errors module?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't have a definite answer here. Without changing the code structure, I would simply make that explicit in the function's doc. Otherwise, I would try to organize the core package as a single "umbrella" interface to internal modules that start with a _ (e.g. core/_errors.py). I've also seen people explicitly put private modules in a private directory (e.g. core/private/errors.py). I'll do some research to see if this is covered by our internal style guide.

databricks/sdk/errors/parser.py Outdated Show resolved Hide resolved
databricks/sdk/errors/parser.py Show resolved Hide resolved
@mgyucht mgyucht enabled auto-merge August 30, 2024 12:12
@mgyucht mgyucht added this pull request to the merge queue Aug 30, 2024
Merged via the queue into main with commit 3dab457 Aug 30, 2024
14 checks passed
@mgyucht mgyucht deleted the handle-non-json-errors branch August 30, 2024 12:45
mgyucht added a commit that referenced this pull request Sep 4, 2024
### Bug Fixes

 * Handle non-JSON errors gracefully ([#741](#741)).

### Documentation

 * Add Data Plane access documentation ([#732](#732)).

### Internal Changes

 * Fix test_iam::test_scim_error_unmarshall integration test ([#743](#743)).

### API Changes:

 * Added `regenerate_dashboard()` method for [w.quality_monitors](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/quality_monitors.html) workspace-level service.
 * Added `databricks.sdk.service.catalog.RegenerateDashboardRequest` and `databricks.sdk.service.catalog.RegenerateDashboardResponse` dataclasses.
 * Added `databricks.sdk.service.jobs.QueueDetails`, `databricks.sdk.service.jobs.QueueDetailsCodeCode`, `databricks.sdk.service.jobs.RunLifecycleStateV2State`, `databricks.sdk.service.jobs.RunStatus`, `databricks.sdk.service.jobs.TerminationCodeCode`, `databricks.sdk.service.jobs.TerminationDetails` and `databricks.sdk.service.jobs.TerminationTypeType` dataclasses.
 * Added `status` field for `databricks.sdk.service.jobs.BaseRun`.
 * Added `status` field for `databricks.sdk.service.jobs.RepairHistoryItem`.
 * Added `status` field for `databricks.sdk.service.jobs.Run`.
 * Added `status` field for `databricks.sdk.service.jobs.RunTask`.
 * Added `max_provisioned_throughput` and `min_provisioned_throughput` fields for `databricks.sdk.service.serving.ServedModelInput`.
 * Added `columns_to_sync` field for `databricks.sdk.service.vectorsearch.DeltaSyncVectorIndexSpecRequest`.
 * Changed `workload_size` field for `databricks.sdk.service.serving.ServedModelInput` to no longer be required.

OpenAPI SHA: d05898328669a3f8ab0c2ecee37db2673d3ea3f7, Date: 2024-09-04
@mgyucht mgyucht mentioned this pull request Sep 4, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 4, 2024
### Bug Fixes

* Handle non-JSON errors gracefully
([#741](#741)).


### Documentation

* Add Data Plane access documentation
([#732](#732)).


### Internal Changes

* Fix test_iam::test_scim_error_unmarshall integration test
([#743](#743)).


### API Changes:

* Added `regenerate_dashboard()` method for
[w.quality_monitors](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/quality_monitors.html)
workspace-level service.
* Added `databricks.sdk.service.catalog.RegenerateDashboardRequest` and
`databricks.sdk.service.catalog.RegenerateDashboardResponse`
dataclasses.
* Added `databricks.sdk.service.jobs.QueueDetails`,
`databricks.sdk.service.jobs.QueueDetailsCodeCode`,
`databricks.sdk.service.jobs.RunLifecycleStateV2State`,
`databricks.sdk.service.jobs.RunStatus`,
`databricks.sdk.service.jobs.TerminationCodeCode`,
`databricks.sdk.service.jobs.TerminationDetails` and
`databricks.sdk.service.jobs.TerminationTypeType` dataclasses.
 * Added `status` field for `databricks.sdk.service.jobs.BaseRun`.
* Added `status` field for
`databricks.sdk.service.jobs.RepairHistoryItem`.
 * Added `status` field for `databricks.sdk.service.jobs.Run`.
 * Added `status` field for `databricks.sdk.service.jobs.RunTask`.
* Added `max_provisioned_throughput` and `min_provisioned_throughput`
fields for `databricks.sdk.service.serving.ServedModelInput`.
* Added `columns_to_sync` field for
`databricks.sdk.service.vectorsearch.DeltaSyncVectorIndexSpecRequest`.
* Changed `workload_size` field for
`databricks.sdk.service.serving.ServedModelInput` to no longer be
required.

OpenAPI SHA: d05898328669a3f8ab0c2ecee37db2673d3ea3f7, Date: 2024-09-04
github-merge-queue bot pushed a commit that referenced this pull request Sep 10, 2024
## Changes
#741 introduced a regression in retrieving error details from SCIM APIs.
This PR addresses this and adds a regression test for this case. The
implementation should now match the Go SDK's here:
https://github.com/databricks/databricks-sdk-go/blob/main/apierr/errors.go#L220-L224.

Fixes #749.

## Tests
Added a unit test based on the supplied response in the ticket.

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
github-merge-queue bot pushed a commit that referenced this pull request Sep 13, 2024
## Changes
#741 introduced a change to how an error message was modified in
`ApiClient._perform`. Previously, arguments to the DatabricksError
constructor were modified as a dictionary in `_perform`. After that
change, `get_api_error` started to return a `DatabricksError` instance
whose attributes were modified. The `message` attribute referred to in
that change does not exist in the DatabricksError class: there is a
`message` constructor parameter, but it is not set as an attribute.

This PR refactors the error handling logic slightly to restore the
original behavior. In doing this, we decouple all error-parsing and
customizing logic out of ApiClient. This also sets us up to allow for
further extension of error parsing and customization in the future, a
feature that I have seen present in other SDKs.

Fixes #755.

## Tests
<!-- 
How is this tested? Please see the checklist below and also describe any
other relevant tests
-->

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
aravind-segu pushed a commit to aravind-segu/databricks-sdk-py that referenced this pull request Sep 18, 2024
## Changes
databricks#741 introduced a change to how an error message was modified in
`ApiClient._perform`. Previously, arguments to the DatabricksError
constructor were modified as a dictionary in `_perform`. After that
change, `get_api_error` started to return a `DatabricksError` instance
whose attributes were modified. The `message` attribute referred to in
that change does not exist in the DatabricksError class: there is a
`message` constructor parameter, but it is not set as an attribute.

This PR refactors the error handling logic slightly to restore the
original behavior. In doing this, we decouple all error-parsing and
customizing logic out of ApiClient. This also sets us up to allow for
further extension of error parsing and customization in the future, a
feature that I have seen present in other SDKs.

Fixes databricks#755.

## Tests
<!-- 
How is this tested? Please see the checklist below and also describe any
other relevant tests
-->

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants