[Fix] Handle non-JSON errors gracefully #741

mgyucht · 2024-08-29T15:20:34Z

Changes

Some errors returned by the platform are not serialized using JSON (see databricks/databricks-sdk-go#998 for an example). They are instead serialized in the form "<ERROR_CODE>: <MESSAGE>". Today, the SDK cannot parse these error messages well, resulting in a poor user experience.

This PR adds support to the SDK for parsing these error messages from the platform. This should reduce SDK bug reports about unexpected response parsing. The PR also refactors the error deserialization logic so that it can be extended in the future to other error formats that are not currently handled.
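As a rough illustration of the string format described above, here is a minimal sketch of a parser for bodies made of an upper-case code, a colon, and a message. The function name `parse_string_error` and the exact regular expression are assumptions for illustration, not the SDK's implementation.

```python
import re
from typing import Optional

# Hypothetical pattern: an upper-case error code, a colon and space, then the message.
_STRING_ERROR = re.compile(r'^([A-Z_]+): (.*)$', re.DOTALL)


def parse_string_error(status_code: int, body: bytes) -> Optional[dict]:
    """Return error fields for 'CODE: message' bodies, or None if the body does not match."""
    text = body.decode('utf-8', errors='replace')
    match = _STRING_ERROR.match(text)
    if match is None:
        return None
    error_code, message = match.groups()
    return {'status': status_code, 'error_code': error_code, 'message': message}
```

Returning `None` for a non-matching body leaves room for another parser to try the same response.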

As a side effect of this change, I've refactored the structure of the error handling in the Python SDK to more closely reflect how errors are handled in the Go SDK. This should make maintenance more straightforward in the future. It also introduces a new error message to the Python SDK that refers users to our issue tracker when the SDK receives an error response that it cannot parse, as the Go SDK already does.
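One way to picture the refactored structure is a list of parsers tried in order, with a fallback message when none of them recognizes the body. The sketch below assumes that shape; `Parser`, `PARSERS`, `parse_error`, and the shortened fallback text are hypothetical names and wording, not the SDK's API.

```python
import json
from typing import Callable, List, Optional

# A parser takes the raw body and returns error fields, or None when it
# cannot handle the body. Both parsers below are simplified stand-ins.
Parser = Callable[[bytes], Optional[dict]]


def parse_json_error(body: bytes) -> Optional[dict]:
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return None
    if not isinstance(payload, dict):
        return None
    return {'message': payload.get('message', 'request failed'),
            'error_code': payload.get('error_code')}


def parse_code_colon_message(body: bytes) -> Optional[dict]:
    code, sep, message = body.decode('utf-8', errors='replace').partition(': ')
    if not sep or not code.replace('_', '').isalpha() or not code.isupper():
        return None
    return {'message': message, 'error_code': code}


PARSERS: List[Parser] = [parse_json_error, parse_code_colon_message]


def parse_error(body: bytes, parsers: List[Parser]) -> dict:
    """Try each parser in order; fall back to an 'unable to parse' message."""
    for parser in parsers:
        fields = parser(body)
        if fields is not None:
            return fields
    text = body.decode('utf-8', errors='replace')
    return {'message': f'unable to parse response: {text}', 'error_code': None}
```

Supporting a new error format then means appending one more function to the list.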

Ports databricks/databricks-sdk-go#1031 to the Python SDK.

Deprecations

This PR deprecates several fields in the constructor for DatabricksError. Going forward, SCIM-specific and API 1.2-specific parameters should not be specified in the constructor; instead, they will be handled in error parsers.

Breaking Changes

The introduction of a different message for non-JSON responses may be a breaking change if users matched on the message structure used before.
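Callers who want to be insulated from message changes like this one can match on the exception type instead of the message text. The sketch below uses stand-in classes defined locally, so it does not depend on the real SDK; `call_api` and `classify` are hypothetical helpers.

```python
class DatabricksError(Exception):
    """Stand-in for the SDK's base error class (not the real class)."""


class BadRequest(DatabricksError):
    """Stand-in for errors.BadRequest."""


def call_api() -> None:
    # Simulates a call that fails with an unparseable response body.
    raise BadRequest('unable to parse response. ...')


def classify(fn) -> str:
    """Classify a failure by exception type, never by message contents."""
    try:
        fn()
    except BadRequest:
        return 'bad-request'
    except DatabricksError:
        return 'other-sdk-error'
    return 'ok'
```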

Tests

Existing tests still pass; I will add more tests before merging this.

make test run locally
make fmt applied
relevant integration tests applied

mgyucht · 2024-08-30T09:55:38Z

databricks/sdk/core.py

+            status_code = response.status_code
+            is_http_unauthorized_or_forbidden = status_code in (401, 403)
+            is_too_many_requests_or_unavailable = status_code in (429, 503)
+            if is_http_unauthorized_or_forbidden:
+                error.message = self._cfg.wrap_debug_info(error.message)
+            if is_too_many_requests_or_unavailable:
+                error.retry_after_secs = self._parse_retry_after(response)
+            raise error from None


These customizations used to be added in _make_nicer_error. Now, _get_api_error is basically identical to GetApiError in the Go SDK, so this Python-specific handling stays in ApiClient to avoid any breaking changes.
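The status-code branches in the diff above can be sketched in isolation as follows. The body of `_parse_retry_after` is not shown in this PR, so `parse_retry_after` below is an assumption that handles only the integer-seconds form of the `Retry-After` header; `customize` and the debug-info placeholder text are likewise hypothetical.

```python
from typing import Mapping, Optional


def parse_retry_after(headers: Mapping[str, str]) -> Optional[int]:
    """Read Retry-After as whole seconds; return None if absent or not an integer."""
    # Note: a plain dict is case-sensitive, unlike the headers object in requests.
    value = headers.get('Retry-After')
    if value is None:
        return None
    try:
        return max(0, int(value.strip()))
    except ValueError:
        return None  # e.g. the HTTP-date form, which this sketch does not handle


def customize(status_code: int, message: str, headers: Mapping[str, str]) -> dict:
    """Apply the status-specific tweaks to the error fields."""
    fields = {'message': message}
    if status_code in (401, 403):
        fields['message'] = f'{message} (debug info placeholder)'
    if status_code in (429, 503):
        fields['retry_after_secs'] = parse_retry_after(headers)
    return fields
```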

mgyucht · 2024-08-30T09:56:05Z

databricks/sdk/core.py

-    @staticmethod
-    def _make_sense_from_html(txt: str) -> str:
-        matchers = [r'<pre>(.*)</pre>', r'<title>(.*)</title>']
-        for attempt in matchers:
-            expr = re.compile(attempt, re.MULTILINE)
-            match = expr.search(txt)
-            if not match:
-                continue
-            return match.group(1).strip()
-        return txt
-
-    def _make_nicer_error(self, *, response: requests.Response, **kwargs) -> DatabricksError:
-        status_code = response.status_code
-        message = kwargs.get('message', 'request failed')
-        is_http_unauthorized_or_forbidden = status_code in (401, 403)
-        is_too_many_requests_or_unavailable = status_code in (429, 503)
-        if is_http_unauthorized_or_forbidden:
-            message = self._cfg.wrap_debug_info(message)
-        if is_too_many_requests_or_unavailable:
-            kwargs['retry_after_secs'] = self._parse_retry_after(response)
-        kwargs['message'] = message
-        return error_mapper(response, kwargs)


These are moved to databricks/sdk/error/parser.py, albeit in different form.

mgyucht · 2024-08-30T09:56:39Z

databricks/sdk/core.py

        if not logger.isEnabledFor(logging.DEBUG):
            return
-        request = response.request


This implementation and the remaining logging functions have moved to databricks/sdk/logger/round_trip_logger.py.

mgyucht · 2024-08-30T09:57:22Z

databricks/sdk/logger/round_trip_logger.py

+            to_log = self._only_n_bytes(body, self._debug_truncate_bytes)
+            log_lines = [prefix + x.strip('\r') for x in to_log.split("\n")]
+            return '\n'.join(log_lines)


This is the only change made in this implementation, so that we can see the actual, truncated response when it is not JSON.
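A standalone sketch of the truncate-then-prefix step in the diff above, for readers who want to try it outside the logger class. `only_n_bytes` here is an assumed helper (the PR does not show `_only_n_bytes`), and it counts characters rather than encoded bytes.

```python
def only_n_bytes(text: str, limit: int) -> str:
    """Keep the first `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    return f'{text[:limit]}... ({len(text) - limit} more bytes)'


def format_body(body: str, prefix: str, limit: int) -> str:
    """Truncate a non-JSON body, then prefix every line for the debug log."""
    to_log = only_n_bytes(body, limit)
    # strip('\r') removes the carriage return left over from CRLF line endings.
    return '\n'.join(prefix + line.strip('\r') for line in to_log.split('\n'))
```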

mgyucht · 2024-08-30T09:57:49Z

databricks/sdk/errors/mapper.py

-    if _is_private_link_redirect(response):
-        return _get_private_link_validation_error(response.url)


Moved to _get_api_error.

mgyucht · 2024-08-30T10:04:22Z

databricks/sdk/errors/parser.py

+        # Handle API 1.2-style errors
+        if 'error' in resp:
+            error_args['message'] = resp['error']
+
+        # Handle SCIM Errors
+        detail = resp.get('detail')
+        status = resp.get('status')
+        scim_type = resp.get('scimType')
+        if detail:
+            # Handle SCIM error message details
+            # @see https://tools.ietf.org/html/rfc7644#section-3.7.3
+            error_args[
+                'message'] = f"{scim_type} {error_args.get('message', 'SCIM API Internal Error')}".strip(" ")
+            error_args['error_code'] = f"SCIM_{status}"


This is moved here from the DatabricksError constructor.

mgyucht · 2024-08-30T10:04:42Z

databricks/sdk/errors/parser.py

+            return None
+
+        error_args = {
+            'message': resp.get('message', 'request failed'),


This is taken from _make_nicer_error().

mgyucht · 2024-08-30T10:05:18Z

databricks/sdk/errors/parser.py

+    __HTML_ERROR_REGEXES = [re.compile(r'<pre>(.*)</pre>'), re.compile(r'<title>(.*)</title>'), ]
+
+    def parse_error(self, response: requests.Response, response_body: bytes) -> Optional[dict]:
+        payload_str = response_body.decode('utf-8')
+        for regex in self.__HTML_ERROR_REGEXES:
+            match = regex.search(payload_str)
+            if match:
+                message = match.group(1) if match.group(1) else response.reason
+                return {
+                    'status': response.status_code,
+                    'message': message,
+                    'error_code': response.reason.upper().replace(' ', '_')
+                }
+        logging.debug('_HtmlErrorParser: no <pre> tag found in error response')
+        return None


This logic is meant to replicate _make_sense_from_html.
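The same matching logic can be exercised without a `requests.Response` by passing the body and reason phrase directly. This is a sketch that mirrors the regexes in the diff; `extract_html_message` is a hypothetical function name.

```python
import re
from typing import Optional

_HTML_ERROR_REGEXES = [re.compile(r'<pre>(.*)</pre>'), re.compile(r'<title>(.*)</title>')]


def extract_html_message(body: str, reason: str) -> Optional[dict]:
    """Pull a message out of <pre> or <title>; fall back to the reason phrase if empty."""
    for regex in _HTML_ERROR_REGEXES:
        match = regex.search(body)
        if match:
            return {
                'message': match.group(1) or reason,
                'error_code': reason.upper().replace(' ', '_'),
            }
    return None
```

Because `.` does not match newlines by default, these patterns only capture content that sits on a single line.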

mgyucht · 2024-08-30T10:06:02Z

tests/test_errors.py

+
+@pytest.mark.parametrize(
+    'response, expected_error, expected_message', subclass_test_cases +
+    [(fake_response('GET', 400, ''), errors.BadRequest, 'Bad Request'),


This is a new case

mgyucht · 2024-08-30T10:08:37Z

tests/test_errors.py

+     (fake_response('GET', 400, 'MALFORMED_REQUEST: vpc_endpoints malformed parameters: VPC Endpoint ... with use_case ... cannot be attached in ... list'), errors.BadRequest, 'vpc_endpoints malformed parameters: VPC Endpoint ... with use_case ... cannot be attached in ... list'),
+     (fake_response('GET', 400, '<pre>Worker environment not ready</pre>'), errors.BadRequest, 'Worker environment not ready'),
+     (fake_response('GET', 400, 'this is not a real response'), errors.BadRequest,
+      ('unable to parse response. This is likely a bug in the Databricks SDK for Python or the underlying API. '
+       'Please report this issue with the following debugging information to the SDK issue tracker at '
+       'https://github.com/databricks/databricks-sdk-go/issues. Request log:```GET /api/2.0/service\n'
+       '< 400 Bad Request\n'
+       '< this is not a real response```')), ])


These are new cases

renaudhartert-db

Looks good overall, left a couple of comments.

renaudhartert-db · 2024-08-30T10:46:06Z

databricks/sdk/core.py

@@ -12,8 +10,8 @@
 from .config import *
 # To preserve backwards compatibility (as these definitions were previously in this module)
 from .credentials_provider import *
-from .errors import DatabricksError, error_mapper
-from .errors.private_link import _is_private_link_redirect
+from .errors import DatabricksError, _get_api_error


Could we remove the _ to indicate that the function is not internal to the errors module?

See PEP 8:

_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose names start with an underscore.

Will do.

What do you think is a good way to indicate that the function is internal to the SDK but not to the errors module?

To be honest, I don't have a definite answer here. Without changing the code structure, I would simply make that explicit in the function's doc. Otherwise, I would try to organize the core package as a single "umbrella" interface to internal modules that start with a _ (e.g. core/_errors.py). I've also seen people explicitly put private modules in a private directory (e.g. core/private/errors.py). I'll do some research to see if this is covered by our internal style guide.
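The PEP 8 behavior quoted in this thread can be checked with a throwaway module. The module name `fake_errors` and its two functions are made up for the demonstration; the module defines no `__all__`, which is the case the quoted rule covers.

```python
import sys
import types

# Build a throwaway module with one public and one underscore-prefixed function.
module = types.ModuleType('fake_errors')
exec(
    'def get_api_error():\n'
    '    return "public"\n'
    '\n'
    'def _get_api_error():\n'
    '    return "internal"\n',
    module.__dict__,
)
sys.modules['fake_errors'] = module

# A star import skips names that start with an underscore.
namespace = {}
exec('from fake_errors import *', namespace)
star_imported = sorted(name for name in namespace if not name.startswith('__'))

# An explicit import of the underscore-prefixed name still works.
explicit = {}
exec('from fake_errors import _get_api_error', explicit)
```

The underscore is only a convention plus this star-import rule; it does not prevent another module from importing the name explicitly.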

databricks/sdk/errors/parser.py

### Bug Fixes

* Handle non-JSON errors gracefully ([#741](#741)).

### Documentation

* Add Data Plane access documentation ([#732](#732)).

### Internal Changes

* Fix test_iam::test_scim_error_unmarshall integration test ([#743](#743)).

### API Changes:

* Added `regenerate_dashboard()` method for [w.quality_monitors](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/quality_monitors.html) workspace-level service.
* Added `databricks.sdk.service.catalog.RegenerateDashboardRequest` and `databricks.sdk.service.catalog.RegenerateDashboardResponse` dataclasses.
* Added `databricks.sdk.service.jobs.QueueDetails`, `databricks.sdk.service.jobs.QueueDetailsCodeCode`, `databricks.sdk.service.jobs.RunLifecycleStateV2State`, `databricks.sdk.service.jobs.RunStatus`, `databricks.sdk.service.jobs.TerminationCodeCode`, `databricks.sdk.service.jobs.TerminationDetails` and `databricks.sdk.service.jobs.TerminationTypeType` dataclasses.
* Added `status` field for `databricks.sdk.service.jobs.BaseRun`.
* Added `status` field for `databricks.sdk.service.jobs.RepairHistoryItem`.
* Added `status` field for `databricks.sdk.service.jobs.Run`.
* Added `status` field for `databricks.sdk.service.jobs.RunTask`.
* Added `max_provisioned_throughput` and `min_provisioned_throughput` fields for `databricks.sdk.service.serving.ServedModelInput`.
* Added `columns_to_sync` field for `databricks.sdk.service.vectorsearch.DeltaSyncVectorIndexSpecRequest`.
* Changed `workload_size` field for `databricks.sdk.service.serving.ServedModelInput` to no longer be required.

OpenAPI SHA: d05898328669a3f8ab0c2ecee37db2673d3ea3f7, Date: 2024-09-04

## Changes

#741 introduced a regression in retrieving error details from SCIM APIs. This PR addresses this and adds a regression test for this case. The implementation should now match the Go SDK's here: https://github.com/databricks/databricks-sdk-go/blob/main/apierr/errors.go#L220-L224.

Fixes #749.

## Tests

Added a unit test based on the supplied response in the ticket.

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied

## Changes

#741 introduced a change to how an error message was modified in `ApiClient._perform`. Previously, arguments to the DatabricksError constructor were modified as a dictionary in `_perform`. After that change, `get_api_error` started to return a `DatabricksError` instance whose attributes were modified. The `message` attribute referred to in that change does not exist in the DatabricksError class: there is a `message` constructor parameter, but it is not set as an attribute.

This PR refactors the error handling logic slightly to restore the original behavior. In doing this, we decouple all error-parsing and customizing logic out of ApiClient. This also sets us up to allow for further extension of error parsing and customization in the future, a feature that I have seen present in other SDKs.

Fixes #755.

## Tests

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
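The pitfall described here can be reproduced with a small stand-in class. `SketchError` and `build_error` are hypothetical; they only model an exception that accepts `message` in its constructor without storing it as an attribute.

```python
from typing import Optional


class SketchError(Exception):
    """Stand-in error: takes `message` as a parameter but never stores it."""

    def __init__(self, message: Optional[str] = None, *, error_code: Optional[str] = None):
        super().__init__(message)
        self.error_code = error_code


error = SketchError('request failed', error_code='PERMISSION_DENIED')
# This creates a brand-new attribute; the text the exception prints is unchanged.
error.message = 'request failed. Debug info: ...'


def build_error(fields: dict, status_code: int) -> SketchError:
    """Customize the constructor arguments before the error object exists."""
    fields = dict(fields)  # avoid mutating the caller's dictionary
    if status_code in (401, 403):
        fields['message'] = fields['message'] + ' (debug info placeholder)'
    return SketchError(**fields)
```

Editing the argument dictionary first, as `build_error` does, is the approach the description calls the original behavior.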

mgyucht added 2 commits August 29, 2024 16:59

more refactoring

ee33355

more refactor

9f778d2

mgyucht marked this pull request as draft August 29, 2024 15:21

fmt

a36a92c

mgyucht marked this pull request as ready for review August 30, 2024 08:49

mgyucht marked this pull request as draft August 30, 2024 08:50

mgyucht added 2 commits August 30, 2024 11:53

better logging, handling of empty responses, unified testing

d8fb98f

fmt

988a384

mgyucht marked this pull request as ready for review August 30, 2024 09:54

mgyucht requested a review from renaudhartert-db August 30, 2024 09:54

mgyucht added 3 commits August 30, 2024 12:03

docs

a3a6f04

another test case

3eed68b

more tests

780a548

mgyucht commented Aug 30, 2024

mgyucht added 2 commits August 30, 2024 12:11

formatting

bbcee93

fix

02f2e6d

renaudhartert-db reviewed Aug 30, 2024

mgyucht added 4 commits August 30, 2024 13:21

feedback

d317f9e

fix

8ba50d1

fmt

463e3a4

fix for python 3.7

dc4f9e3

mgyucht enabled auto-merge August 30, 2024 12:12

renaudhartert-db approved these changes Aug 30, 2024

fmt

cdff06d

mgyucht added this pull request to the merge queue Aug 30, 2024

Merged via the queue into main with commit 3dab457 Aug 30, 2024
14 checks passed

mgyucht deleted the handle-non-json-errors branch August 30, 2024 12:45

mgyucht mentioned this pull request Sep 4, 2024

[Release] Release v0.32.0 #747

Merged

mgyucht mentioned this pull request Sep 10, 2024

[Fix] Properly include message when handing SCIM errors #753

Merged

3 tasks

asnare mentioned this pull request Sep 10, 2024

[ISSUE] Regression in PermissionDefined error handling #755

Closed

mgyucht mentioned this pull request Sep 12, 2024

[Fix] Fix deserialization of 401/403 errors #758

Merged

3 tasks
