Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Decouple OAuth functionality from Config #784

Merged
merged 19 commits into from
Oct 21, 2024
Merged

[Fix] Decouple OAuth functionality from Config #784

merged 19 commits into from
Oct 21, 2024

Conversation

mgyucht
Copy link
Contributor

@mgyucht mgyucht commented Oct 7, 2024

Changes

OAuth Refactoring

Currently, OAuthClient uses Config internally to resolve the OIDC endpoints by passing the client ID and host to an internal Config instance and calling its oidc_endpoints method. This has a few drawbacks:

  1. There is nearly a cyclical dependency: Config depends on methods in oauth.py, and OAuthClient depends on Config. This currently doesn't break because the Config import is done at runtime in the OAuthClient constructor.
  2. Databricks supports both in-house OAuth and Azure Entra ID OAuth. Currently, the choice between these options depends on whether a user specifies the azure_client_id or client_id parameter in the Config. Because Config is used within OAuthClient, this means that OAuthClient needs to expose a parameter to configure either client_id or azure_client_id.

Rather than having these classes deeply coupled to one another, we can allow users to fetch the OIDC endpoints for a given account/workspace as a top-level functionality and provide this to OAuthClient. This breaks the cyclic dependency and doesn't require OAuthClient to expose any unnecessary parameters.

Further, I've also tried to remove the coupling of the other classes in oauth.py to OAuthClient. Currently, OAuthClient serves both as the mechanism to initialize OAuth and as a kind of configuration object, capturing OAuth endpoint URLs, client ID/secret, redirect URL, and scopes. Now, the parameters for each of these classes are explicit, removing all unnecessarily coupling between them. One nice advantage is that the Consent can be serialized/deserialized without any reference to the OAuthClient anymore.

There is definitely more work to be done to simplify and clean up the OAuth implementation, but this should at least unblock users who need to use Azure Entra ID U2M OAuth in the SDK.

Tests

The new OIDC endpoint methods are tested, and those tests also verify that those endpoints are retried in case of rate limiting.

I ran the flask app example against an AWS workspace, and I ran the external-browser demo example against AWS, Azure and GCP workspaces with the default client ID and with a newly created OAuth app with and without credentials.

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@mgyucht mgyucht changed the base branch from main to api-client-refactor October 7, 2024 11:47
Copy link

github-actions bot commented Oct 7, 2024

This PR breaks backwards compatibility for databrickslabs/blueprint downstream. See build logs for more details.

Running from downstreams #551

Copy link

github-actions bot commented Oct 7, 2024

This PR breaks backwards compatibility for databrickslabs/ucx downstream. See build logs for more details.

Running from downstreams #551

github-merge-queue bot pushed a commit that referenced this pull request Oct 7, 2024
## Changes
`ApiClient` is also coupled to the `Config` object, which means that it
can't be used in situations where there is no config. For example, when
fetching OIDC endpoints, the user may not have a complete `Config`
instance yet. However, failures when requesting from those endpoints
should still be retried according to the SDK's retry policy.

To address this, I've split the ApiClient into `_BaseClient` and
`ApiClient`. `_BaseClient` is the core implementation of the client
without any dependency on the `Config`. This is similar to what @rauchy
did in the Java SDK to cut the dependency between the `ApiClient` and
`DatabricksConfig`. The `_BaseClient` can then be used when fetching
OIDC endpoint information.

This will be used in
#784 to support
retrying OAuth OIDC endpoint fetches.

## Tests
<!-- 
How is this tested? Please see the checklist below and also describe any
other relevant tests
-->

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
Base automatically changed from api-client-refactor to main October 7, 2024 12:40
@mgyucht mgyucht requested a review from hectorcast-db October 10, 2024 14:25
@hectorcast-db
Copy link
Contributor

This changes public classes/methods and introduces backward incompatible changes.
Any way we can minimize the impact for the customers?

@mgyucht
Copy link
Contributor Author

mgyucht commented Oct 18, 2024

I'll list the breaking changes here and we can discuss whether the mitigation is sufficient:

  • OAuthClient constructor: I have to break the constructor definition because the OAuthClient doesn't need to know the workspace host but needs to know the OIDC endpoints. To mitigate this, I've added from_host() as a static method that preserves the original behavior.
  • TokenCache constructor: Previously, the OAuthClient was used as a configuration class, but now users will have to pass in the parameters one at a time. Importantly, because OAuthClient doesn't depend on host anymore, the TokenCache needs a host parameter. In all honesty, this should really be a parameter to load.
  • external_browser behavior: previously, it would use a multi-tenant app that we published in our account: 6128a518-99a9-425b-8333-4cc94f04cacd. Now that all customers have access to the databricks-cli app in their account by default, we can use that globally as a default, even in GCP.

Otherwise, it's a confusing pattern for the token to contain a reference to the client which refreshes it. That's part of the reason for the spaghetti architecture where parameters like client ID and token endpoint need to be passed through many layers of classes to be used. I would rather not try to fix these things in this PR (and probably should save them for the next SDK version).

We can include the following in the changelog for this PR in the release:

Breaking Changes

  • external_browser now uses the databricks-cli app instead of the third-party "6128a518-99a9-425b-8333-4cc94f04cacd" application when performing the U2M login flow for Azure workspaces when a client ID is not otherwise specified. This matches the AWS behavior.
  • The signatures of several OAuth-related constructors have changed to support U2M OAuth with Azure Entra ID application registrations. See https://github.com/databricks/databricks-sdk-py/blob/main/examples/flask_app_with_oauth.py for examples of how to use these classes.
    • OAuthClient(): renamed to OAuthClient.from_host()
    • SessionCredentials() and SessionCredentials.from_dict(): now accepts token_endpoint, client_id, client_secret, and refresh_url as parameters, rather than accepting the OAuthClient.
    • TokenCache(): now accepts host, token_endpoint, client_id, client_secret, and refresh_url as parameters, rather than accepting the OAuthClient.

@mgyucht mgyucht added this pull request to the merge queue Oct 21, 2024
Merged via the queue into main with commit 32ba221 Oct 21, 2024
14 checks passed
@mgyucht mgyucht deleted the deco-23621 branch October 21, 2024 07:52
rauchy added a commit that referenced this pull request Oct 22, 2024
### Breaking Changes
* `external_browser` now uses the `databricks-cli` app instead of the third-party "6128a518-99a9-425b-8333-4cc94f04cacd" application when performing the U2M login flow for Azure workspaces when a client ID is not otherwise specified. This matches the AWS behavior.
* The signatures of several OAuth-related constructors have changed to support U2M OAuth with Azure Entra ID application registrations. See https://github.com/databricks/databricks-sdk-py/blob/main/examples/flask_app_with_oauth.py for examples of how to use these classes.
  * `OAuthClient()`: renamed to `OAuthClient.from_host()`
  * `SessionCredentials()` and `SessionCredentials.from_dict()`: now accepts `token_endpoint`, `client_id`, `client_secret`, and `refresh_url` as parameters, rather than accepting the `OAuthClient`.
  * `TokenCache()`: now accepts `host`, `token_endpoint`, `client_id`, `client_secret`, and `refresh_url` as parameters, rather than accepting the `OAuthClient`.

### Bug Fixes

 * Decouple OAuth functionality from `Config` ([#784](#784)).

### Release

 * Release v0.35.0 ([#793](#793)).
@rauchy rauchy mentioned this pull request Oct 22, 2024
github-merge-queue bot pushed a commit that referenced this pull request Oct 22, 2024
### Breaking Changes
* `external_browser` now uses the `databricks-cli` app instead of the
third-party "6128a518-99a9-425b-8333-4cc94f04cacd" application when
performing the U2M login flow for Azure workspaces when a client ID is
not otherwise specified. This matches the AWS behavior.
* The signatures of several OAuth-related constructors have changed to
support U2M OAuth with Azure Entra ID application registrations. See
https://github.com/databricks/databricks-sdk-py/blob/main/examples/flask_app_with_oauth.py
for examples of how to use these classes.
  * `OAuthClient()`: renamed to `OAuthClient.from_host()`
* `SessionCredentials()` and `SessionCredentials.from_dict()`: now
accepts `token_endpoint`, `client_id`, `client_secret`, and
`refresh_url` as parameters, rather than accepting the `OAuthClient`.
* `TokenCache()`: now accepts `host`, `token_endpoint`, `client_id`,
`client_secret`, and `refresh_url` as parameters, rather than accepting
the `OAuthClient`.

### Bug Fixes

* Decouple OAuth functionality from `Config`
([#784](#784)).


### Release

* Release v0.35.0
([#793](#793)).

Co-authored-by: Omer Lachish <rauchy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants