Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added w.config.account_host to get the relevant account host from a workspace client #390

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nfx
Copy link
Contributor

@nfx nfx commented Oct 6, 2023

Changes

This simplifies downstream integration testing as well as configuration

Tests

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@nfx nfx added the chore anything that makes the code better label Oct 6, 2023
@nfx nfx requested a review from a team October 6, 2023 18:11
@codecov-commenter
Copy link

codecov-commenter commented Oct 6, 2023

Codecov Report

All modified lines are covered by tests ✅

Files Coverage Δ
databricks/sdk/azure.py 100.00% <100.00%> (ø)
databricks/sdk/core.py 80.69% <100.00%> (+0.32%) ⬆️

📢 Thoughts on this report? Let us know!.

active_directory_endpoint="https://login.microsoftonline.de/"),
GERMAN=AzureEnvironment(
name="AzureGermanCloud",
databricks_hostname_suffix='.databricks.azure.de', # TODO: i'm not sure about this one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK because we don't run Databricks in this cloud (and it is being deprecated: https://learn.microsoft.com/en-us/previous-versions/azure/germany/germany-welcome)

@@ -617,8 +613,31 @@ def is_account_client(self) -> bool:
return False
return self.host.startswith("https://accounts.") or self.host.startswith("https://accounts-dod.")

@property
def account_host(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for dev/staging, right? We occasionally get internal issues where the SDK doesn't work well with dev/staging.

Also, can we label this somehow as experimental? In the future, if we introduce a workspace-level metadata service, the accounts endpoint can be exposed there, and we can remove this property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you suggest on the pattern here to make it work for dev/staging?

btw, once we introduce workspace-level metadata service, we should just print warnings from within this call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea: have a list to lookup cloud/env/account host name based on domain suffix, from most-specific to least-specific:

    # Most specific to least specific
    [
        ('.dev.databricks.com', 'AWS', 'DEV', 'PUBLIC', 'accounts.dev.databricks.com'),
        ('.staging.cloud.databricks.com', 'AWS', 'STAGING', 'PUBLIC', 'accounts.staging.cloud.databricks.com'),
        ('.cloud.databricks.com', 'AWS', 'PROD', 'PUBLIC', 'accounts.cloud.databricks.com'),
        ('.cloud.databricks.us', 'AWS', 'PROD', 'GOV', 'accounts.cloud.databricks.us'),
        ('.dev.azuredatabricks.net', 'AZURE', 'DEV', 'PUBLIC', 'accounts.dev.azuredatabricks.net')
        ('.staging.azuredatabricks.net', 'AZURE', 'STAGING', 'PUBLIC', 'accounts.staging.azuredatabricks.net'),
        ('.azuredatabricks.net', 'AZURE', 'PROD', 'PUBLIC', 'accounts.azuredatabricks.net'),
        ('.databricks.azure.us', 'AZURE', 'PROD', 'GOV', 'accounts.databricks.azure.us'),
        ('.dev.gcp.databricks.com', 'GCP', 'DEV', 'PUBLIC', 'accounts.dev.gcp.databricks.com'),
        ('.staging.gcp.databricks.com', 'GCP', 'STAGING', 'PUBLIC', 'accounts.staging.gcp.databricks.com'),
        ('.gcp.databricks.com', 'GCP', 'PROD', 'PUBLIC', 'accounts.gcp.databricks.com'),
    ]

The simpler version is to just replace the most specific zone with accounts, which works everywhere today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgyucht do we want to expose dev/staging?... if we do - then why don't we expose the azure_login_app_id along with it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely fine by me!

@mgyucht
Copy link
Contributor

mgyucht commented Oct 12, 2023

One nit to make this work for dev/staging & label as experimental, but otherwise this seems reasonable to me if it unblocks you/customers.

@hamza-db
Copy link
Contributor

hamza-db commented Nov 8, 2023

all the info here is fine to be public ( and already public)

Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank you for contributing! Couple nits/suggestions, otherwise this looks good to me.

@@ -617,8 +613,31 @@ def is_account_client(self) -> bool:
return False
return self.host.startswith("https://accounts.") or self.host.startswith("https://accounts-dod.")

@property
def account_host(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely fine by me!

databricks/sdk/core.py Outdated Show resolved Hide resolved
cloud: Cloud

# domain suffixes are not very secret: https://crt.sh/?q=databricks
tld: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's call this dns_zone (I think TLD would traditionally refer to com. or net.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dns_zone it is then.

Comment on lines 530 to 602
def deployment(self, deployment_name: str) -> str:
return f'https://{deployment_name}{self.tld}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe just a bit abstract. It won't work in Azure the way a user might expect, where the DNS name can't be determined from the workspace ID alone. Given we're only using this for account host, I suggest we just inline it there.

Copy link
Contributor Author

@nfx nfx Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already part of our response:

Notification_Center

for now, we work it around in https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/account/workspaces.py#L101-L126, but @iaroslav-db fixed the mapping so that we don't have to work it around soon ;)

I need this function to create a workspace client from an account client. See:

  1. https://github.com/databrickslabs/watchdog/blob/main/scan/scan.go#L144-L154
  2. https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/account/workspaces.py#L122-L126
  3. https://github.com/databricks/terraform-provider-databricks/blob/baa61c451055271e14e7a9bc3db43537dcfda8ea/mws/resource_mws_workspaces.go#L372-L377

I'll do a follow-up PR to account client once azure changes are rolled out to prod

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh perfect, I actually didn't realize those workspace APIs worked in azure yet.

… workspace client

This simplifies downstream integration testing as well as configuration

introduce `DatabricksEnvironment`

..
@nfx nfx force-pushed the feat/account-host branch from bea774f to cb830fc Compare November 9, 2023 09:21
@nfx nfx added the enhancement New feature or request label Nov 9, 2023
@nfx
Copy link
Contributor Author

nfx commented Nov 9, 2023

this needs a bit more adjustments to work

@pietern
Copy link
Contributor

pietern commented Nov 9, 2023

Are similar PRs for the other SDKs tracked some place?

@nfx
Copy link
Contributor Author

nfx commented Nov 10, 2023

@pietern i'll have one for GoLang the following week - once i figure out the way to have it friendly for unit tests, need the same thing for two projects

github-merge-queue bot pushed a commit to databricks/databricks-sdk-go that referenced this pull request Nov 28, 2023
…ere IMDS doesn't give host environment information (#700)

## Changes
This PR allows determining Azure Environment from a Databricks account
or workspace hostname, removing the need for a separate
configuration/environment variable and complexities related to Azure MSI
from within ACR.

Similar functionality in Python SDK: 
- databricks/databricks-sdk-py#390

Stacked on top of:
- #699

## Tests
<!-- 
How is this tested? Please see the checklist below and also describe any
other relevant tests
-->

- [x] `make test` passing
- [x] `make fmt` applied
- [ ] relevant integration tests applied
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore anything that makes the code better do-not-merge enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants