Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added databricks labs ucx validate-groups-membership command to validate groups to see if they have same membership across acount and workspace level #772

Merged
merged 12 commits into from
Jan 19, 2024

Conversation

prajin-29
Copy link
Contributor

@prajin-29 prajin-29 commented Jan 11, 2024

This pull request introduces several changes related to managing Databricks workspace groups. Here's a summary of the changes:

  1. A new command validate-groups-membership is added to the CLI tool in the file cli.py. This command validates the groups in the workspace and account levels to see if they have different memberships. It returns a table showing the differences between the groups.
  2. A new class GroupManager is added to the file groups.py to manage the groups in the workspace and account levels. This class includes a new method validate_group_membership that compares the members of each group in the workspace and account levels and returns a list of groups that have different memberships.
  3. A new fixture make_ucx_group_with_diff_members is added to the file conftest.py to create a pair of groups with different members.
  4. A new test case test_group_matching_names_with_diff_users is added to the file test_groups.py to test the validate_group_membership method with a pair of groups that have different members.
  5. A retries decorator is added to the test case test_replace_workspace_groups_with_account_groups to handle potential intermittent failures.

These changes improve the group management functionality of the tool and make it more robust and reliable. The new validate-groups-membership command provides valuable information about the groups in the workspace and account levels, and the new GroupManager class and test cases ensure that the group management functionality works as expected.

Closes #649

@prajin-29 prajin-29 requested review from a team and pritishpai January 11, 2024 10:33
Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this functionality as a CLI command. You can rely on GroupManager, but don't rely on ucx.groups table - e.g. it should be possible to run databricks labs ucx validate-groups before running the assessment workflow

@@ -38,6 +38,7 @@ class MigratedGroup:
entitlements: str | None = None
external_id: str | None = None
roles: str | None = None
is_same_membership: bool | None = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, you're not allowed to add this column, as it'll be a breaking change to database, which this small feature is not justifying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the logic to incorporate it with CLI.

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (af80620) 84.07% compared to head (89f5e01) 84.19%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #772      +/-   ##
==========================================
+ Coverage   84.07%   84.19%   +0.12%     
==========================================
  Files          39       39              
  Lines        4872     4910      +38     
  Branches      913      919       +6     
==========================================
+ Hits         4096     4134      +38     
  Misses        564      564              
  Partials      212      212              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@prajin-29 prajin-29 changed the title Logging Groups that does not have same membrship at account and workspace level Added databricks labs ucx validate-groups-membership command to validate groups to see if they have same membership across acount and workspace level Jan 12, 2024
@prajin-29
Copy link
Contributor Author

do this functionality as a CLI command. You can rely on GroupManager, but don't rely on ucx.groups table - e.g. it should be possible to run databricks labs ucx validate-groups before running the assessment workflow

@nfx .Did this in CLI Command. Can you please validate the same.

@@ -105,6 +105,26 @@ def inner(workspace_group_name=None, account_group_name=None):
return inner


@pytest.fixture
def make_ucx_group_with_diff_members(make_random, make_group, make_acc_group, make_user):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add a fixture if it's used only once. It makes the code convoluted and harder to maintain. Rewrite by inlining the fixture code into that single test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this fixture and rewrote it inline with test.


def test_validate_group_diff_membership():
backend = MockBackend()
wsclient = MagicMock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use "create_autospec(WorkspaceClient)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used create_autospec


def test_validate_group_same_membership():
backend = MockBackend()
wsclient = MagicMock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create autospec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used create_autospec

Copy link
Contributor

@william-conti william-conti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should plan on enlarging this feature for all the workspaces in the account

account_group_regex=account_group_regex,
)
mismatch_groups = group_manager.validate_group_membership()
print(json.dumps(mismatch_groups))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be better to return an error code when there's at least one entry in the mismatch_groups

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if CLI wrapper supports that just yet

@@ -138,6 +139,37 @@ def repair_run(w: WorkspaceClient, step):
installer.repair_run(step)


@ucx.command
def validate_groups_membership(w: WorkspaceClient):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I wan't to scope it for all workspaces in my account ?

Copy link
Collaborator

@nfx nfx Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be an additional command with is_account_level=True, it should call this function and pass down a workspace client for each workspace

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Inline fixture that is not reused
  2. Make variable names consistent
  3. Add additional account level command

labs.yml Outdated
- name: validate-groups-membership
description: Validate the groups to see if the groups at account level and workspace level have different membership
table_template: |-
Workflow_Group_Name\tAccount_Group_Name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use spaces for separating words, not underscores

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced _ with space

labs.yml Outdated
description: Validate the groups to see if the groups at account level and workspace level have different membership
table_template: |-
Workflow_Group_Name\tAccount_Group_Name
{{range .}}{{.wf_group_name}}\t{{.ac_group_name}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{range .}}{{.wf_group_name}}\t{{.ac_group_name}}
{{range .}}{{.name_in_workspace}}\t{{.name_in_account}}

Use conventional variable names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed with the conventional names

account_group_regex=account_group_regex,
)
mismatch_groups = group_manager.validate_group_membership()
print(json.dumps(mismatch_groups))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if CLI wrapper supports that just yet

@nfx nfx merged commit 0948141 into main Jan 19, 2024
7 checks passed
@nfx nfx deleted the feature/log_group_membership branch January 19, 2024 09:54
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have same membership across acount and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extend the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless if permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in multiprocess safe way ([#794](#794)).
@nfx nfx mentioned this pull request Jan 19, 2024
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to
validate groups to see if they have same membership across acount and
workspace level
([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments
([#764](#764)).
* Added issue and pull request templates
([#791](#791)).
* Added linked issues to PR template
([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and
extend the default log truncation limit
([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs
([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10
([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore
([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly
([#801](#801)).
* Fixed handling of `DELTASHARING` table format
([#802](#802)).
* Fixed listing of workflows via CLI
([#811](#811)).
* Fixed logger import path for DEBUG notebook
([#792](#792)).
* Fixed move table command to delete table/view regardless if
permissions are present, skipping corrupted tables when crawling table
size and making existing tests more stable
([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks
labs ucx manual-workspace-info`
([#814](#814)).
* Increase the unit test coverage for cli.py
([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is
confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh
([#781](#781)).
* Updated `bug` issue template
([#797](#797)).
* Fixed writing log readme in multiprocess safe way
([#794](#794)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added `databricks labs ucx validate-groups-membership` command to
validate groups to see if they have same membership across acount and
workspace level
([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments
([#764](#764)).
* Added issue and pull request templates
([#791](#791)).
* Added linked issues to PR template
([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and
extend the default log truncation limit
([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs
([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10
([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore
([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly
([#801](#801)).
* Fixed handling of `DELTASHARING` table format
([#802](#802)).
* Fixed listing of workflows via CLI
([#811](#811)).
* Fixed logger import path for DEBUG notebook
([#792](#792)).
* Fixed move table command to delete table/view regardless if
permissions are present, skipping corrupted tables when crawling table
size and making existing tests more stable
([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks
labs ucx manual-workspace-info`
([#814](#814)).
* Increase the unit test coverage for cli.py
([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is
confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh
([#781](#781)).
* Updated `bug` issue template
([#797](#797)).
* Fixed writing log readme in multiprocess safe way
([#794](#794)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Catch groups that doesn't have the same membership on Workspace and Account level
3 participants