Added more views to assessment dashboard #474

nfx · 2023-10-18T16:27:53Z

This PR has the following improvements for views and tables:

- `assessment/queries` are renamed and refactored to more efficiently fit the grid
- introduced `assessment/views` to hold common views that could be reused by multiple commands and widgets
- introduced `crawlers.SchemaDeployer`, which (re-)creates views and tables **upon installation** (and not on the assessment step)
- added `search_by` field for table widgets
- improved parallelism for fetching grants
- removed (now redundant) `setup_schema` and `setup_view` tasks from `assessment` workflow

william-conti · 2023-10-18T17:33:47Z

src/databricks/labs/ucx/assessment/queries/00_3_count_external_locations.sql

@@ -0,0 +1,4 @@
+-- viz type=counter, name=Storage Locations, counter_label=Storage Locations, value_column=count_total


The name should refer to the number of external locations, not Storage Locations.

The idea there is that it's not a real external location, but I can change that, of course.

This widget should have some sort of description then, or we should include the dashboard sketch in the README to avoid confusion with External Location and Storage Credentials.

Strong +1 on explaining the dashboards somewhere! Could be in a separate PR though (and I guess once we stabilize the dashboard changes soon)?
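
For readers following the naming discussion: the widget name comes from the `-- viz` header on the first line of the query file, which in the diff above is a comma-separated list of `key=value` pairs. The sketch below shows one way such a header could be read. `parse_viz_comment` is a hypothetical helper written for illustration; it is not the project's `_parse_magic_comment`, whose signature is not shown in this thread, and it assumes values never contain commas.

```python
def parse_viz_comment(first_line: str) -> dict[str, str]:
    """Split a `-- viz key=value, key=value` header into a dict.

    Hypothetical helper for illustration only; assumes no commas inside values.
    """
    prefix = "-- viz "
    if not first_line.startswith(prefix):
        raise ValueError(f"not a viz header: {first_line!r}")
    result: dict[str, str] = {}
    for pair in first_line[len(prefix):].split(","):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {pair!r}")
        result[key.strip()] = value.strip()
    return result


header = (
    "-- viz type=counter, name=Storage Locations, "
    "counter_label=Storage Locations, value_column=count_total"
)
settings = parse_viz_comment(header)
print(settings["name"])
```

Renaming the widget, as suggested in this thread, would then only mean changing the `name` and `counter_label` values in that header line.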

william-conti · 2023-10-18T17:56:27Z

src/databricks/labs/ucx/framework/crawlers.py

+
+    def deploy_table(self, name: str, klass: type):
+        logger.info(f"Ensuring {self._inventory_schema}.{name} table exists")
+        self._sql_backend.create_table(f"hive_metastore.{self._inventory_schema}.{name}", klass)


What happens there if we change or remove a column in a table? I assume that this is handled automatically by save_table.

it is not handled yet - see #471

william-conti · 2023-10-18T17:58:22Z

src/databricks/labs/ucx/framework/crawlers.py

+
+    def deploy_schema(self):
+        logger.info(f"Ensuring {self._inventory_schema} database exists")
+        self._sql_backend.execute(f"CREATE SCHEMA IF NOT EXISTS hive_metastore.{self._inventory_schema}")


Is it filtered by the assessment tool? I recall that we do not have such a thing.

what do you mean?

william-conti · 2023-10-18T18:02:21Z

tests/integration/framework/test_crawlers.py

+    deployer = SchemaDeployer(sql_backend, inventory_schema, ucx)
+    deployer.deploy_schema()
+    deployer.deploy_table("grants", Grant)
+    deployer.deploy_view("grant_detail", "assessment/views/grant_detail.sql")


Add asserts there that we indeed added the tables/views
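
A minimal sketch of what such asserts could look like, under stated assumptions. `RecordingBackend` and `SketchDeployer` are hypothetical stand-ins written for this example, not the classes from this PR, and the view DDL is assumed. A real integration test would more likely query the metastore to confirm the objects exist; this version only checks the statements that were issued.

```python
class RecordingBackend:
    """Hypothetical stand-in for the SQL backend: it only records statements."""

    def __init__(self) -> None:
        self.statements: list[str] = []

    def execute(self, sql: str) -> None:
        self.statements.append(sql)


class SketchDeployer:
    """Simplified, hypothetical deployer; not the SchemaDeployer from this PR."""

    def __init__(self, backend: RecordingBackend, inventory_schema: str) -> None:
        self._backend = backend
        self._inventory_schema = inventory_schema

    def deploy_schema(self) -> None:
        # Same statement as in the deploy_schema diff in this PR.
        self._backend.execute(
            f"CREATE SCHEMA IF NOT EXISTS hive_metastore.{self._inventory_schema}"
        )

    def deploy_view(self, name: str, query: str) -> None:
        # Assumed DDL; the PR's actual view statement is not shown in this thread.
        full_name = f"hive_metastore.{self._inventory_schema}.{name}"
        self._backend.execute(f"CREATE OR REPLACE VIEW {full_name} AS {query}")


def test_deployer_creates_schema_and_view() -> None:
    backend = RecordingBackend()
    deployer = SketchDeployer(backend, "ucx_test")
    deployer.deploy_schema()
    deployer.deploy_view("grant_detail", "SELECT 1 AS dummy")

    # Assert on what was deployed instead of only checking that nothing raised.
    assert len(backend.statements) == 2
    assert backend.statements[0] == "CREATE SCHEMA IF NOT EXISTS hive_metastore.ucx_test"
    assert "hive_metastore.ucx_test.grant_detail" in backend.statements[1]
```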

william-conti · 2023-10-18T18:10:04Z

src/databricks/labs/ucx/assessment/queries/README.md

+All files in this directory follow the virtual grid of a dashboard:
+
+* total width is 6 columns
+* all files are named as `<row>_<column>_something.sql`


So, do we deploy new widgets by using the file name and not by using the -- viz and -- widget lines? We are still using _parse_magic_comment to do so.

for now, yes - later we can add more magic there :) I'll copy your comment to this readme
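
As an illustration of the `<row>_<column>_something.sql` convention from the README, the following sketch extracts the grid position from a file name. `grid_position` and `GridPosition` are hypothetical names, and treating both numbers as plain integers is an assumption; the thread does not show how the installer interprets them.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class GridPosition(NamedTuple):
    row: int
    col: int


# <row>_<column>_something.sql, for example 00_3_count_external_locations.sql
_GRID_FILE = re.compile(r"^(\d+)_(\d+)_.+\.sql$")


def grid_position(filename: str) -> GridPosition | None:
    """Return the (row, col) encoded in a query file name, or None if absent."""
    match = _GRID_FILE.match(filename)
    if match is None:
        return None
    return GridPosition(row=int(match.group(1)), col=int(match.group(2)))


for name in ("00_3_count_external_locations.sql", "02_0_database_summary.sql", "README.md"):
    print(name, grid_position(name))
```

Files that do not follow the convention, such as the README itself, yield `None` and can be skipped or placed by other means.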

william-conti · 2023-10-18T18:20:56Z

src/databricks/labs/ucx/install.py

+        config: WorkspaceConfig,
+        *,
+        prefix="ucx",
+        override_clusters: dict[str, str] | None = None,


Why would we need this? It looks like it's only useful for testing purposes. Could we just override the cluster directly in the test instead?

it's for a faster devloop - see 4453b4c & c297074

larsgeorge-db · 2023-10-20T06:10:13Z

src/databricks/labs/ucx/assessment/queries/02_0_database_summary.sql

+    END AS upgrade,
+    SUM(is_table) AS tables,
+    SUM(is_view) AS views,
+    SUM(is_unsupported) AS unsupported,


This is not the same as the columns= list above. I see unsupported missing there. I trust this is OK and works like this?

larsgeorge-db · 2023-10-20T06:12:52Z

src/databricks/labs/ucx/assessment/queries/README.md

+All files in this directory follow the virtual grid of a dashboard:
+
+* total width is 6 columns
+* all files are named as `<row>_<column>_something.sql`


Ah, ok, yes, that makes sense. Answers my question earlier. +1

larsgeorge-db · 2023-10-20T06:18:28Z

src/databricks/labs/ucx/assessment/queries/README.md

+All files in this directory follow the virtual grid of a dashboard:
+
+* total width is 6 columns
+* all files are named as `<row>_<column>_something.sql`


Conversely though I think we should stop using the filename in the query object ID, and maybe use a simple parser that removes the numbers, splits the rest on underscores and then title-cases the words. That way the names are more readable.
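
The parser suggested here could be sketched roughly as follows. `readable_name` is a hypothetical helper, and the assumption is that only the leading `<row>_<column>_` prefix should be dropped, so that numbers appearing later in a name are kept.

```python
import re
from pathlib import PurePosixPath


def readable_name(filename: str) -> str:
    """Turn '00_3_count_external_locations.sql' into 'Count External Locations'."""
    stem = PurePosixPath(filename).stem
    # Drop the leading numeric grid prefix, e.g. "00_3_".
    without_prefix = re.sub(r"^(?:\d+_)+", "", stem)
    # Split the rest on underscores and title-case each word.
    words = [word for word in without_prefix.split("_") if word]
    return " ".join(word.capitalize() for word in words)


print(readable_name("00_3_count_external_locations.sql"))
print(readable_name("02_0_database_summary.sql"))
```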

larsgeorge-db · 2023-10-20T06:19:27Z

src/databricks/labs/ucx/hive_metastore/grants.py

@@ -195,7 +194,7 @@ def _grants(
        view: str | None = None,
        any_file: bool = False,
        anonymous_function: bool = False,
-    ) -> Iterator[Grant]:
+    ) -> list[Grant]:


Great! I just found this as well in #477 - nice this is fixed.

codecov · 2023-11-16T20:17:55Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9d1fc85) 80.65% compared to head (04c3497) 80.98%.
Report is 1 commits behind head on main.

Files	Patch %	Lines
src/databricks/labs/ucx/install.py	94.11%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     databrickslabs/ucx#474      +/-   ##
==========================================
+ Coverage   80.65%   80.98%   +0.32%     
==========================================
  Files          35       35              
  Lines        3583     3634      +51     
  Branches      696      698       +2     
==========================================
+ Hits         2890     2943      +53     
+ Misses        529      526       -3     
- Partials      164      165       +1

**Breaking changes** (existing installations need to reinstall UCX and re-run assessment jobs)

* Switched local group migration component to rename groups instead of creating backup groups ([#450](#450)).
* Mitigate permissions loss in Table ACLs by folding grants belonging to the same principal, object id and object type together ([#512](#512)).

**New features**

* Added support for the experimental Databricks CLI launcher ([#517](#517)).
* Added support for external Hive Metastores including AWS Glue ([#400](#400)).
* Added more views to assessment dashboard ([#474](#474)).
* Added rate limit for creating backup group to increase stability ([#500](#500)).
* Added deduplication for mount point list ([#569](#569)).
* Added documentation to describe interaction with external Hive Metastores ([#473](#473)).
* Added failure injection for job failure message propagation ([#591](#591)).
* Added uniqueness in the new warehouse name to avoid conflicts on installation ([#542](#542)).
* Added a global init script to collect Hive Metastore lineage ([#513](#513)).
* Added retry set/update permissions when possible and assess the changes in the workspace ([#519](#519)).
* Use `~/.ucx/state.json` to store the state of both dashboards and jobs ([#561](#561)).

**Bug fixes**

* Fixed handling for `OWN` table permissions ([#571](#571)).
* Fixed handling of keys with and without values. ([#514](#514)).
* Fixed integration test failures related to concurrent group delete ([#584](#584)).
* Fixed issue with workspace listing process on None type `object_type` ([#481](#481)).
* Fixed missing group entitlement migration bug ([#583](#583)).
* Fixed entitlement application for account-level groups ([#529](#529)).
* Fixed assessment throwing an error when the owner of an object is empty ([#485](#485)).
* Fixed installer to migrate between different configuration file versions ([#596](#596)).
* Fixed cluster policy crawler to be aware of deleted policies ([#486](#486)).

* Improved error message for not null constraints violated ([#532](#532)).
* Improved integration test resiliency ([#597](#597), [#594](#594), [#586](#586)).
* Introduced Safer access to workspace objects' properties. ([#530](#530)).
* Mitigated permissions loss in Table ACLs by running appliers with single thread ([#518](#518)).
* Running apply permission task before assessment should display message ([#487](#487)).
* Split integration tests from blocking the merge queue ([#496](#496)).
* Support more than one dashboard per step ([#472](#472)).
* Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0 ([#505](#505)).
* Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0 ([#575](#575)).

nfx requested a review from a team October 18, 2023 16:27

nfx had a problem deploying to account-admin October 18, 2023 16:28 — with GitHub Actions Failure

nfx had a problem deploying to account-admin October 18, 2023 16:33 — with GitHub Actions Failure

nfx had a problem deploying to account-admin October 18, 2023 17:08 — with GitHub Actions Failure

nfx self-assigned this Oct 18, 2023

nfx added the feat/installer install/upgrade the app label Oct 18, 2023

william-conti reviewed Oct 18, 2023

larsgeorge-db approved these changes Oct 20, 2023

nfx mentioned this pull request Oct 20, 2023

Get widget row/col from filename, if possible #479

Closed

larsgeorge-db mentioned this pull request Oct 21, 2023

Running apply permission task before assessment should display message #487

Merged

nfx mentioned this pull request Nov 5, 2023

Fixed Dashboard layout by putting summaries at top #551

Closed

nfx force-pushed the view/grants branch from 3b15da5 to 401a882 Compare November 16, 2023 20:00

..

1dbc682

nfx force-pushed the view/grants branch from 401a882 to 1dbc682 Compare November 16, 2023 20:14

nfx had a problem deploying to account-admin November 16, 2023 20:14 — with GitHub Actions Failure

..

4827bcd

nfx had a problem deploying to account-admin November 16, 2023 20:16 — with GitHub Actions Failure

..

04c3497

nfx had a problem deploying to account-admin November 16, 2023 20:17 — with GitHub Actions Failure

nfx merged commit 24d55ed into main Nov 16, 2023
5 of 6 checks passed

nfx deleted the view/grants branch November 16, 2023 20:19

nfx mentioned this pull request Nov 17, 2023

Release v0.6.0 #598

Merged
