Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added more views to assessment dashboard #474

Merged
merged 3 commits into from
Nov 16, 2023
Merged

Added more views to assessment dashboard #474

merged 3 commits into from
Nov 16, 2023

Conversation

nfx
Copy link
Collaborator

@nfx nfx commented Oct 18, 2023

This PR has the following improvements for views and tables:

  • assessment/queries are renamed and refactored to more efficiently fit the grid
  • introduced assessment/views to hold common views that could be reused by multiple commands and widgets.
  • introduced crawlers.SchemaDeployer, that is (re-)creating views and tables upon installation (and not on the assessment step).
  • added search_by field for table widgets
  • improved parallelism for fetching grants
  • removed (now redundant) setup_schema and setup_view tasks from assessment workflow
_UCX__serge_smertin__UCX_Assessment _UCX__serge_smertin__UCX_Assessment

@nfx nfx requested a review from a team October 18, 2023 16:27
@nfx nfx self-assigned this Oct 18, 2023
@nfx nfx added the feat/installer install/upgrade the app label Oct 18, 2023
@@ -0,0 +1,4 @@
-- viz type=counter, name=Storage Locations, counter_label=Storage Locations, value_column=count_total
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name should be number of external locations and not Storage Locations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea there that it's not a real external location, but can change that, of course

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This widget should have some sort of description then, or we should include the dashboard sketch in the README to avoid confusion with External Location and Storage Credentials

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong +1 on explaining the dashboards somewhere! Could be in a separate PR though (and I guess once we stabilize the dashboard changes soon)?


def deploy_table(self, name: str, klass: type):
logger.info(f"Ensuring {self._inventory_schema}.{name} table exists")
self._sql_backend.create_table(f"hive_metastore.{self._inventory_schema}.{name}", klass)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens there if we change/remove a column in a table ? I assume that this is handled by automatically by the save_table.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not handled yet - see #471

src/databricks/labs/ucx/framework/crawlers.py Show resolved Hide resolved

def deploy_schema(self):
logger.info(f"Ensuring {self._inventory_schema} database exists")
self._sql_backend.execute(f"CREATE SCHEMA IF NOT EXISTS hive_metastore.{self._inventory_schema}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it filtered by the assessment tool ? I recall that we do not have such thing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean?

deployer = SchemaDeployer(sql_backend, inventory_schema, ucx)
deployer.deploy_schema()
deployer.deploy_table("grants", Grant)
deployer.deploy_view("grant_detail", "assessment/views/grant_detail.sql")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add asserts there that we indeed added the tables/views

All files in this directory follow the virtual grid of a dashboard:

* total width is 6 columns
* all files are named as `<row>_<column>_something.sql`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, do we deploy new widgets by using the file name and not by using the -- viz and -- widget line ? we are still using _parse_magic_comment to do so

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, yes - later we can add more magic there :) I'll copy your comment to this readme

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, yes, that makes sense. Answers my question earlier. +1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conversely though I think we should stop using the filename in the query object ID, and maybe use a simple parser that removes the numbers, splits the rest on underscores and then title-cases the words. That way the names are more readable.

config: WorkspaceConfig,
*,
prefix="ucx",
override_clusters: dict[str, str] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need this ? Looks like it's only useful for testing purposes, just override the cluster directly in the test instead ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's for a faster devloop - see 4453b4c & c297074

END AS upgrade,
SUM(is_table) AS tables,
SUM(is_view) AS views,
SUM(is_unsupported) AS unsupported,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same as the columns= list above. I see unsupported missing there. I trust this is OK and works like this?

All files in this directory follow the virtual grid of a dashboard:

* total width is 6 columns
* all files are named as `<row>_<column>_something.sql`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, yes, that makes sense. Answers my question earlier. +1

All files in this directory follow the virtual grid of a dashboard:

* total width is 6 columns
* all files are named as `<row>_<column>_something.sql`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conversely though I think we should stop using the filename in the query object ID, and maybe use a simple parser that removes the numbers, splits the rest on underscores and then title-cases the words. That way the names are more readable.

@@ -195,7 +194,7 @@ def _grants(
view: str | None = None,
any_file: bool = False,
anonymous_function: bool = False,
) -> Iterator[Grant]:
) -> list[Grant]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I just found this as well in #477 - nice this is fixed.

Copy link

codecov bot commented Nov 16, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9d1fc85) 80.65% compared to head (04c3497) 80.98%.
Report is 1 commits behind head on main.

Files Patch % Lines
src/databricks/labs/ucx/install.py 94.11% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     databrickslabs/ucx#474      +/-   ##
==========================================
+ Coverage   80.65%   80.98%   +0.32%     
==========================================
  Files          35       35              
  Lines        3583     3634      +51     
  Branches      696      698       +2     
==========================================
+ Hits         2890     2943      +53     
+ Misses        529      526       -3     
- Partials      164      165       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nfx nfx merged commit 24d55ed into main Nov 16, 2023
5 of 6 checks passed
@nfx nfx deleted the view/grants branch November 16, 2023 20:19
nfx added a commit that referenced this pull request Nov 17, 2023
**Breaking changes** (existing installations need to reinstall UCX and re-run assessment jobs)

 * Switched local group migration component to rename groups instead of creating backup groups ([#450](#450)).
 * Mitigate permissions loss in Table ACLs by folding grants belonging to the same principal, object id and object type together ([#512](#512)).

**New features**

 * Added support for the experimental Databricks CLI launcher ([#517](#517)).
 * Added support for external Hive Metastores including AWS Glue ([#400](#400)).
 * Added more views to assessment dashboard ([#474](#474)).
 * Added rate limit for creating backup group to increase stability ([#500](#500)).
 * Added deduplication for mount point list ([#569](#569)).
 * Added documentation to describe interaction with external Hive Metastores ([#473](#473)).
 * Added failure injection for job failure message propagation  ([#591](#591)).
 * Added uniqueness in the new warehouse name to avoid conflicts on installation ([#542](#542)).
 * Added a global init script to collect Hive Metastore lineage ([#513](#513)).
 * Added retry set/update permissions when possible and assess the changes in the workspace ([#519](#519)).
 * Use `~/.ucx/state.json` to store the state of both dashboards and jobs ([#561](#561)).

**Bug fixes**

 * Fixed handling for `OWN` table permissions ([#571](#571)).
 * Fixed handling of keys with and without values. ([#514](#514)).
 * Fixed integration test failures related to concurrent group delete ([#584](#584)).
 * Fixed issue with workspace listing process on None type `object_type` ([#481](#481)).
 * Fixed missing group entitlement migration bug ([#583](#583)).
 * Fixed entitlement application for account-level groups ([#529](#529)).
 * Fixed assessment throwing an error when the owner of an object is empty ([#485](#485)).
 * Fixed installer to migrate between different configuration file versions ([#596](#596)).
 * Fixed cluster policy crawler to be aware of deleted policies ([#486](#486)).
 * Improved error message for not null constraints violated ([#532](#532)).
 * Improved integration test resiliency ([#597](#597), [#594](#594), [#586](#586)).
 * Introduced Safer access to workspace objects' properties. ([#530](#530)).
 * Mitigated permissions loss in Table ACLs by running appliers with single thread ([#518](#518)).
 * Running apply permission task before assessment should display message ([#487](#487)).
 * Split integration tests from blocking the merge queue ([#496](#496)).
 * Support more than one dashboard per step ([#472](#472)).
 * Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0 ([#505](#505)).
 * Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0 ([#575](#575)).
@nfx nfx mentioned this pull request Nov 17, 2023
nfx added a commit that referenced this pull request Nov 17, 2023
**Breaking changes** (existing installations need to reinstall UCX and
re-run assessment jobs)

* Switched local group migration component to rename groups instead of
creating backup groups
([#450](#450)).
* Mitigate permissions loss in Table ACLs by folding grants belonging to
the same principal, object id and object type together
([#512](#512)).

**New features**

* Added support for the experimental Databricks CLI launcher
([#517](#517)).
* Added support for external Hive Metastores including AWS Glue
([#400](#400)).
* Added more views to assessment dashboard
([#474](#474)).
* Added rate limit for creating backup group to increase stability
([#500](#500)).
* Added deduplication for mount point list
([#569](#569)).
* Added documentation to describe interaction with external Hive
Metastores ([#473](#473)).
* Added failure injection for job failure message propagation
([#591](#591)).
* Added uniqueness in the new warehouse name to avoid conflicts on
installation ([#542](#542)).
* Added a global init script to collect Hive Metastore lineage
([#513](#513)).
* Added retry set/update permissions when possible and assess the
changes in the workspace
([#519](#519)).
* Use `~/.ucx/state.json` to store the state of both dashboards and jobs
([#561](#561)).

**Bug fixes**

* Fixed handling for `OWN` table permissions
([#571](#571)).
* Fixed handling of keys with and without values.
([#514](#514)).
* Fixed integration test failures related to concurrent group delete
([#584](#584)).
* Fixed issue with workspace listing process on None type `object_type`
([#481](#481)).
* Fixed missing group entitlement migration bug
([#583](#583)).
* Fixed entitlement application for account-level groups
([#529](#529)).
* Fixed assessment throwing an error when the owner of an object is
empty ([#485](#485)).
* Fixed installer to migrate between different configuration file
versions ([#596](#596)).
* Fixed cluster policy crawler to be aware of deleted policies
([#486](#486)).
* Improved error message for not null constraints violated
([#532](#532)).
* Improved integration test resiliency
([#597](#597),
[#594](#594),
[#586](#586)).
* Introduced Safer access to workspace objects' properties.
([#530](#530)).
* Mitigated permissions loss in Table ACLs by running appliers with
single thread ([#518](#518)).
* Running apply permission task before assessment should display message
([#487](#487)).
* Split integration tests from blocking the merge queue
([#496](#496)).
* Support more than one dashboard per step
([#472](#472)).
* Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0
([#505](#505)).
* Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0
([#575](#575)).
pritishpai pushed a commit that referenced this pull request Nov 21, 2023
This PR has the following improvements for views and tables:

- `assessment/queries` are renamed and refactored to more efficiently
fit the grid
- introduced `assessment/views` to hold common views that could be
reused by multiple commands and widgets.
- introduced `crawlers.SchemaDeployer`, that is (re-)creating views and
tables **upon installation** (and not on the assessment step).
- added `search_by` field for table widgets
- improved parallelism for fetching grants
- removed (now redundant) `setup_schema` and `setup_view` tasks from
`assessment` workflow
<img width="1981" alt="_UCX__serge_smertin__UCX_Assessment"
src="https://github.com/databrickslabs/ucx/assets/259697/7addb0bb-b301-47ea-b351-9b75cd0a5d9d">

<img width="1985" alt="_UCX__serge_smertin__UCX_Assessment"
src="https://github.com/databrickslabs/ucx/assets/259697/88effab7-59bb-46aa-af9b-bd9185d4f817">
pritishpai pushed a commit that referenced this pull request Nov 21, 2023
**Breaking changes** (existing installations need to reinstall UCX and
re-run assessment jobs)

* Switched local group migration component to rename groups instead of
creating backup groups
([#450](#450)).
* Mitigate permissions loss in Table ACLs by folding grants belonging to
the same principal, object id and object type together
([#512](#512)).

**New features**

* Added support for the experimental Databricks CLI launcher
([#517](#517)).
* Added support for external Hive Metastores including AWS Glue
([#400](#400)).
* Added more views to assessment dashboard
([#474](#474)).
* Added rate limit for creating backup group to increase stability
([#500](#500)).
* Added deduplication for mount point list
([#569](#569)).
* Added documentation to describe interaction with external Hive
Metastores ([#473](#473)).
* Added failure injection for job failure message propagation
([#591](#591)).
* Added uniqueness in the new warehouse name to avoid conflicts on
installation ([#542](#542)).
* Added a global init script to collect Hive Metastore lineage
([#513](#513)).
* Added retry set/update permissions when possible and assess the
changes in the workspace
([#519](#519)).
* Use `~/.ucx/state.json` to store the state of both dashboards and jobs
([#561](#561)).

**Bug fixes**

* Fixed handling for `OWN` table permissions
([#571](#571)).
* Fixed handling of keys with and without values.
([#514](#514)).
* Fixed integration test failures related to concurrent group delete
([#584](#584)).
* Fixed issue with workspace listing process on None type `object_type`
([#481](#481)).
* Fixed missing group entitlement migration bug
([#583](#583)).
* Fixed entitlement application for account-level groups
([#529](#529)).
* Fixed assessment throwing an error when the owner of an object is
empty ([#485](#485)).
* Fixed installer to migrate between different configuration file
versions ([#596](#596)).
* Fixed cluster policy crawler to be aware of deleted policies
([#486](#486)).
* Improved error message for not null constraints violated
([#532](#532)).
* Improved integration test resiliency
([#597](#597),
[#594](#594),
[#586](#586)).
* Introduced Safer access to workspace objects' properties.
([#530](#530)).
* Mitigated permissions loss in Table ACLs by running appliers with
single thread ([#518](#518)).
* Running apply permission task before assessment should display message
([#487](#487)).
* Split integration tests from blocking the merge queue
([#496](#496)).
* Support more than one dashboard per step
([#472](#472)).
* Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0
([#505](#505)).
* Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0
([#575](#575)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat/installer install/upgrade the app
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants