-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added more views to assessment dashboard #474
Conversation
@@ -0,0 +1,4 @@ | |||
-- viz type=counter, name=Storage Locations, counter_label=Storage Locations, value_column=count_total |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name should be number of external locations and not Storage Locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea there that it's not a real external location, but can change that, of course
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This widget should have some sort of description then, or we should include the dashboard sketch in the README to avoid confusion with External Location and Storage Credentials
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong +1 on explaining the dashboards somewhere! Could be in a separate PR though (and I guess once we stabilize the dashboard changes soon)?
|
||
def deploy_table(self, name: str, klass: type): | ||
logger.info(f"Ensuring {self._inventory_schema}.{name} table exists") | ||
self._sql_backend.create_table(f"hive_metastore.{self._inventory_schema}.{name}", klass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens there if we change/remove a column in a table ? I assume that this is handled by automatically by the save_table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not handled yet - see #471
|
||
def deploy_schema(self): | ||
logger.info(f"Ensuring {self._inventory_schema} database exists") | ||
self._sql_backend.execute(f"CREATE SCHEMA IF NOT EXISTS hive_metastore.{self._inventory_schema}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it filtered by the assessment tool ? I recall that we do not have such thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean?
deployer = SchemaDeployer(sql_backend, inventory_schema, ucx) | ||
deployer.deploy_schema() | ||
deployer.deploy_table("grants", Grant) | ||
deployer.deploy_view("grant_detail", "assessment/views/grant_detail.sql") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add asserts there that we indeed added the tables/views
All files in this directory follow the virtual grid of a dashboard: | ||
|
||
* total width is 6 columns | ||
* all files are named as `<row>_<column>_something.sql` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do we deploy new widgets by using the file name and not by using the -- viz and -- widget line ? we are still using _parse_magic_comment to do so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now, yes - later we can add more magic there :) I'll copy your comment to this readme
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, yes, that makes sense. Answers my question earlier. +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conversely though I think we should stop using the filename in the query object ID, and maybe use a simple parser that removes the numbers, splits the rest on underscores and then title-cases the words. That way the names are more readable.
config: WorkspaceConfig, | ||
*, | ||
prefix="ucx", | ||
override_clusters: dict[str, str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we need this ? Looks like it's only useful for testing purposes, just override the cluster directly in the test instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/databricks/labs/ucx/assessment/queries/00_1_count_total_databases.sql
Outdated
Show resolved
Hide resolved
END AS upgrade, | ||
SUM(is_table) AS tables, | ||
SUM(is_view) AS views, | ||
SUM(is_unsupported) AS unsupported, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the same as the columns=
list above. I see unsupported
missing there. I trust this is OK and works like this?
All files in this directory follow the virtual grid of a dashboard: | ||
|
||
* total width is 6 columns | ||
* all files are named as `<row>_<column>_something.sql` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, yes, that makes sense. Answers my question earlier. +1
All files in this directory follow the virtual grid of a dashboard: | ||
|
||
* total width is 6 columns | ||
* all files are named as `<row>_<column>_something.sql` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conversely though I think we should stop using the filename in the query object ID, and maybe use a simple parser that removes the numbers, splits the rest on underscores and then title-cases the words. That way the names are more readable.
@@ -195,7 +194,7 @@ def _grants( | |||
view: str | None = None, | |||
any_file: bool = False, | |||
anonymous_function: bool = False, | |||
) -> Iterator[Grant]: | |||
) -> list[Grant]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I just found this as well in #477 - nice this is fixed.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main databrickslabs/ucx#474 +/- ##
==========================================
+ Coverage 80.65% 80.98% +0.32%
==========================================
Files 35 35
Lines 3583 3634 +51
Branches 696 698 +2
==========================================
+ Hits 2890 2943 +53
+ Misses 529 526 -3
- Partials 164 165 +1 ☔ View full report in Codecov by Sentry. |
**Breaking changes** (existing installations need to reinstall UCX and re-run assessment jobs) * Switched local group migration component to rename groups instead of creating backup groups ([#450](#450)). * Mitigate permissions loss in Table ACLs by folding grants belonging to the same principal, object id and object type together ([#512](#512)). **New features** * Added support for the experimental Databricks CLI launcher ([#517](#517)). * Added support for external Hive Metastores including AWS Glue ([#400](#400)). * Added more views to assessment dashboard ([#474](#474)). * Added rate limit for creating backup group to increase stability ([#500](#500)). * Added deduplication for mount point list ([#569](#569)). * Added documentation to describe interaction with external Hive Metastores ([#473](#473)). * Added failure injection for job failure message propagation ([#591](#591)). * Added uniqueness in the new warehouse name to avoid conflicts on installation ([#542](#542)). * Added a global init script to collect Hive Metastore lineage ([#513](#513)). * Added retry set/update permissions when possible and assess the changes in the workspace ([#519](#519)). * Use `~/.ucx/state.json` to store the state of both dashboards and jobs ([#561](#561)). **Bug fixes** * Fixed handling for `OWN` table permissions ([#571](#571)). * Fixed handling of keys with and without values. ([#514](#514)). * Fixed integration test failures related to concurrent group delete ([#584](#584)). * Fixed issue with workspace listing process on None type `object_type` ([#481](#481)). * Fixed missing group entitlement migration bug ([#583](#583)). * Fixed entitlement application for account-level groups ([#529](#529)). * Fixed assessment throwing an error when the owner of an object is empty ([#485](#485)). * Fixed installer to migrate between different configuration file versions ([#596](#596)). * Fixed cluster policy crawler to be aware of deleted policies ([#486](#486)). * Improved error message for not null constraints violated ([#532](#532)). * Improved integration test resiliency ([#597](#597), [#594](#594), [#586](#586)). * Introduced Safer access to workspace objects' properties. ([#530](#530)). * Mitigated permissions loss in Table ACLs by running appliers with single thread ([#518](#518)). * Running apply permission task before assessment should display message ([#487](#487)). * Split integration tests from blocking the merge queue ([#496](#496)). * Support more than one dashboard per step ([#472](#472)). * Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0 ([#505](#505)). * Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0 ([#575](#575)).
**Breaking changes** (existing installations need to reinstall UCX and re-run assessment jobs) * Switched local group migration component to rename groups instead of creating backup groups ([#450](#450)). * Mitigate permissions loss in Table ACLs by folding grants belonging to the same principal, object id and object type together ([#512](#512)). **New features** * Added support for the experimental Databricks CLI launcher ([#517](#517)). * Added support for external Hive Metastores including AWS Glue ([#400](#400)). * Added more views to assessment dashboard ([#474](#474)). * Added rate limit for creating backup group to increase stability ([#500](#500)). * Added deduplication for mount point list ([#569](#569)). * Added documentation to describe interaction with external Hive Metastores ([#473](#473)). * Added failure injection for job failure message propagation ([#591](#591)). * Added uniqueness in the new warehouse name to avoid conflicts on installation ([#542](#542)). * Added a global init script to collect Hive Metastore lineage ([#513](#513)). * Added retry set/update permissions when possible and assess the changes in the workspace ([#519](#519)). * Use `~/.ucx/state.json` to store the state of both dashboards and jobs ([#561](#561)). **Bug fixes** * Fixed handling for `OWN` table permissions ([#571](#571)). * Fixed handling of keys with and without values. ([#514](#514)). * Fixed integration test failures related to concurrent group delete ([#584](#584)). * Fixed issue with workspace listing process on None type `object_type` ([#481](#481)). * Fixed missing group entitlement migration bug ([#583](#583)). * Fixed entitlement application for account-level groups ([#529](#529)). * Fixed assessment throwing an error when the owner of an object is empty ([#485](#485)). * Fixed installer to migrate between different configuration file versions ([#596](#596)). * Fixed cluster policy crawler to be aware of deleted policies ([#486](#486)). * Improved error message for not null constraints violated ([#532](#532)). * Improved integration test resiliency ([#597](#597), [#594](#594), [#586](#586)). * Introduced Safer access to workspace objects' properties. ([#530](#530)). * Mitigated permissions loss in Table ACLs by running appliers with single thread ([#518](#518)). * Running apply permission task before assessment should display message ([#487](#487)). * Split integration tests from blocking the merge queue ([#496](#496)). * Support more than one dashboard per step ([#472](#472)). * Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0 ([#505](#505)). * Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0 ([#575](#575)).
This PR has the following improvements for views and tables: - `assessment/queries` are renamed and refactored to more efficiently fit the grid - introduced `assessment/views` to hold common views that could be reused by multiple commands and widgets. - introduced `crawlers.SchemaDeployer`, that is (re-)creating views and tables **upon installation** (and not on the assessment step). - added `search_by` field for table widgets - improved parallelism for fetching grants - removed (now redundant) `setup_schema` and `setup_view` tasks from `assessment` workflow <img width="1981" alt="_UCX__serge_smertin__UCX_Assessment" src="https://github.com/databrickslabs/ucx/assets/259697/7addb0bb-b301-47ea-b351-9b75cd0a5d9d"> <img width="1985" alt="_UCX__serge_smertin__UCX_Assessment" src="https://github.com/databrickslabs/ucx/assets/259697/88effab7-59bb-46aa-af9b-bd9185d4f817">
**Breaking changes** (existing installations need to reinstall UCX and re-run assessment jobs) * Switched local group migration component to rename groups instead of creating backup groups ([#450](#450)). * Mitigate permissions loss in Table ACLs by folding grants belonging to the same principal, object id and object type together ([#512](#512)). **New features** * Added support for the experimental Databricks CLI launcher ([#517](#517)). * Added support for external Hive Metastores including AWS Glue ([#400](#400)). * Added more views to assessment dashboard ([#474](#474)). * Added rate limit for creating backup group to increase stability ([#500](#500)). * Added deduplication for mount point list ([#569](#569)). * Added documentation to describe interaction with external Hive Metastores ([#473](#473)). * Added failure injection for job failure message propagation ([#591](#591)). * Added uniqueness in the new warehouse name to avoid conflicts on installation ([#542](#542)). * Added a global init script to collect Hive Metastore lineage ([#513](#513)). * Added retry set/update permissions when possible and assess the changes in the workspace ([#519](#519)). * Use `~/.ucx/state.json` to store the state of both dashboards and jobs ([#561](#561)). **Bug fixes** * Fixed handling for `OWN` table permissions ([#571](#571)). * Fixed handling of keys with and without values. ([#514](#514)). * Fixed integration test failures related to concurrent group delete ([#584](#584)). * Fixed issue with workspace listing process on None type `object_type` ([#481](#481)). * Fixed missing group entitlement migration bug ([#583](#583)). * Fixed entitlement application for account-level groups ([#529](#529)). * Fixed assessment throwing an error when the owner of an object is empty ([#485](#485)). * Fixed installer to migrate between different configuration file versions ([#596](#596)). * Fixed cluster policy crawler to be aware of deleted policies ([#486](#486)). * Improved error message for not null constraints violated ([#532](#532)). * Improved integration test resiliency ([#597](#597), [#594](#594), [#586](#586)). * Introduced Safer access to workspace objects' properties. ([#530](#530)). * Mitigated permissions loss in Table ACLs by running appliers with single thread ([#518](#518)). * Running apply permission task before assessment should display message ([#487](#487)). * Split integration tests from blocking the merge queue ([#496](#496)). * Support more than one dashboard per step ([#472](#472)). * Update databricks-sdk requirement from ~=0.11.0 to ~=0.12.0 ([#505](#505)). * Update databricks-sdk requirement from ~=0.12.0 to ~=0.13.0 ([#575](#575)).
This PR has the following improvements for views and tables:
assessment/queries
are renamed and refactored to more efficiently fit the gridassessment/views
to hold common views that could be reused by multiple commands and widgets.crawlers.SchemaDeployer
, that is (re-)creating views and tables upon installation (and not on the assessment step).search_by
field for table widgetssetup_schema
andsetup_view
tasks fromassessment
workflow