Added export CLI functionality for assessment results #2553

Merged: 93 commits, Oct 8, 2024

Commits:
4f0c6dc
initial draft of remote dashboards widgets collection
rportilla-databricks Aug 26, 2024
d288e0a
Creating new branch without committing any files
rportilla-databricks Aug 27, 2024
5aee6b6
Cherry-pick commit 60f957e
rportilla-databricks Aug 27, 2024
afc714c
test
lmallepaddi Aug 27, 2024
ad01abb
test commit
lmallepaddi Aug 27, 2024
f26c522
Added UCX Export Result notebook as Utility Hack
jgarciaf106 Aug 29, 2024
522d43e
add test files
rportilla-databricks Aug 30, 2024
0c7083d
updated unit tests
rportilla-databricks Sep 3, 2024
66dbab1
Merge pull request #1 from rportilla-databricks/feat/ucx-export-notebook
rportilla-databricks Sep 3, 2024
32adee0
Update workflows.py
jgarciaf106 Sep 3, 2024
77a88cd
Merge pull request #3 from rportilla-databricks/feat/add_exporter
rportilla-databricks Sep 3, 2024
9db2d70
Merge pull request #2 from rportilla-databricks/export-notebook-patch-1
rportilla-databricks Sep 4, 2024
33f1f11
Adding unit tests coverage
lmallepaddi Sep 4, 2024
898eeef
fullset unit test cases and slight changes to export.py file
lmallepaddi Sep 5, 2024
2c8e92b
Removed unnecessary imports
lmallepaddi Sep 5, 2024
d6243ad
Merge pull request #4 from rportilla-databricks/feat/add_exporter
rportilla-databricks Sep 6, 2024
99bf9a1
updated tests
rportilla-databricks Sep 6, 2024
ee446f1
update cli for export
rportilla-databricks Sep 7, 2024
e57cbfe
update test format
rportilla-databricks Sep 7, 2024
1240c63
Merge branch 'main' into main
rportilla-databricks Sep 7, 2024
5cdeb3c
Update src/databricks/labs/ucx/assessment/export.py
rportilla-databricks Sep 9, 2024
c63904e
add structural changes
rportilla-databricks Sep 10, 2024
8b71454
Integrated lsql functionality to export the Assessment Results
jgarciaf106 Sep 11, 2024
a8a325e
Minor Changes
jgarciaf106 Sep 11, 2024
6ebc90c
Merge branch 'main' into feat/cli-assessment-export
rportilla-databricks Sep 11, 2024
f47f125
Merge pull request #7 from jgarciaf106/feat/cli-assessment-export
rportilla-databricks Sep 11, 2024
cd67ee4
naming change missed in cli.py
rportilla-databricks Sep 11, 2024
3e8d342
correcting use of the cached property
rportilla-databricks Sep 11, 2024
92fd1c0
fix cached property
rportilla-databricks Sep 11, 2024
e5e2e10
fix cli
rportilla-databricks Sep 11, 2024
a71b086
Removed flags as the code provides prompts
lmallepaddi Sep 11, 2024
21785e4
added command line usage in readme file
lmallepaddi Sep 12, 2024
077a03f
Merge pull request #8 from rportilla-databricks/feat/add_exporter
rportilla-databricks Sep 12, 2024
14fa408
minor change to export.py to resolve unit test case issue
lmallepaddi Sep 12, 2024
8d607c8
fix errors and address feedback
rportilla-databricks Sep 12, 2024
20c19f2
updated syntax for export
rportilla-databricks Sep 12, 2024
cf6fa3c
Merge branch 'main' into feat/add_exporter
lmallepaddi Sep 12, 2024
6a33aca
Update src/databricks/labs/ucx/cli.py
rportilla-databricks Sep 12, 2024
7b798aa
Update src/databricks/labs/ucx/installer/workflows.py
rportilla-databricks Sep 12, 2024
66c281f
Update workflows.py
jgarciaf106 Sep 12, 2024
a00ecf8
Merge pull request #10 from rportilla-databricks/Workflows-patch-3
rportilla-databricks Sep 12, 2024
92937ce
revert tmp path
rportilla-databricks Sep 12, 2024
b4d58d5
changes to README.md and test_export.py
lmallepaddi Sep 12, 2024
4599dfa
Merge pull request #9 from rportilla-databricks/feat/add_exporter
rportilla-databricks Sep 15, 2024
dce6f86
Merge branch 'databrickslabs:main' into feat/add_exporter
rportilla-databricks Sep 15, 2024
d61a5ce
Merge branch 'databrickslabs:main' into main
rportilla-databricks Sep 15, 2024
8c7dd67
Merge pull request #11 from rportilla-databricks/feat/add_exporter
rportilla-databricks Sep 15, 2024
d94d581
update the logical name of the collection of queries
rportilla-databricks Sep 16, 2024
007891d
update reference
rportilla-databricks Sep 16, 2024
598c596
Updated notebook content to Use triple quotes
jgarciaf106 Sep 16, 2024
82146db
Merge pull request #12 from rportilla-databricks/export_notebook/patch
rportilla-databricks Sep 16, 2024
f4b7170
zip file rename functionality
lmallepaddi Sep 17, 2024
90a8189
changed readme file to reflect changes to zipfilename
lmallepaddi Sep 17, 2024
5eb417c
Changes to export.py using lsql version 0.11
lmallepaddi Sep 19, 2024
9069f3d
updated the test_cli.py for filename change
lmallepaddi Sep 19, 2024
44b4ac4
remove print statement
rportilla-databricks Sep 23, 2024
bcfd6e1
Merge pull request #13 from rportilla-databricks/feat/add_exporter
rportilla-databricks Sep 23, 2024
f90d2c1
Merge branch 'main' into main
nfx Sep 23, 2024
6e3ddf9
Export Results simplification
jgarciaf106 Sep 24, 2024
26799fc
Merge pull request #14 from rportilla-databricks/patch-export
rportilla-databricks Sep 25, 2024
b2e3c85
Fixed Unit Tests for Export Assessment
jgarciaf106 Sep 25, 2024
9fb3f49
Merge pull request #15 from rportilla-databricks/patch-export
rportilla-databricks Sep 25, 2024
3d567b4
Merge branch 'databrickslabs:main' into main
rportilla-databricks Sep 26, 2024
ae5832d
make fmt run
rportilla-databricks Sep 26, 2024
204ab4a
Minor contributor documentation changes (#2729)
asnare Sep 24, 2024
0a6e72f
Handle `PermissionDenied` when listing accessible workspaces (#2733)
JCZuurmond Sep 24, 2024
93d496e
Adding unskip CLI command to undo a skip on schema or a table (#2727)
aminmovahed-db Sep 24, 2024
ee20112
Fix failing integration tests that perform a real assessment (#2736)
ericvergnaud Sep 24, 2024
2f62a0f
Update documentation to explain the usage of collections and eligible…
HariGS-DB Sep 24, 2024
8fee9d8
Enables cli cmd `databricks labs ucx create-catalog-schemas` to apply…
HariGS-DB Sep 24, 2024
21da7cc
Add `create-ucx-catalog` cli command (#2694)
JCZuurmond Sep 25, 2024
2a09a8f
Fixes issue of circular dependency of migrate-location ACL (#2741)
HariGS-DB Sep 25, 2024
d1dd0c5
Added static code analysis results to assessment dashboard (#2696)
ericvergnaud Sep 25, 2024
7cba9b0
Increases test coverage (#2739)
pritishpai Sep 25, 2024
04a4956
Fixes source table alias disappearance during migrate_views (#2726)
pritishpai Sep 25, 2024
15c5536
Bump astroid version, pylint version and drop our f-string workaround…
ericvergnaud Sep 25, 2024
2611b11
Update databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8…
dependabot[bot] Sep 25, 2024
555e83a
Delete temporary files when running solacc (#2750)
ericvergnaud Sep 25, 2024
d744465
Code format: `make fmt` (#2749)
asnare Sep 25, 2024
04918be
Speedup assessment workflow by making DBFS root table size calculatio…
nfx Sep 25, 2024
0a03def
Harden configuration reading (#2701)
JCZuurmond Sep 26, 2024
be955ea
Add unskip CLI command to undo a skip on schema or a table (#2734)
aminmovahed-db Sep 26, 2024
788c273
Improve solacc linting (#2752)
ericvergnaud Sep 26, 2024
1d12391
Sync Fork, make fmt test
jgarciaf106 Sep 26, 2024
d3a0e39
Merge pull request #16 from rportilla-databricks/patch-export-v2
rportilla-databricks Sep 27, 2024
4d2e9a6
Merge branch 'databrickslabs:main' into main
rportilla-databricks Sep 27, 2024
b163801
Added CLI Functionality to export UCX Assessment
jgarciaf106 Sep 30, 2024
188bc24
Added CLI Functionality to export UCX Assessment
jgarciaf106 Sep 30, 2024
cb89541
Added CLI Functionality to export UCX Assessment
jgarciaf106 Sep 30, 2024
571fc8b
Added CLI Functionality to export UCX Assessment
jgarciaf106 Sep 30, 2024
77786a2
Merge branch 'main' into feat/add-cli-export-assessment-reviewed
rportilla-databricks Oct 2, 2024
84830fd
Merge pull request #21 from jgarciaf106/feat/add-cli-export-assessmen…
rportilla-databricks Oct 2, 2024
2975950
Merge branch 'main' into main
jgarciaf106 Oct 8, 2024
Files changed:
2 changes: 2 additions & 0 deletions README.md
@@ -10,8 +10,10 @@ so that you'll be able to [scope the migration](docs/assessment.md) and execute
The [README notebook](#readme-notebook), which can be found in the installation folder, contains further instructions and explanations of the different ucx workflows & dashboards.
Once the migration is scoped, you can start with the [table migration process](#Table-Migration).


Review comment (Member): Remove redundant newlines

More workflows, like notebook code migration, are coming in future releases.


UCX also provides a number of command line utilities accessible via `databricks labs ucx`.

For questions, troubleshooting or bug fixes, please see our [troubleshooting guide](docs/troubleshooting.md) or submit [an issue](https://github.com/databrickslabs/ucx/issues).
2 changes: 2 additions & 0 deletions labs.yml
@@ -275,3 +275,5 @@ commands:
      - name: target-workspace-id
        description: (Optional) id of a workspace in the target collection. If not specified, ucx will prompt to select from a list

  - name: export
    description: export widget data from the assessment
Review comment (Member): Can we update the command to accept a second argument to indicate what is exported?

    databricks labs ucx export --what assessment

Suggested change:
-    description: export widget data from the assessment
+    description: export ucx data

@nfx: please provide your input on this API.
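For illustration, a hypothetical sketch (not the merged code) of what such a --what dispatch could look like in cli.py; AssessmentExporter and WorkspaceContext are names from this PR, while the `what` parameter and the dispatch are only the reviewer's proposal:

    @ucx.command()
    def export(w: WorkspaceClient, prompts: Prompts, what: str = "assessment"):
        """Exports ucx data, e.g. `databricks labs ucx export --what assessment`."""
        ctx = WorkspaceContext(w)
        if what == "assessment":
            AssessmentExporter(ctx).export_results(prompts, None)  # None -> prompt for a path
        else:
            raise ValueError(f"unknown export target: {what}")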

116 changes: 116 additions & 0 deletions src/databricks/labs/ucx/assessment/export.py
@@ -0,0 +1,116 @@
import os
import re
import csv
import logging
from pathlib import Path
from zipfile import ZipFile
from concurrent.futures import ThreadPoolExecutor
from databricks.labs.blueprint.tui import Prompts
from databricks.labs.ucx.contexts.workspace_cli import WorkspaceContext
Review comment (Member): isort


logger = logging.getLogger(__name__)


class AssessmentExporter:
    # File and Path Constants
    _ZIP_FILE_NAME = "ucx_assessment_results.zip"
Review comment (Member), suggested change:
-    _ZIP_FILE_NAME = "ucx_assessment_results.zip"
+    _EXPORT_FILE_NAME = "ucx_assessment_results.zip"


    def __init__(self, ctx: WorkspaceContext):
Review comment (Collaborator), suggested change:
-    def __init__(self, ctx: WorkspaceContext):
+    def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):

It's an anti-pattern to depend on the entire WorkspaceContext; depend only on what is used.
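A minimal sketch of that narrower constructor, assuming SqlBackend from databricks.labs.lsql.backends and WorkspaceConfig from databricks.labs.ucx.config (both existing modules in this codebase):

    from databricks.labs.lsql.backends import SqlBackend
    from databricks.labs.ucx.config import WorkspaceConfig

    class AssessmentExporter:
        def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):
            # Depend only on the two collaborators this class actually uses,
            # not on the whole WorkspaceContext.
            self._sql_backend = sql_backend
            self._config = config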

        self._ctx = ctx

    def _get_ucx_main_queries(self) -> list[dict[str, str]]:
Review comment (Collaborator): Use the existing QueryTile abstraction.

"""Retrieve and construct the main UCX queries."""
pattern = r"\b.inventory\b"
schema = self._ctx.inventory_database
project_root = Path(__file__).parent.parent.parent.parent
ucx_main_queries_path = project_root / "labs/ucx/queries/assessment/main"
Review comment (Collaborator): This is a far more concise version of this class that does the same thing, but with the portability to become part of https://github.com/databrickslabs/lsql. You can PR it over there first and call it from UCX after, if you'd like. This way we can export any dashboards-as-code into CSV for any project.

    from databricks.labs.lsql.dashboards import DashboardMetadata

    dashboard = DashboardMetadata.from_path(ucx_main_queries_path)
    dashboard = dashboard.replace_database(catalog='hive_metastore', database=self._config.inventory_database)
    for tile in dashboard.tiles:
        if not tile.is_query():
            continue
        file_name = f"{tile.id}.csv"
        for row in self._sql_backend.fetch(tile.content):
            _ = row.as_dict()

Review comment (Contributor): WIP


        # List all SQL files in the directory
        sql_files = [file for file in ucx_main_queries_path.iterdir() if file.suffix == ".sql"]

        ucx_main_queries = []

        for sql_file in sql_files:
            content = sql_file.read_text()
            modified_content = re.sub(pattern, f" {schema}", content, flags=re.IGNORECASE)
Review comment (Collaborator), suggested change:
-            modified_content = re.sub(pattern, f" {schema}", content, flags=re.IGNORECASE)
+            modified_content = self._config.replace_inventory_variable(content)

Use databricks.labs.ucx.config.WorkspaceConfig.replace_inventory_variable, which already exists for this purpose.
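For context, that helper presumably amounts to a simple substitution of the $inventory placeholder used by the packaged queries. An approximate sketch, from memory rather than the verbatim implementation:

    # Approximate sketch of WorkspaceConfig.replace_inventory_variable:
    def replace_inventory_variable(self, text: str) -> str:
        return text.replace("$inventory", f"hive_metastore.{self.inventory_database}")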

Review comment (Collaborator): See the other comment; this method can be written in a more maintainable way.

            query_name = sql_file.stem
            ucx_main_queries.append({"name": query_name, "query": modified_content})

        return ucx_main_queries

    @staticmethod
    def _extract_target_name(name: str, pattern: str) -> str:
        """Extract target name from the file name using the provided pattern."""
        match = re.search(pattern, name)
        return match.group(1) if match else ""
Review comment (Collaborator): This method is not used in production code.


    @staticmethod
    def _cleanup(path: Path, target_name: str) -> None:
        """Remove a specific CSV file in the given path that matches the target name."""
        target_file = path.joinpath(target_name)

        if target_file.exists():
            target_file.unlink()

    def _execute_query(self, path: Path, result: dict[str, str]) -> None:
        """Execute a SQL query and write the result to a CSV file."""
        pattern = r"^\d+_\d+_(.*)"
        match = re.search(pattern, result["name"])
        if match:
            file_name = f"{match.group(1)}.csv"
            csv_path = os.path.join(path, file_name)

            query_results = list(self._ctx.sql_backend.fetch(result["query"]))

            if query_results:
                headers = query_results[0].asDict().keys()
                with open(csv_path, mode='w', newline='', encoding='utf-8') as file:
Review comment (Collaborator), suggested change:
-            csv_path = os.path.join(path, file_name)
-
-            query_results = list(self._ctx.sql_backend.fetch(result["query"]))
-
-            if query_results:
-                headers = query_results[0].asDict().keys()
-                with open(csv_path, mode='w', newline='', encoding='utf-8') as file:
+            query_results = list(self._ctx.sql_backend.fetch(result["query"]))
+
+            if query_results:
+                headers = query_results[0].asDict().keys()
+                with (path / file_name).open(mode='w', newline='', encoding='utf-8') as file:

Consistently use pathlib.Path everywhere: https://docs.python.org/3/library/pathlib.html#pathlib.Path.open

os.path.join turns it back into a mere string.
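To illustrate the reviewer's point, a trivial sketch of the difference:

    from pathlib import Path

    path = Path("/tmp/export")
    csv_path = path / "a.csv"  # stays a Path, keeping .open(), .exists(), .unlink()
    # os.path.join(path, "a.csv") would return a plain str instead
    with csv_path.open(mode="w", newline="", encoding="utf-8") as f:
        f.write("id,name\n")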

                    writer = csv.DictWriter(file, fieldnames=headers)
                    writer.writeheader()
                    for row in query_results:
                        writer.writerow(row.asDict())
                # Add the CSV file to the ZIP archive
                self._add_to_zip(path, file_name)

    def _add_to_zip(self, path: Path, file_name) -> None:
Review comment (Collaborator), suggested change:
-    def _add_to_zip(self, path: Path, file_name) -> None:
+    def _add_to_zip(self, path: Path, file_name: str) -> None:

Use types everywhere.

"""Create a ZIP file containing all the CSV files."""
zip_path = path / self._ZIP_FILE_NAME
file_path = path / file_name

try:
with ZipFile(zip_path, 'a') as zipf:
Review comment (Collaborator): Why do you need to create temporary CSV files if you can write them directly to the open zip? https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open

    with ZipFile(target_folder / 'ucx-export.zip', mode='w') as z:
        ...
        with z.open(f'{tile.id}.csv') as f:
            writer = csv.DictWriter(f, fieldnames=headers)
            writer.writeheader()
            for row in query_results:
                ...

This way you don't have to clean up a file.
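A runnable sketch of that idea: ZipFile.open(..., mode="w") yields a binary stream, so it needs an io.TextIOWrapper before csv can write to it. Rows are assumed here to already be plain dicts:

    import csv
    import io
    from pathlib import Path
    from zipfile import ZipFile

    def write_rows_to_zip(target_folder: Path, file_name: str, rows: list[dict]) -> None:
        """Stream query results straight into a CSV entry inside the archive."""
        if not rows:
            return
        with ZipFile(target_folder / "ucx-export.zip", mode="a") as z:
            # mode="w" opens the archive member for writing; wrap the binary
            # stream in text mode so csv.DictWriter can use it.
            with z.open(file_name, mode="w") as raw:
                with io.TextIOWrapper(raw, encoding="utf-8", newline="") as f:
                    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
                    writer.writeheader()
                    writer.writerows(rows)

With this approach no intermediate CSV files touch the disk, so there is nothing to clean up afterwards.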

                zipf.write(file_path, arcname=file_name)

        except FileNotFoundError:
            print(f"File {file_path} not found.")
Review comment (Collaborator), suggested change:
-            print(f"File {file_path} not found.")
+            logger.warning(f"File {file_path} not found.")

Don't use print, use a logger. I've realised you didn't run make fmt, as our static code analysis checker would have caught this on your local machine.

        except PermissionError:
            print(f"Permission denied for {file_path} or {zip_path}.")

        # Clean up the file if it was successfully added
        if file_path.exists():
            self._cleanup(path, file_name)

    def export_results(self, prompts: Prompts, path: Path | None) -> None:
        """Main method to export results to CSV files inside a ZIP archive."""
        results = self._get_ucx_main_queries()
        if path is None:
            response = prompts.question(
                "Choose a path to save the UCX Assessment results",
                default=Path.cwd().as_posix(),
                validate=lambda p_: Path(p_).exists(),
            )
            path = Path(response)
        else:
            logger.info(f"Using the provided path: {path}")
Review comment (Collaborator), suggested change (delete these lines):
-        else:
-            logger.info(f"Using the provided path: {path}")

It's redundant, as you already have logger.info(f"Exporting UCX Assessment (Main) results to {path}").

        try:
            logger.info(f"Exporting UCX Assessment (Main) results to {path}")
            with ThreadPoolExecutor(max_workers=4) as executor:
Review comment (Collaborator): We don't use ThreadPoolExecutor directly, as it "swallows" errors by default and needs more work for robust error handling. We use Threads.strict(...) across the codebase. See the docs at https://github.com/databrickslabs/blueprint?tab=readme-ov-file#parallel-task-execution

You can add a threading.Lock() on this instance and something like:

    with ZipFile(path / 'ucx-export.zip', mode='w') as zip:
        tasks = [partial(self._append_to_zip, zip, tile) for tile in dashboard.tiles]
        Threads.strict("exporting", tasks)
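Filling in that sketch under the reviewer's assumptions (Threads.strict comes from databricks.labs.blueprint.parallel, and the lock is needed because ZipFile is not safe for concurrent writes); the _append_to_zip helper below is hypothetical:

    import threading
    from functools import partial
    from pathlib import Path
    from zipfile import ZipFile

    from databricks.labs.blueprint.parallel import Threads

    class _ExportSketch:
        def __init__(self):
            self._lock = threading.Lock()

        def _append_to_zip(self, zip_file: ZipFile, name: str, payload: str) -> None:
            with self._lock:  # serialize writes: ZipFile is not thread-safe
                zip_file.writestr(name, payload)

        def export(self, path: Path, csv_payloads: dict[str, str]) -> None:
            with ZipFile(path / "ucx-export.zip", mode="w") as zip_file:
                tasks = [
                    partial(self._append_to_zip, zip_file, name, body)
                    for name, body in csv_payloads.items()
                ]
                Threads.strict("exporting", tasks)  # raises if any task failed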

                futures = [executor.submit(self._execute_query, path, result) for result in results]
                for future in futures:
                    future.result()

        except TimeoutError as e:
            print("A thread execution timed out. Check the query execution logic.")
            print(f"Error exporting results: {e}")
        finally:
            logger.info(f"UCX Assessment (Main) results exported to {path}")
Review comment (Collaborator): Are they? What if the path you're trying to write to is not writable, like /dev/null?

9 changes: 9 additions & 0 deletions src/databricks/labs/ucx/cli.py
@@ -16,6 +16,7 @@
from databricks.labs.ucx.hive_metastore.tables import What
from databricks.labs.ucx.install import AccountInstaller
from databricks.labs.ucx.source_code.linters.files import LocalCodeLinter
from databricks.labs.ucx.assessment.export import Exporter

ucx = App(__file__)
logger = get_logger(__file__)
@@ -564,6 +565,14 @@ def join_collection(a: AccountClient, workspace_ids: str):
    account_installer.join_collection(w_ids)


@ucx.command()
def export(w: WorkspaceClient, prompts: Prompts, path: Path | None = None):
    """exports the assessment dashboard"""
    ctx = WorkspaceContext(w)
    exporter = Exporter(ctx)
    exporter.export_results(prompts, path)


@ucx.command
def lint_local_code(
    w: WorkspaceClient, prompts: Prompts, path: str | None = None, ctx: LocalCheckoutContext | None = None
132 changes: 132 additions & 0 deletions src/databricks/labs/ucx/installer/workflows.py
@@ -51,6 +51,7 @@
from databricks.labs.ucx.installer.logs import PartialLogRecord, parse_logs
from databricks.labs.ucx.installer.mixins import InstallationMixin


logger = logging.getLogger(__name__)

TEST_RESOURCE_PURGE_TIMEOUT = timedelta(hours=1)
@@ -112,6 +113,126 @@
f'--parent_run_id=' + dbutils.widgets.get('parent_run_id'))
"""

EXPORT_UCX_NOTEBOOK = """
Review comment (Collaborator), suggested change:
-EXPORT_UCX_NOTEBOOK = """
+EXPORT_TO_EXCEL_NOTEBOOK = """

# Databricks notebook source
# MAGIC %md
# MAGIC ##### Exporter of UCX assessment results
# MAGIC ##### Instructions:
# MAGIC 1. Execute using an all-purpose cluster with Databricks Runtime 14 or higher.
# MAGIC 1. Hit **Run all** button and wait for completion.
# MAGIC 1. Go to the bottom of the notebook and click the Download UCX Results button.
# MAGIC
# MAGIC ##### Important:
# MAGIC Please note that this is only meant to serve as example code.
# MAGIC This is not official **Databricks** or **Databricks Labs UCX** code.
# MAGIC
# MAGIC Example code developed by **Databricks Shared Technical Services team**.
# COMMAND ----------
# DBTITLE 1,Installing Packages
# MAGIC %pip install {remote_wheel} -q -q -q
# MAGIC %pip install xlsxwriter -q -q -q
Review comment (Member): Why do we suppress the pip install?

Suggested change:
-# MAGIC %pip install {remote_wheel} -q -q -q
-# MAGIC %pip install xlsxwriter -q -q -q
+# MAGIC %pip install {remote_wheel} -qqq
+# MAGIC %pip install xlsxwriter -qqq

Note that this will fail for workspaces that have restrictive internet access. To handle this the same way we install ucx, use upload_dependencies; see install.py.

# MAGIC dbutils.library.restartPython()
# COMMAND ----------
# DBTITLE 1,Import Libraries
# Standard library imports
Review comment (Member): Remove redundant comments; this is convention, comments do not help.

import os
import re
import shutil
import json
from typing import List, Dict
from ast import literal_eval
from concurrent.futures import ThreadPoolExecutor
# Third-party library imports
import pandas as pd
import xlsxwriter
# Databricks imports
from databricks.labs.ucx.contexts.workflow_task import RuntimeContext
import databricks.labs.ucx.queries.assessment.main as queries
# Resource management
import importlib.resources as resources
# COMMAND ----------
# DBTITLE 1,UCX Assessment Export
class Exporter:
    # File and Path Constants
    _FILE_NAME = "ucx_assessment_results.xlsx"
    _TMP_PATH = "/Workspace/Applications/ucx/ucx_results/"
Review comment (Collaborator): This has to be replaced with the self._installation.install_folder() value, as there are modes where it's installed into /Users/foo@bar.com/.ucx. For the folder, let's name it excel-export-results or something like it.

    _DOWNLOAD_PATH = "/dbfs/FileStore/ucx_results"
    # Named Parameters
    _NAMED_PARAMS = dict(config="/Workspace{config_file}")

    def __init__(self) -> None:
        self._ctx = RuntimeContext(self._NAMED_PARAMS)

    def _get_ucx_main_queries(self) -> List[Dict[str, str]]:
        '''Retrieve and construct the main UCX queries.'''
        pattern = r"\\b.inventory\\b"
        schema = self._ctx.inventory_database
        sql_files = [
            file.name
            for file in resources.files(queries).iterdir()
            if file.suffix == ".sql" and "count" not in file.name
        ]
        ucx_main_queries = [
            dict(name="01_1_permissions", query=f"SELECT * FROM {schema}.permissions"),
            dict(name="02_2_ucx_grants", query=f"SELECT * FROM {schema}.grants;"),
            dict(name="03_3_groups", query=f"SELECT * FROM {schema}.groups;"),
        ]
        for sql_file in sql_files:
            with resources.as_file(resources.files(queries) / sql_file) as file_path:
                content = file_path.read_text()
                modified_content = re.sub(pattern, f" {schema}", content, flags=re.IGNORECASE)
                query_name = sql_file[:-4]
                ucx_main_queries.append(dict(name=query_name, query=modified_content))
        return ucx_main_queries
    def _cleanup(self) -> None:
        '''Move the temporary results file to the download path and clean up the temp directory.'''
        shutil.move(
            os.path.join(self._TMP_PATH, self._FILE_NAME),
            os.path.join(self._DOWNLOAD_PATH, self._FILE_NAME),
        )
        shutil.rmtree(self._TMP_PATH)

    def _prepare_directories(self) -> None:
        '''Ensure that the necessary directories exist.'''
        os.makedirs(self._TMP_PATH, exist_ok=True)
        os.makedirs(self._DOWNLOAD_PATH, exist_ok=True)

    def _execute_query(self, result: Dict[str, str], writer: pd.ExcelWriter) -> None:
        '''Execute a SQL query and write the result to an Excel sheet.'''
        pattern = r'^\\d+_\\d+_(.*)'
        match = re.search(pattern, result["name"])
        if match:
            sheet_name = match.group(1)
            sdf = spark.sql(result["query"])
            if sdf.count() > 0:
                df = sdf.toPandas()
                df.to_excel(writer, sheet_name=sheet_name, index=False)

    def _render_export(self) -> None:
        '''Render an HTML link for downloading the results.'''
        html_content = f'''
        <style>@font-face{{font-family:'DM Sans';src:url(https://cdn.bfldr.com/9AYANS2F/at/p9qfs3vgsvnp5c7txz583vgs/dm-sans-regular.ttf?auto=webp&format=ttf) format('truetype');font-weight:400;font-style:normal}}body{{font-family:'DM Sans',Arial,sans-serif}}.export-container{{text-align:center;margin-top:20px}}.export-container h2{{color:#1B3139;font-size:24px;margin-bottom:20px}}.export-container a{{display:inline-block;padding:12px 25px;background-color:#1B3139;color:#fff;text-decoration:none;border-radius:4px;font-size:18px;font-weight:500;transition:background-color 0.3s ease,transform 0.3s ease}}.export-container a:hover{{background-color:#FF3621;transform:translateY(-2px)}}</style><div class="export-container"><h2>Export Results</h2><a href='{workspace_host}files/ucx_results/ucx_assessment_results.xlsx?o={workspace_id}' target='_blank' download>Download UCX Results</a></div>
        '''
        displayHTML(html_content)

    def export_results(self) -> None:
        '''Main method to export results to an Excel file.'''
        self._prepare_directories()
        results = self._get_ucx_main_queries()
        try:
            with pd.ExcelWriter(
                os.path.join(self._TMP_PATH, self._FILE_NAME), engine="xlsxwriter"
            ) as writer:
                with ThreadPoolExecutor(max_workers=4) as executor:
                    futures = [
                        executor.submit(self._execute_query, result, writer)
                        for result in results
                    ]
                    for future in futures:
                        future.result()
            self._cleanup()
            self._render_export()
        except Exception as e:
            print(f"Error exporting results: {e}")

# COMMAND ----------
# DBTITLE 1,Automate UCX Data Export
Exporter().export_results()
"""


class DeployedWorkflows:
    def __init__(self, ws: WorkspaceClient, install_state: InstallState, verify_timeout: timedelta):
@@ -486,6 +607,7 @@ def create_jobs(self) -> None:

        self._install_state.save()
        self._create_debug(remote_wheels)
        self._create_export(remote_wheels)
        self._create_readme()

    @property
@@ -788,6 +910,16 @@ def _create_debug(self, remote_wheels: list[str]):
        ).encode("utf8")
        self._installation.upload('DEBUG.py', content)

    def _create_export(self, remote_wheels: list[str]):
        content = EXPORT_UCX_NOTEBOOK.format(
            remote_wheel=remote_wheels,
            config_file=self._config_file,
            workspace_host=self._ws.config.host,
            workspace_id=self._ws.get_workspace_id(),
            schema=self._config.inventory_database,
        ).encode("utf8")
        self._installation.upload('EXPORT_UCX_RESULTS.py', content)
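A note on the template mechanics: EXPORT_UCX_NOTEBOOK is a str.format template, so single-brace tokens such as {config_file} and {workspace_host} are substituted here, while the doubled braces in the embedded CSS escape literal braces. A tiny illustration with hypothetical values:

    template = "host = {workspace_host}; literal braces survive: {{not-a-field}}"
    rendered = template.format(workspace_host="https://adb-123.azuredatabricks.net")
    # rendered == "host = https://adb-123.azuredatabricks.net; literal braces survive: {not-a-field}"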


class MaxedStreamHandler(logging.StreamHandler):
