Added export CLI functionality for assessment results #2553
Conversation
- Created the function to upload the notebook to the workspace.
- Added UCX Export Result notebook as a utility hack.
- Feat/add exporter
- Update workflows.py
- Added unit test coverage.
```python
# File and Path Constants
_ZIP_FILE_NAME = "ucx_assessment_results.zip"

def __init__(self, ctx: WorkspaceContext):
```
```diff
-    def __init__(self, ctx: WorkspaceContext):
+    def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):
```
It's an anti-pattern to depend on the entire `WorkspaceContext`; depend only on what is used.
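A minimal sketch of the suggested shape, depending only on the two collaborators actually used (the field names are illustrative, not the PR's actual code):

```python
from databricks.labs.lsql.backends import SqlBackend

from databricks.labs.ucx.config import WorkspaceConfig


class Exporter:
    """Takes only what it needs instead of the whole WorkspaceContext."""

    def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):
        self._sql_backend = sql_backend  # runs the assessment queries
        self._config = config            # resolves the inventory database
```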
```python
for sql_file in sql_files:
    content = sql_file.read_text()
    modified_content = re.sub(pattern, f" {schema}", content, flags=re.IGNORECASE)
```
```diff
-    modified_content = re.sub(pattern, f" {schema}", content, flags=re.IGNORECASE)
+    modified_content = self._config.replace_inventory_variable(content)
```
Use `databricks.labs.ucx.config.WorkspaceConfig.replace_inventory_variable`, which already exists for this purpose.
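Applied to the loop above, assuming `replace_inventory_variable` substitutes the `$inventory` placeholder with the configured inventory database:

```python
for sql_file in sql_files:
    content = sql_file.read_text()
    # e.g. "FROM $inventory.objects" -> "FROM hive_metastore.ucx.objects"
    modified_content = self._config.replace_inventory_variable(content)
```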
See the other comment; this method could be written in a more maintainable way.
tests/unit/assessment/test_export.py (Outdated)
```python
from databricks.labs.ucx.contexts.workspace_cli import WorkspaceContext


class TestExporter(unittest.TestCase):
```
Be consistent with other tests in the project: use pytest, with one top-level function per test.
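A minimal sketch of that style, with a top-level function and plain `assert` instead of a `TestCase` class (the test name and the `WorkspaceConfig` usage are illustrative):

```python
from unittest.mock import create_autospec

from databricks.labs.ucx.config import WorkspaceConfig


def test_inventory_variable_is_replaced():
    # top-level function per test; plain `assert` instead of self.assertEqual
    config = create_autospec(WorkspaceConfig)
    config.replace_inventory_variable.return_value = "SELECT * FROM ucx.objects"
    assert config.replace_inventory_variable("SELECT * FROM $inventory.objects") == "SELECT * FROM ucx.objects"
```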
tests/unit/assessment/test_export.py (Outdated)
```python
@patch("databricks.labs.ucx.assessment.export.Path.exists")
@patch("databricks.labs.ucx.assessment.export.Exporter._execute_query")
```
Obscure implicit test dependency with `mock.patch(XXX)`; rewrite to inject dependencies through the constructor. Using `patch` to mock dependencies hides what a class actually depends on, while constructor arguments declare dependencies explicitly, improving readability and maintainability. `patch` also makes refactoring brittle: updates to the underlying implementation force changes across multiple unrelated unit tests, and the hard-coded strings passed to `patch` lack strongly typed references, obscuring which tests need modification. Coupling the class under test to concrete classes this way is a code smell, and such code does not port to statically typed languages, where monkey patching isn't feasible without significant effort. Extensive patching of external clients is a signal to refactor toward dependency inversion.

To address this, inject dependencies through the constructor. This declares them explicitly and enables dependency inversion: interfaces decouple the class under test from concrete classes, so mocks can be substituted for real implementations and the class's behavior verified in isolation, producing more robust and maintainable tests.

Obscure implicit test dependency with `MagicMock()`; rewrite with `create_autospec(ConcreteType)`. `MagicMock` accepts any attribute or call, hiding the real dependency surface. `create_autospec(ConcreteType)` creates a mock with the same attributes and methods as the concrete class, so the mock behaves like the real thing and misuse fails fast. Reliance on `MagicMock` likewise makes refactoring harder, since changes to the underlying implementation ripple through unrelated unit tests.
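A sketch of what the injected version could look like, assuming the constructor suggested earlier in this review (`AssessmentExporter` is the class name the PR ended up with; `MockBackend` is the lsql test backend):

```python
from unittest.mock import create_autospec

from databricks.labs.lsql.backends import MockBackend

from databricks.labs.ucx.assessment.export import AssessmentExporter
from databricks.labs.ucx.config import WorkspaceConfig


def test_export_uses_injected_dependencies():
    # no mock.patch: collaborators arrive through the constructor
    sql_backend = MockBackend()                # fake backend, records queries
    config = create_autospec(WorkspaceConfig)  # typed mock of the real config
    exporter = AssessmentExporter(sql_backend, config)
    assert exporter is not None
```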
Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>
Forgot to submit my review; it might be outdated given nfx's comments.
```diff
@@ -10,8 +10,10 @@ so that you'll be able to [scope the migration](docs/assessment.md) and execute
 The [README notebook](#readme-notebook), which can be found in the installation folder contains further instructions and explanations of the different ucx workflows & dashboards.
 Once the migration is scoped, you can start with the [table migration process](#Table-Migration).
+
+
```
Remove redundant newlines
labs.yml (Outdated)
```diff
@@ -275,3 +275,5 @@ commands:
   - name: target-workspace-id
     description: (Optional) id of a workspace in the target collection. If not specified, ucx will prompt to select from a list
+
+  - name: export
+    description: export widget data from the assessment
```
Can we update the command to accept a second argument, `what`, to indicate what is exported: `databricks labs ucx export --what assessment`
```diff
-    description: export widget data from the assessment
+    description: export ucx data
```
@nfx: please provide your input on this API
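A sketch of how the command with a `--what` flag might be wired, following the `databricks.labs.blueprint.cli.App` pattern ucx uses for its other commands (the body and the supported values are illustrative, not the final API):

```python
from databricks.sdk import WorkspaceClient

from databricks.labs.blueprint.cli import App
from databricks.labs.blueprint.entrypoint import get_logger

ucx = App(__file__)
logger = get_logger(__file__)


@ucx.command
def export(w: WorkspaceClient, what: str = "assessment"):
    """Export ucx data; `what` selects which dataset to export."""
    if what != "assessment":
        raise KeyError(f"nothing to export for --what={what}")
    logger.info(f"exporting {what} results from {w.config.host}")
    # ... delegate to the exporter here
```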
```python
import os
import re
import csv
import logging
from pathlib import Path
from zipfile import ZipFile
from concurrent.futures import ThreadPoolExecutor
from databricks.labs.blueprint.tui import Prompts
from databricks.labs.ucx.contexts.workspace_cli import WorkspaceContext
```
isort
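Roughly what isort would produce for this block, depending on the project's configuration: stdlib imports first and alphabetized, then the third-party and first-party groups separated by blank lines.

```python
import csv
import logging
import os
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from zipfile import ZipFile

from databricks.labs.blueprint.tui import Prompts

from databricks.labs.ucx.contexts.workspace_cli import WorkspaceContext
```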
```python
class Exporter:
    # File and Path Constants
    _ZIP_FILE_NAME = "ucx_assessment_results.zip"
```
```diff
-    _ZIP_FILE_NAME = "ucx_assessment_results.zip"
+    _EXPORT_FILE_NAME = "ucx_assessment_results.zip"
```
…databrickslabs#2746)

## Changes
Our code works around a limitation of astroid < 3.3 where f-strings are not inferred. This PR:
- updates pylint and astroid
- drops workarounds
- fixes corresponding tests

### Linked issues
None

### Functionality
None

### Tests
- [x] updated unit tests

Co-authored-by: Eric Vergnaud <eric.vergnaud@databricks.com>
…,<0.10 (databrickslabs#2747)

Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version.

Highlights from the [databricks-labs-blueprint changelog](https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md):

- 0.9.0: Added the Databricks CLI version to routed command telemetry (#147) via a new `DATABRICKS_CLI_VERSION` environment variable incorporated into `with_user_agent_extra`, so both the blueprint and CLI versions are transmitted in the user agent of outgoing requests.
- 0.8.3: Added missing `stat()` methods to `DBFSPath` and `WorkspacePath` (#144), returning file status in the Posix `os.stat_result` format.
- 0.8.2: Made hatch a prerequisite of the build, pinned via an environment variable, to avoid flaky `pip install hatch` steps (#137); added support for files with a unicode BOM during upload and download (#138).
- 0.8.1: Fixed py3.10 compatibility for `_parts` in pathlike, removing the redundant `_cparts` property (#135).
- 0.8.0: Added `DBFSPath` as an `os.PathLike` implementation (#131); fixed the `.as_uri()` and `.absolute()` implementations for `WorkspacePath` (#127); fixed `WorkspacePath` support for Python 3.11 (#121) and updated it for Python 3.12 (#122); properly verified Python versions in pyproject.toml (#118); added type annotations to path-related unit tests (#128); fixed the `.resolve()` implementation to validate relative paths and honor `strict` (#129); made `.rename()` and `.replace()` return the target path (#130); bumped sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#133).
- 0.7.0: Added `databricks.labs.blueprint.paths.WorkspacePath` as a `pathlib.Path` equivalent (#115); propagated the `blueprint` version and CLI command name into the `User-Agent` header when used as a library (#114); bumped actions/checkout from 4.1.6 to 4.1.7 (#112).
- 0.6.3: Fixed a `Command.get_argument_type` bug with `UnionType` (#110).

See the [compare view](https://github.com/databrickslabs/blueprint/compare/v0.8.0...v0.9.0) for the full list of commits. Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself; the usual `@dependabot` commands apply.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Changes
solacc.py currently lints the entire solacc repo, accumulating temporary files to the point of exceeding CI storage capacity. This PR fixes the issue by:
- linting the repo per top-level solacc 'solution' (within the repo, top folders are independent of each other)
- deleting temp files and dirs registered in PathLookup after linting a solution

This PR also prepares for improving false positive detection.

### Linked issues
None

### Functionality
None

### Tests
- [x] manually tested

Co-authored-by: Eric Vergnaud <eric.vergnaud@databricks.com>
## Changes
This PR just includes changes from `make fmt`, which doesn't currently pass on `main`.

### Linked issues
Updates databrickslabs#2746.
…n parallel (databrickslabs#2745) We were not doing that before and now we do.
## Changes
Harden configuration reading by verifying the type before reading the "value" using `.get`.

### Linked issues
Resolves databrickslabs#2581 (hopefully the second get is the issue; type hinting should cover that, but who knows)

### Functionality
- [x] modified existing workflow: `assessment`

### Tests
- [x] added unit tests
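A hedged sketch of the hardening pattern this describes, with illustrative key names rather than the actual ucx config layout:

```python
def read_value(raw: object) -> str | None:
    """Check the type before chaining .get, instead of assuming a nested dict."""
    if not isinstance(raw, dict):
        return None  # the config may hold a plain string or None here
    value = raw.get("value")
    return value if isinstance(value, str) else None
```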
…slabs#2734)

## Changes
Add an unskip CLI command to undo a skip on a schema or a table.

### Linked issues
Resolves databrickslabs#1938

### Functionality
- [x] added relevant user documentation
- [x] added new CLI command: unskip

### Tests
- [x] unit test added
## Changes
`solacc` currently lints on a per-file basis, which is incorrect. This PR implements linting on a per-solution basis, thus improving dependency resolution.

### Linked issues
None

### Functionality
None

### Tests
- [x] manually tested

Co-authored-by: Eric Vergnaud <eric.vergnaud@databricks.com>
Patch export v2
…t-reviewed Feat/add cli export assessment reviewed
lgtm
* Added `google-cloud-core` to known list ([#2826](#2826)). In this release, we have incorporated the `google-cloud-core` library into our project's configuration file, specifying several modules from this library. This change is part of the resolution of issue [#1931](#1931), which pertains to working with Google Cloud services. The `google-cloud-core` library offers core functionalities for Google Cloud client libraries, including helper functions, HTTP-related functionalities, testing utilities, client classes, environment variable handling, exceptions, obsolete features, operation tracking, and version management. By adding these new modules to the known list in the configuration file, we can now utilize them in our project as needed, thereby enhancing our ability to work with Google Cloud services.

* Added `gviz-api` to known list ([#2831](#2831)). In this release, we have added the `gviz-api` library to our known library list, specifically specifying the `gviz_api` package within it. This addition enables the proper handling and recognition of components from the `gviz-api` library in the system, thereby addressing a portion of issue [#1931](#1931). While the specifics of the `gviz-api` library's implementation and usage are not described in the commit message, it is expected to provide functionality related to data visualization. This enhancement will enable us to expand our system's capabilities and provide more comprehensive solutions for our users.

* Added export CLI functionality for assessment results ([#2553](#2553)). A new `export` command-line interface (CLI) function has been added to the open-source library to export assessment results. This feature includes the addition of a new `AssessmentExporter` class in the `export.py` module, which is responsible for exporting assessment results to CSV files inside a ZIP archive. Users can specify the destination path and type of report for the exported results. A notebook utility is also included to run the export from the workspace environment, with default location, unit tests, and integration tests for the notebook utility. The `acl_migrator` method has been optimized for better performance. This new functionality provides more flexibility in exporting assessment results and improves the overall assessment functionality of the library.

* Added functional test related to bug [#2850](#2850) ([#2880](#2880)). A new functional test has been added to address a bug fix related to issue [#2850](#2850), which involves reading data from a CSV file located in a volume using Spark's readStream function. The test specifies various options including file format, schema location, header, and compression. The CSV file is loaded from '/Volumes/playground/test/demo_data/' and the schema location is set to '/Volumes/playground/test/schemas/'. Additionally, a unit test has been added and is referenced in the commit. This functional test will help ensure that the bug fix for issue [#2850](#2850) is working as expected.

* Added handling for `PermissionDenied` when retrieving `WorkspaceClient`s from account ([#2877](#2877)). In this release, the `workspace_clients` method of the `Account` class in `workspaces.py` has been updated to handle `PermissionDenied` exceptions when retrieving `WorkspaceClient`s. This change introduces a try-except block around the command retrieving the workspace client, which catches the `PermissionDenied` exception and logs a warning message if access to a workspace is denied. If no exception is raised, the workspace client is added to the list of clients as before. The commit also includes a new unit test to verify this functionality. This update addresses issue [#2874](#2874) and enhances the robustness of the `databricks labs ucx sync-workspace-info` command by ensuring it gracefully handles permission errors during workspace retrieval.

* Added testing with Python 3.13 ([#2878](#2878)). The project has been updated to include testing with Python 3.13, in addition to the previously supported versions of Python 3.10, 3.11, and 3.12. This update is reflected in the `.github/workflows/push.yml` file, which now includes '3.13' in the `pyVersion` matrix for the jobs. This addition expands the range of Python versions that the project can be tested and run on, providing increased flexibility and compatibility for users, as well as ensuring continued support for the latest versions of the Python programming language.

* Added used tables in assessment dashboard ([#2836](#2836)). In this update, we introduce a new widget to the assessment dashboard for displaying used tables, enhancing visibility into how tables are utilized within the Databricks environment. This change includes the addition of the `UsedTable` class in the `databricks.labs.ucx.source_code.base` module, which tracks table usage details in the inventory database. Two new methods, `collect_dfsas_from_query` and `collect_used_tables_from_query`, have been implemented to collect data source access and used tables information from a query, with lineage information added to the table details. Additionally, a test function, `test_dashboard_with_prepopulated_data`, has been introduced to prepopulate data for use in the dashboard, ensuring proper functionality of the new feature.

* Avoid resource conflicts in integration tests by using a random dir name ([#2865](#2865)). In this release, we have implemented changes to address resource conflicts in integration tests by introducing random directory names. The `save_locations` method in `conftest.py` has been updated to generate random directory names using the `tempfile.mkdtemp` function, based on the value of the new `make_random` parameter. Additionally, in the `test_migrate.py` file located in the `tests/integration/hive_metastore` directory, the hard-coded directory name has been replaced with a random one generated by the `make_random` function, which is used when creating external tables and specifying the external delta location. Lastly, the `test_move_tables_table_properties_mismatch_preserves_original` function in `test_table_move.py` has been updated to include a randomly generated directory name in the table's external delta and storage location, ensuring that tests can run concurrently without conflicting with each other. These changes resolve the issue described in [#2797](#2797) and improve the reliability of integration tests.

* Exclude dfsas from used tables ([#2841](#2841)). In this release, we've made significant improvements to the accuracy of table identification and handling in our system. We've excluded certain direct filesystem access patterns from being treated as tables in the current implementation, correcting a previous error. The `collect_tables` method has been updated to exclude table names matching defined direct filesystem access patterns. Additionally, we've added a new method `TableInfoNode` to wrap used tables and the nodes that use them. We've also introduced changes to handle direct filesystem access patterns more accurately, ensuring that the DataFrame API's `spark.table()` function is identified correctly, while the `spark.read.parquet()` function, representing direct filesystem access, is now ignored. These changes are supported by new unit tests to ensure correctness and reliability, enhancing the overall functionality and behavior of the system.

* Fixed known matches false positives for libraries starting with the same name as a library in the known.json ([#2860](#2860)). This commit addresses an issue of false positives in known matches for libraries that have the same name as a library in the known.json file. The `module_compatibility` function in the `known.py` file was updated to look for exact matches or parent module matches, rather than just matches at the beginning of the name. This more nuanced approach ensures that libraries with similar names are not incorrectly flagged as having compatibility issues. Additionally, the `known.json` file is now sorted when constructing module problems, indicating that the order of the entries in this file may have been relevant to the issue being resolved. To ensure the accuracy of the changes, new unit tests were added. The test suite was expanded to include tests for known and unknown compatibility, and a new load test was added for the known.json file. These changes improve the reliability of the known matches feature, which is critical for ensuring the correct identification of compatibility issues.

* Make delta format case sensitive ([#2861](#2861)). In this commit, the delta format is made case sensitive to enhance the robustness and reliability of the code. The `TableInMount` class has been updated with a `__post_init__` method to convert the `format` attribute to uppercase, ensuring case sensitivity. Additionally, the `Table` class in the `tables.py` file has been modified to include a `__post_init__` method that converts the `table_format` attribute to uppercase during object creation, making format comparisons case insensitive. New properties, `is_delta` and `is_hive`, have been added to the `Table` class to check if the table format is delta or hive, respectively. These changes affect the `what` method of the `AclMigrationWhat` enum class, which now checks for `is_delta` and `is_hive` instead of comparing `table_format` with `DELTA` and "HIVE". Relevant issues [#2858](#2858) and [#2840](#2840) have been addressed, and unit tests have been included to verify the behavior. However, the changes have not been verified on the staging environment yet.

* Make delta format case sensitive ([#2862](#2862)). The recent update, derived from the resolution of issue [#2861](#2861), introduces a case-sensitive delta format to our open-source library, enhancing the precision of delta table tracking. This change impacts all table format-related code and is accompanied by additional tests for robustness. A new `location` column has been incorporated into the `table_estimates` view, facilitating the determination of delta table location. Furthermore, a new method has been implemented to extract the `location` column from the `table_estimates` view, further refining the project's functionality and accuracy in managing delta tables.

* Verify UCX catalog is accessible at start of `migration-progress-experimental` workflow ([#2851](#2851)). In this release, we have introduced a new `verify_has_ucx_catalog` method in the `Application` class of the `databricks.labs.ucx.contexts` module, which checks for the presence of a UCX catalog in the workspace and returns an instance of the `VerifyHasCatalog` class. This method is used in the `migration-progress-experimental` workflow to verify UCX catalog accessibility, addressing issues [#2577](#2577) and [#2848](#2848) and progressing work on [#2816](#2816). The `verify_has_ucx_catalog` method is decorated with `@cached_property` and takes `workspace_client` and `ucx_catalog` as arguments. Additionally, we have added a new `VerifyHasCatalog` class that checks if a specified Unity Catalog (UC) catalog exists in the workspace and updated the import statement to include a `NotFound` exception. We have also added a timeout parameter to the `validate_step` function in the `workflows.py` file, modified the `migration-progress-experimental` workflow to include a new step `verify_prerequisites` in the `table_migration` job cluster, and added unit tests to ensure the proper functioning of these changes. These updates improve the application's ability to interact with UCX catalogs and ensure their presence and accessibility during workflow execution, while also enhancing the robustness and reliability of the `migration-progress-experimental` workflow.
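The `AssessmentExporter` described above writes each result set as a CSV file inside a single ZIP archive; a minimal sketch of that shape, with illustrative function and file names:

```python
import csv
import io
from pathlib import Path
from zipfile import ZipFile


def export_results(results: dict[str, list[dict]], dest: Path) -> Path:
    """Write each named result set as a CSV file inside one ZIP archive."""
    zip_path = dest / "ucx_assessment_results.zip"
    with ZipFile(zip_path, "w") as zf:
        for name, rows in results.items():
            if not rows:
                continue  # skip empty result sets
            buffer = io.StringIO()
            writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
            zf.writestr(f"{name}.csv", buffer.getvalue())
    return zip_path
```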
Changes
export CLI function

Tests