Update databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 #2747
Merged: nfx merged 1 commit into main from dependabot/pip/databricks-labs-blueprint-gte-0.8-and-lt-0.10 on Sep 25, 2024
Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version.
- [Release notes](https://github.com/databrickslabs/blueprint/releases)
- [Changelog](https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md)
- [Commits](https://github.com/databrickslabs/blueprint/compare/v0.8.0...v0.9.0)

---
updated-dependencies:
- dependency-name: databricks-labs-blueprint
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
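To make the effect of widening the requirement concrete, here is a minimal standard-library sketch that checks which releases fall inside the old range (`>=0.8,<0.9`) and the new one (`>=0.8,<0.10`). The helper names `parse_version` and `in_range` are hypothetical, and the parser only handles plain dotted releases; it is not how pip or Dependabot evaluate specifiers.

```python
import re


def parse_version(text: str) -> tuple:
    """Parse a plain dotted release such as '0.9.0' into a tuple of ints."""
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in text.split("."))


def _pad(version: tuple, width: int) -> tuple:
    """Right-pad with zeros so that '0.8' compares equal to '0.8.0'."""
    return version + (0,) * (width - len(version))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (hypothetical helper)."""
    parsed = [parse_version(v) for v in (version, lower, upper)]
    width = max(len(p) for p in parsed)
    v, lo, hi = (_pad(p, width) for p in parsed)
    return lo <= v < hi


OLD_RANGE = ("0.8", "0.9")   # <0.9,>=0.8
NEW_RANGE = ("0.8", "0.10")  # >=0.8,<0.10

for candidate in ("0.8.0", "0.8.3", "0.9.0", "0.10.0"):
    print(candidate, in_range(candidate, *OLD_RANGE), in_range(candidate, *NEW_RANGE))
```

Versions are compared as integer tuples rather than strings because a plain string comparison would sort `"0.10"` before `"0.9"`.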
dependabot (bot) added the dependencies and python labels on Sep 25, 2024
nfx approved these changes on Sep 25, 2024
lgtm
nfx deleted the dependabot/pip/databricks-labs-blueprint-gte-0.8-and-lt-0.10 branch on September 25, 2024 13:13
jgarciaf106 pushed a commit to rportilla-databricks/ucx that referenced this pull request on Sep 26, 2024
…,<0.10 (databrickslabs#2747) Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's changelog</a>.</em></p> <blockquote> <h2>0.9.0</h2> <ul> <li>Added Databricks CLI version as part of routed command telemetry (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/147">#147</a>). A new environment variable, "DATABRICKS_CLI_VERSION", has been introduced in the Databricks CLI version for routed command telemetry. This variable is incorporated into the <code>with_user_agent_extra</code> method, which adds it to the user agent for outgoing requests, thereby enhancing detailed tracking and version identification in telemetry data. The <code>with_user_agent_extra</code> method is invoked twice, with the <code>blueprint</code> prefix and the <strong>version</strong> variable, followed by the <code>cli</code> prefix and the DATABRICKS_CLI_VERSION environment variable, ensuring that both the blueprint and CLI versions are transmitted in the user agent for all requests.</li> </ul> <h2>0.8.3</h2> <ul> <li>add missing stat() methods to DBFSPath and WorkspacePath (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/144">#144</a>). The <code>stat()</code> method has been added to both <code>DBFSPath</code> and <code>WorkspacePath</code> classes, addressing issues <a href="https://redirect.github.com/databrickslabs/blueprint/issues/142">#142</a> and <a href="https://redirect.github.com/databrickslabs/blueprint/issues/143">#143</a>. This method, which adheres to the Posix standard, returns file status in the <code>os.stat_result</code> format, providing access to various metadata attributes such as file size, last modification time, and creation time. 
By incorporating this method, developers can now obtain essential file information for Databricks File System (DBFS) and Databricks Workspace paths when working with these classes. The change includes a new test case for <code>stat()</code> in the <code>test_paths.py</code> file to ensure the correctness of the method for both classes.</li> </ul> <h2>0.8.2</h2> <ul> <li>Make hatch a prerequisite (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/137">#137</a>). In version 1.9.4, hatch has become a prerequisite for installation in the GitHub workflow for the project's main branch, due to occasional failures in <code>pip install hatch</code> that depend on the local environment. This change, which includes defining the hatch version as an environment variable and adding a new step for installing hatch with a specific version, aims to enhance the reliability of the build and testing process by eliminating potential installation issues with hatch. Users should install hatch manually before executing the Makefile, as the line <code>pip install hatch</code> has been removed from the Makefile. This change aligns with the approach taken for ucx, and users are expected to understand the requirement to install prerequisites before executing the Makefile. To contribute to this project, please install hatch using <code>pip install hatch</code>, clone the GitHub repository, and run <code>make dev</code> to start the development environment and install necessary dependencies.</li> <li>support files with unicode BOM (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/138">#138</a>). The recent change to the open-source library introduces support for handling files with a Unicode Byte Order Mark (BOM) during file upload and download operations in Databricks Workspace. This new functionality, added to the <code>WorkspacePath</code> class, allows for easier reading of text from files with the addition of a <code>read_text</code> method. 
When downloading a file, if it starts with a BOM, it will be detected and used for decoding, regardless of the preferred encoding based on the system's locale. The change includes a new test function that verifies the accurate encoding and decoding of files with different types of BOM using the appropriate encoding. Despite the inability to test Databrick notebooks with a BOM due to the Databricks platform modifying the uploaded data, this change enhances support for handling files with various encodings and BOM, improving compatibility with a broader range of file formats, and ensuring more accurate handling of files with BOM.</li> </ul> <h2>0.8.1</h2> <ul> <li>Fixed py3.10 compatibility for <code>_parts</code> in pathlike (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/135">#135</a>). The recent update to our open-source library addresses the compatibility issue with Python 3.10 in the <code>_parts</code> property of a certain type. Prior to this change, there was also a <code>_cparts</code> property that returned the same value as <code>_parts</code>, which has been removed and replaced with a direct reference to <code>_parts</code>. The <code>_parts</code> property can now be accessed via reverse equality comparison, and this change has been implemented in the <code>joinpath</code> and <code>__truediv__</code> methods as well. This enhancement improves the library's compatibility with Python 3.10 and beyond, ensuring continued functionality and stability for software engineers working with the latest Python versions.</li> </ul> <h2>0.8.0</h2> <ul> <li>Added <code>DBFSPath</code> as <code>os.PathLike</code> implementation (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/131">#131</a>). The open-source library has been updated with a new class <code>DBFSPath</code>, an implementation of <code>os.PathLike</code> for Databricks File System (DBFS) paths. 
This new class extends the existing <code>WorkspacePath</code> support and provides pathlib-like functionality for DBFS paths, including methods for creating directories, renaming and deleting files and directories, and reading and writing files. The addition of <code>DBFSPath</code> includes type-hinting for improved code linting and is integrated in the test suite with new and updated tests for path-like objects. The behavior of the <code>exists</code> and <code>unlink</code> methods have been updated for <code>WorkspacePath</code> to improve performance and raise appropriate errors.</li> <li>Fixed <code>.as_uri()</code> and <code>.absolute()</code> implementations for <code>WorkspacePath</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/127">#127</a>). In this release, the <code>WorkspacePath</code> class in the <code>paths.py</code> module has been updated with several improvements to the <code>.as_uri()</code> and <code>.absolute()</code> methods. These methods now utilize PathLib internals, providing better cross-version compatibility. The <code>.as_uri()</code> method now uses an f-string for concatenation and returns the UTF-8 encoded string representation of the <code>WorkspacePath</code> object via a new <code>__bytes__()</code> dunder method. Additionally, the <code>.absolute()</code> method has been implemented for the trivial (no-op) case and now supports returning the absolute path of files or directories in Databricks Workspace. Furthermore, the <code>glob()</code> and <code>rglob()</code> methods have been enhanced to support case-sensitive pattern matching based on a new <code>case_sensitive</code> parameter. 
To ensure the integrity of these changes, two new test cases, <code>test_as_uri()</code> and <code>test_absolute()</code>, have been added, thoroughly testing the functionality of these methods.</li> <li>Fixed <code>WorkspacePath</code> support for python 3.11 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/121">#121</a>). The <code>WorkspacePath</code> class in our open-source library has been updated to improve compatibility with Python 3.11. The <code>.expanduser()</code> and <code>.glob()</code> methods have been modified to address internal changes in Python 3.11. The <code>is_dir()</code> and <code>is_file()</code> methods now include a <code>follow_symlinks</code> parameter, although it is not currently used. A new method, <code>_scandir()</code>, has been added for compatibility with Python 3.11. The <code>expanduser()</code> method has also been updated to expand <code>~</code> (but not <code>~user</code>) constructs. Additionally, a new method <code>is_notebook()</code> has been introduced to check if the path points to a notebook in Databricks Workspace. These changes aim to ensure that the library functions smoothly with the latest version of Python and provides additional functionality for users working with Databricks Workspace.</li> <li>Properly verify versions of python (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/118">#118</a>). In this release, we have made significant updates to the pyproject.toml file to enhance project dependency and development environment management. We have added several new packages to the <code>dependencies</code> section to expand the library's functionality and compatibility. Additionally, we have removed the <code>python</code> field, as it is no longer necessary. We have also updated the <code>path</code> field to specify the location of the virtual environment, which can improve integration with popular development tools such as Visual Studio Code and PyCharm. 
These changes are intended to streamline the development process and make it easier to manage dependencies and set up the development environment.</li> <li>Type annotations on path-related unit tests (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/128">#128</a>). In this open-source library update, type annotations have been added to path-related unit tests to enhance code clarity and maintainability. The tests encompass various scenarios, including verifying if a path exists, creating, removing, and checking directories, and testing file attributes such as distinguishing directories, notebooks, and regular files. The additions also cover functionality for opening and manipulating files in different modes like read binary, write binary, read text, and write text. Furthermore, tests for checking file permissions, handling errors, and globbing (pattern-based file path matching) have been incorporated. The tests interact with a WorkspaceClient mock object, simulating file system interactions. This enhancement bolsters the library's reliability and assists developers in creating robust, well-documented code when working with file system paths.</li> <li>Updated <code>WorkspacePath</code> to support Python 3.12 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/122">#122</a>). In this release, the <code>WorkspacePath</code> implementation has been updated to ensure compatibility with Python 3.12, in addition to Python 3.10 and 3.11. The class was modified to replace most of the internal implementation and add extensive tests for public interfaces, ensuring that the superclass implementations are not used unless they are known to be safe. This change is in response to the significant changes in the superclass implementations between Python 3.11 and 3.12, which were found to be incompatible with each other. 
The <code>WorkspacePath</code> class now includes several new methods and tests to ensure that it functions seamlessly with different versions of Python. These changes include testing for initialization, equality, hash, comparison, path components, and various path manipulations. This update enhances the library's adaptability and ensures it functions correctly with different versions of Python. Classifiers have also been updated to include support for Python 3.12.</li> <li><code>WorkspacePath</code> fixes for the <code>.resolve()</code> implementation (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/129">#129</a>). The <code>.resolve()</code> method for <code>WorkspacePath</code> has been updated to improve its handling of relative paths and the <code>strict</code> argument. Previously, relative paths were not properly validated and would be returned as-is. Now, relative paths will cause the method to fail. The <code>strict</code> argument is now checked, and if set to <code>True</code> and the path does not exist, a <code>FileNotFoundError</code> will be raised. The method <code>.absolute()</code> is used to obtain the absolute path of the file or directory in Databricks Workspace and is used in the implementation of <code>.resolve()</code>. A new test, <code>test_resolve()</code>, has been added to verify these changes, covering scenarios where the path is absolute, the path exists, the path does not exist, and the path is relative. In the case of relative paths, a <code>NotImplementedError</code> is raised, as <code>.resolve()</code> is not supported for them.</li> <li><code>WorkspacePath</code>: Fix the .rename() and .replace() implementations to return the target path (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/130">#130</a>). 
The <code>.rename()</code> and <code>.replace()</code> methods of the <code>WorkspacePath</code> class have been updated to return the target path as part of the public API, with <code>.rename()</code> no longer accepting the <code>overwrite</code> keyword argument and always failing if the target path already exists. A new private method, <code>._rename()</code>, has been added to include the <code>overwrite</code> argument and is used by both <code>.rename()</code> and <code>.replace()</code>. This update is a preparatory step for factoring out common code to support DBFS paths. The tests have been updated accordingly, combining and adding functions to test the new and updated methods. The <code>.unlink()</code> method's behavior remains unchanged. Please note that the exact error raised when <code>.rename()</code> fails due to an existing target path is yet to be defined.</li> </ul> <p>Dependency updates:</p> <ul> <li>Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/133">#133</a>).</li> </ul> <h2>0.7.0</h2> <ul> <li>Added <code>databricks.labs.blueprint.paths.WorkspacePath</code> as <code>pathlib.Path</code> equivalent (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/115">#115</a>). This commit introduces the <code>databricks.labs.blueprint.paths.WorkspacePath</code> library, providing Python-native <code>pathlib.Path</code>-like interfaces to simplify working with Databricks Workspace paths. The library includes <code>WorkspacePath</code> and <code>WorkspacePathDuringTest</code> classes offering advanced functionality for handling user home folders, relative file paths, browser URLs, and file manipulation methods such as <code>read/write_text()</code>, <code>read/write_bytes()</code>, and <code>glob()</code>. 
This addition brings enhanced, Pythonic ways to interact with Databricks Workspace paths, including creating and moving files, managing directories, and generating browser-accessible URIs. Additionally, the commit includes updates to existing methods and introduces new fixtures for creating notebooks, accompanied by extensive unit tests to ensure reliability and functionality.</li> <li>Added propagation of <code>blueprint</code> version into <code>User-Agent</code> header when it is used as library (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/114">#114</a>). A new feature has been introduced in the library that allows for the propagation of the <code>blueprint</code> version and the name of the command line interface (CLI) command used in the <code>User-Agent</code> header when the library is utilized as a library. This feature includes the addition of two new pairs of <code>OtherInfo</code>: <code>blueprint/X.Y.Z</code> to indicate that the request is made using the <code>blueprint</code> library and <code>cmd/<name></code> to store the name of the CLI command used for making the request. The implementation involves using the <code>with_user_agent_extra</code> function from <code>databricks.sdk.config</code> to set the user agent consistently with the Databricks CLI. Several changes have been made to the test file for <code>test_useragent.py</code> to include a new test case, <code>test_user_agent_is_propagated</code>, which checks if the <code>blueprint</code> version and the name of the command are correctly propagated to the <code>User-Agent</code> header. A context manager <code>http_fixture_server</code> has been added that creates an HTTP server with a custom handler, which extracts the <code>blueprint</code> version and the command name from the <code>User-Agent</code> header and stores them in the <code>user_agent</code> dictionary. 
The test case calls the <code>foo</code> command with a mocked <code>WorkspaceClient</code> instance and sets the <code>DATABRICKS_HOST</code> and <code>DATABRICKS_TOKEN</code> environment variables to test the propagation of the <code>blueprint</code> version and the command name in the <code>User-Agent</code> header. The test case then asserts that the <code>blueprint</code> version and the name of the command are present and correctly set in the <code>user_agent</code> dictionary.</li> <li>Bump actions/checkout from 4.1.6 to 4.1.7 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/112">#112</a>). In this release, the version of the "actions/checkout" action used in the <code>Checkout Code</code> step of the acceptance workflow has been updated from 4.1.6 to 4.1.7. This update may include bug fixes, performance improvements, and new features, although specific changes are not mentioned in the commit message. The <code>Unshallow</code> step remains unchanged, continuing to fetch and clean up the repository's history. This update ensures that the latest enhancements from the "actions/checkout" action are utilized, aiming to improve the reliability and performance of the code checkout process in the GitHub Actions workflow. Software engineers should be aware of this update and its potential impact on their workflows.</li> </ul> <p>Dependency updates:</p> <ul> <li>Bump actions/checkout from 4.1.6 to 4.1.7 (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/112">#112</a>).</li> </ul> <h2>0.6.3</h2> <ul> <li>fixed <code>Command.get_argument_type</code> bug with <code>UnionType</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/110">#110</a>). In this release, the <code>Command.get_argument_type</code> method has been updated to include special handling for <code>UnionType</code>, resolving a bug that caused the function to crash when encountering this type. 
The method now returns the string representation of the annotation if the argument is a <code>UnionType</code>, providing more accurate and reliable results. To facilitate this, modifications were made using the <code>types</code> module. Additionally, the <code>foo</code> function has a new optional argument <code>optional_arg</code> of type <code>str</code>, with a default value of <code>None</code>. This argument is passed to the <code>some</code> function in the assertion. The <code>Prompts</code> type has been added to the <code>foo</code> function signature, and an assertion has been added to verify if <code>prompts</code> is an instance of <code>Prompts</code>. Lastly, the default value of the <code>address</code> argument has been changed from an empty string to "default", and the same changes have been applied to the <code>test_injects_prompts</code> test function.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/databrickslabs/blueprint/commit/c3b53a48471b7c3d9ff911d7c6cc3921d6dd9846"><code>c3b53a4</code></a> Release v0.9.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/148">#148</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/98c5f305e721b7f9b2db88db7ad062481e4191dd"><code>98c5f30</code></a> Added Databricks CLI version as part of routed command telemetry (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/147">#147</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/2bfbf1801c1f8638dfadd5c072daeb4cbb9fa372"><code>2bfbf18</code></a> Release v0.8.3 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/145">#145</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/36fc873c0f9795e293d4716916fb96ed31680240"><code>36fc873</code></a> add missing stat() methods to DBFSPath and WorkspacePath (<a 
href="https://redirect.github.com/databrickslabs/blueprint/issues/144">#144</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c531c3f5627d15057d0ae6e570140dedbed968ef"><code>c531c3f</code></a> Release v0.8.2 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/139">#139</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/53b94634639e673eeac880b005e9e64981259035"><code>53b9463</code></a> support files with unicode BOM (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/138">#138</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/ec8232664d4f45e2233404c7ab414d7c3393db1e"><code>ec82326</code></a> Make hatch a prerequisite (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/137">#137</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/98e75bcffd72dd3075a772947e0d06042ba81f6a"><code>98e75bc</code></a> Release v0.8.1 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/136">#136</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/821bc0adb4438e211586af91d3ea011bf97115cf"><code>821bc0a</code></a> Fixed py3.10 compatibility for <code>_parts</code> in pathlike (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/135">#135</a>)</li> <li>See full diff in <a href="https://github.com/databrickslabs/blueprint/compare/v0.8.0...v0.9.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
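The 0.8.2 changelog entry above says that a downloaded file starting with a BOM is decoded according to that BOM, regardless of the locale's preferred encoding. The following is a minimal sketch of that idea using only the standard library; `decode_with_bom` and its `fallback` parameter are hypothetical names, and this is not the actual `WorkspacePath` implementation.

```python
import codecs
import locale
from typing import Optional

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_with_bom(raw: bytes, fallback: Optional[str] = None) -> str:
    """Decode bytes using a leading BOM when present (hypothetical helper)."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # The endian-specific codecs do not strip the BOM, so drop it here.
            return raw[len(bom):].decode(encoding)
    # No BOM found: fall back to the caller's choice or the locale default.
    return raw.decode(fallback or locale.getpreferredencoding(False))
```

For example, `decode_with_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))` decodes as UTF-16 even on a system whose preferred encoding is UTF-8.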
nfx added a commit that referenced this pull request on Sep 26, 2024
* Added Py4j implementation of tables crawler to retrieve a list of HMS tables in the assessment workflow ([#2579](#2579)). In this release, we have added a Py4j implementation of a tables crawler to retrieve a list of Hive Metastore tables in the assessment workflow. A new `FasterTableScanCrawler` class has been introduced, which can be used in the Assessment Job based on a feature flag to replace the old Scala code, allowing for better logging during table scans. The existing `assessment.crawl_tables` workflow now utilizes the new py4j crawler instead of the scala one. Integration tests have been added to ensure the functionality works correctly. The commit also includes a new method for listing table names in the specified database and improvements to error handling and logging mechanisms. The new Py4j tables crawler enhances the functionality of the assessment workflow by improving error handling, resulting in better logging and faster table scanning during the assessment process. This change is part of addressing issue [#2190](#2190) and was co-authored by Serge Smertin. * Added `create-ucx-catalog` cli command ([#2694](#2694)). A new CLI command, `create-ucx-catalog`, has been added to create a catalog for migration tracking that can be used across multiple workspaces. The command creates a UCX catalog for tracking migration status and artifacts, and is created by running `databricks labs ucx create-ucx-catalog` and specifying the storage location for the catalog. Relevant user documentation, unit tests, and integration tests have been added for this command. The `assign-metastore` command has also been updated to allow for the selection of a metastore when multiple metastores are available in the workspace region. This change improves the migration tracking feature and enhances the user experience. * Added experimental `migration-progress-experimental` workflow ([#2658](#2658)). 
This commit introduces an experimental workflow, `migration-progress-experimental`, which refreshes the inventory for various resources such as clusters, grants, jobs, pipelines, policies, tables, TableMigrationStatus, and UDFs. The workflow can be triggered using the `databricks labs ucx migration-progress` CLI command and uses a new implementation of a Scala-based crawler, `TablesCrawler`, which will eventually replace the current implementation. The new workflow is a duplicate of most of the `assessment` pipeline's functionality but with some differences, such as the use of `TablesCrawler`. Relevant user documentation has been added, along with unit tests, integration tests, and a screenshot of a successful staging environment run. The new workflow is expected to run on a schedule in the future. This change resolves [#2574](#2574) and progresses [#2074](#2074). * Added handling for `InternalError` in `Listing.__iter__` ([#2697](#2697)). This release introduces improved error handling in the `Listing.__iter__` method of the `Generic` class, located in the `workspace_access/generic.py` file. Previously, only `NotFound` exceptions were handled, but now both `InternalError` and `NotFound` exceptions are caught and logged appropriately. This change enhances the robustness of the method, which is responsible for listing objects of a specific type and returning them as `GenericPermissionsInfo` objects. To ensure the correct functionality, we have added new unit tests and manual testing. The logging of the `InternalError` exception is properly handled in the `GenericPermissionsSupport` class when listing serving endpoints. This behavior is verified by the newly added test function `test_internal_error_in_serving_endpoints_raises_warning` and the updated `test_serving_endpoints_not_enabled_raises_warning`. * Added handling for `PermissionDenied` when listing accessible workspaces ([#2733](#2733)). 
A new `can_administer` method has been added to the `Workspaces` class in the `workspaces.py` file, which allows for more fine-grained control over which users can administer workspaces. This method checks if the user has access to a given workspace and is a member of the workspace's `admins` group, indicating that the user has administrative privileges for that workspace. If the user does not have access to the workspace or is not a member of the `admins` group, the method returns `False`. Additionally, error handling in the `get_accessible_workspaces` method has been improved by adding a `PermissionDenied` exception to the list of exceptions that are caught and logged. New unit tests have been added for the `AccountWorkspaces` class of the `databricks.labs.blueprint.account` module to ensure that the new method is functioning as intended, specifically checking if a user is a workspace administrator based on whether they belong to the `admins` group. The linked issue [#2732](#2732) is resolved by this change. All changes have been manually and unit tested. * Added static code analysis results to assessment dashboard ([#2696](#2696)). This commit introduces two new tasks, `assess_dashboards` and `assess_workflows`, to the existing assessment dashboard for identifying migration problems in dashboards and workflows. These tasks analyze embedded queries and notebooks for migration issues and collect direct filesystem access patterns requiring attention. Upon completion, the results are stored in the inventory database and displayed on the Migration dashboard. Additionally, two new widgets, job/query problem widgets and directfs access widgets, have been added to enhance the dashboard's functionality by providing additional information related to code compatibility and access control. Integration tests using mock data have been added and manually tested to ensure the proper functionality of these new features. 
This update improves the dashboard's assessment and compatibility checks, making it easier to identify and address Unity Catalog compatibility issues in workflows and dashboards.
* Added unskip CLI command to undo a skip on schema or a table ([#2727](#2727)). This pull request introduces a new CLI command, `unskip`, which reverses a previously applied `skip` on a schema or table. The command accepts a required `--schema` parameter and an optional `--table` parameter. A new function, also named `unskip`, takes the same parameters as the `skip` command; it checks for the required `--schema` parameter and creates a new `WorkspaceContext` object to call the appropriate method on the `table_mapping` object. Two new methods, `unskip_schema` and `unskip_table_or_view`, have been added to the `HiveMapping` class. They remove the skip mark from a schema or table, respectively, and handle exceptions such as `NotFound` and `BadRequest`. The `get_tables_to_migrate` method has been updated to take unskipped tables and schemas into account. Currently, the feature is tested manually and has not been added to the user documentation.
* Added unskip CLI command to undo a skip on schema or a table ([#2734](#2734)). A new `unskip` CLI command has been added, which removes the `skip` mark set by the existing `skip` command on a specified schema or table. The command takes an optional `--table` flag; if it is not provided, the entire schema is unskipped. The new functionality is accompanied by a unit test and relevant user documentation, and addresses issue [#1938](#1938). The implementation adds the `unskip_table_or_view` method, which generates the appropriate `ALTER TABLE/VIEW` statement to remove the skip marker, and updates the `unskip_schema` method to include the schema name in the `ALTER SCHEMA` statement.
Additionally, exception handling has been updated to include `NotFound` and `BadRequest` exceptions. This feature simplifies the process of undoing a skip on a schema, table, or view in the Hive metastore, which previously required manual editing of the Hive metastore properties. * Assess source code as part of the assessment ([#2678](#2678)). This commit introduces enhancements to the assessment workflow, including the addition of two new tasks for evaluating source code from SQL queries in dashboards and from notebooks/files in jobs and tasks. The existing `databricks labs install ucx` command has been modified to incorporate linting during the assessment. The `QueryLinter` class has been updated to accept an additional argument for linting source code. These changes have been thoroughly tested through integration tests to ensure proper functionality. Co-authored by Eric Vergnaud. * Bump astroid version, pylint version and drop our f-string workaround ([#2746](#2746)). In this update, we have bumped the versions of astroid and pylint to 3.3.1 and removed workarounds related to f-string inference limitations in previous versions of astroid (< 3.3). These workarounds were necessary for handling issues such as uninferrable sys.path values and the lack of f-string inference in loops. We have also updated corresponding tests to reflect these changes and improve the overall code quality and maintainability of the project. These changes are part of a larger effort to update dependencies and simplify the codebase by leveraging the latest features of updated tools and removing obsolete workarounds. * Delete temporary files when running solacc ([#2750](#2750)). This commit includes changes to the `solacc.py` script to improve the linting process for the `solacc` repository, specifically targeting the issue of excessive temporary files that were exceeding CI storage capacity. 
The modifications include linting the repository on a per-top-level `solution` basis, where each solution resides within the top folders and is independent of others. Post-linting, temporary files and directories registered in `PathLookup` are deleted to enhance storage efficiency. Additionally, this commit prepares for improving false positive detection and introduces a new `SolaccContext` class that tracks various aspects of the linting process, providing more detailed feedback on the linting results. This change does not introduce new functionality or modify existing functionality, but rather optimizes the linting process for the `solacc` repository, maintaining CI storage capacity levels within acceptable limits. * Don't report direct filesystem access for API calls ([#2689](#2689)). This release introduces enhancements to the Direct File System Access (DFSA) linter, resolving false positives in API call reporting. The `ws.api_client.do` call previously triggered inaccurate direct filesystem access alerts, which have been addressed by adding new methods to identify HTTP call parameters and specific API calls. The linter now disregards DFSA patterns within known API calls, eliminating false positives with relative URLs and duplicate advice from SparkSqlPyLinter. Additionally, improvements in the `python_ast.py` and `python_infer.py` files include the addition of `is_instance_of` and `is_from_module` methods, along with safer inference methods to prevent infinite recursion and enhance value inference. These changes significantly improve the DFSA linter's accuracy and effectiveness when analyzing code containing API calls. * Enables cli cmd `databricks labs ucx create-catalog-schemas` to apply catalog/schema acl from legacy hive_metastore ([#2676](#2676)). The new release introduces a `databricks labs ucx create-catalog-schemas` command, which applies catalog/schema Access Control List (ACL) from a legacy hive_metastore. 
This command modifies the existing `table_mapping` method to include a new `grants_crawler` parameter in the `CatalogSchema` constructor, enabling the application of ACLs from the legacy hive_metastore. A corresponding unit test is included to ensure proper functionality. The `CatalogSchema` class in the `databricks.labs.ucx.hive_metastore.catalog_schema` module has been updated with a new argument `hive_acl` and the integration of the `GrantsCrawler` class. The `GrantsCrawler` class is responsible for crawling the Hive metastore and retrieving grants for catalogs, schemas, and tables. The `prepare_test` function has been updated to include the `hive_acl` argument and the `test_catalog_schema_acl` function has been updated to test the new functionality, ensuring that the correct grant statements are generated for a wider range of principals and catalogs/schemas. These changes improve the functionality and usability of the `databricks labs ucx create-catalog-schemas` command, allowing for a more seamless transition from a legacy hive metastore. * Fail `make test` on coverage below 90% ([#2682](#2682)). A new change has been introduced to the pyproject.toml file to enhance the codebase's quality and robustness by ensuring that the test coverage remains above 90%. This has been accomplished by adding the `--cov-fail-under=90` flag to the `test` and `coverage` scripts in the `[tool.hatch.envs.default.scripts]` section. This flag will cause the `make test` command to fail if the coverage percentage falls below the specified value of 90%, ensuring that all new changes are thoroughly tested and that the codebase maintains a minimum coverage threshold. This is a best practice for maintaining code coverage and improving the overall quality and reliability of the codebase. * Fixed DFSA false positives from f-string fragments ([#2679](#2679)). 
This commit addresses false positive direct filesystem access (DFSA) reports in Python code, specifically in f-string fragments containing forward slashes and curly braces. The linter has been updated to detect DFSA paths while avoiding these false positives, and it now checks for `JoinedStr` fragments in string constants. The commit also fixes duplicate advices reported by `SparkSqlPyLinter`. No new features or major functionality changes have been introduced; the focus is on the reliability and accuracy of DFSA detection. Co-authored by Eric Vergnaud, this commit includes new unit tests and refinements to the DFSA linter, specifically addressing false positive patterns like `f"/Repos/{thing1}/sdk-{thing2}-{thing3}"`. To review these changes, consult the updated tests in the `tests/unit/source_code/linters/test_directfs.py` file, such as the new test case for the f-string pattern causing false positives.
* Fixed failing integration tests that perform a real assessment ([#2736](#2736)). In this release, the integration tests for the `assessment` workflow have been improved by reducing the scope of the assessment, which makes them more efficient and reliable. Several object creation functions have been removed and a new function, `populate_for_linting`, has been added. It adds the necessary information to the installation context so that the integration tests still have the data required for linting. A pytest fixture, `populate_for_linting`, has also been added to set up a minimal amount of data in the workspace for linting purposes. These changes have been implemented in the `test_workflows.py` file in the `integration/assessment` directory.
This will help to ensure that the tests are not unnecessarily extensive, and that they are able to accurately assess the functionality of the library. * Fixed sqlglot crasher with 'drop schema ...' statement ([#2758](#2758)). In this release, we have addressed a crash issue in the `sqlglot` library caused by the `drop schema` statement. A new method, `_unsafe_lint_expression`, has been introduced to prevent the crash by checking if the current expression is a `Use`, `Create`, or `Drop` statement and updating the `schema` attribute accordingly. The library now correctly handles the `drop schema` statement and returns a `Deprecation` warning if the table being processed is in the `hive_metastore` catalog and has been migrated to the Unity Catalog. Unit tests have been added to ensure the correct behavior of this code, and the linter for `from table` SQL has been updated to parse and handle the `drop schema` statement without raising any errors. These changes improve the library's overall reliability and stability, allowing it to operate smoothly with the `drop schema` statement. * Fixed test failure: `test_table_migration_job_refreshes_migration_status[regular-migrate-tables]` ([#2625](#2625)). In this release, we have addressed two issues ([#2621](#2621) and [#2537](#2537)) and fixed a test failure in `test_table_migration_job_refreshes_migration_status[regular-migrate-tables]`. The `index` and `index_full_refresh` methods in `table_migrate.py` have been updated to accept a new `force_refresh` flag. When set to `True`, these methods will ensure that the migration status is up-to-date. This change also affects the `ViewsMigrationSequencer` class, which now passes `force_refresh=True` to the `index` method. Additionally, we have fixed a test failure by reusing the `force_refresh` flag to ensure the migration status is up-to-date. 
The `TableMigrationStatus` class in `table_migration_status.py` has been modified to accept an optional `force_refresh` parameter in the `index` method, and a unit test has been updated to assert the correct behavior when updating the migration status.
* Fixes error message ([#2759](#2759)). The `load` method in the `mapping.py` file of the `databricks/labs/ucx/hive_metastore` package has been updated to correct the error message displayed when a `NotFound` exception is raised. The previous message suggested running an incorrect command; it now reads "Please run: databricks labs ucx create-table-mapping". The change is limited to the error message: it adds no new methods and does not alter existing functionality.
* Fixes issue of circular dependency of migrate-location ACL ([#2741](#2741)). In this release, we have resolved two issues ([#274](#274)
* Fixes source table alias disappearance during migrate_views ([#2726](#2726)). This release introduces a fix that preserves the alias of the source table when CREATE VIEW SQL is converted from the legacy Hive metastore to Unity Catalog. A new test case, `test_migrate_view_alias_test`, verifies the correct handling of table aliases during migration. A new parameter, `alias`, has been added to the `Table` class, and the `apply` method in the `from_table.py` file has been updated so that the migration retains the original alias of the table.
Unit tests have been added and thoroughly tested to confirm the correctness of the changes, including handling potential intermittent failures caused by external dependencies. * Py4j table crawler: suggestions/fixes for describing tables ([#2684](#2684)). This release introduces significant improvements and fixes to the Py4J-based table crawler, enhancing its capability to describe tables effectively. The code for fetching table properties over the bridge has been updated, and error tracing has been improved through individual fetching of each table property and providing python backtrace on JVM side errors. Scala `Option` values unboxing issues have been resolved, and a small optimization has been implemented to detect partitioned tables without materializing the collection. The table's `.viewText()` property is now properly handled as a Scala `Option`. The `catalog` argument is now explicitly verified to be `hive_metastore`, and a new static method `_option_as_python` has been introduced for safely extracting values from Scala `Option`. The `_describe` method has been refactored to handle exceptions more gracefully and improved code readability. These changes result in better functionality, error handling, logging, and performance when describing tables within a specified catalog and database. The linked issues [#2658](#2658) and [#2579](#2579) are progressed through these updates, and appropriate testing has been conducted to ensure the improvements' effectiveness. * Speedup assessment workflow by making DBFS root table size calculation parallel ([#2745](#2745)). In this release, the assessment workflow for calculating DBFS root table size has been optimized through the parallelization of the calculation process, resulting in improved performance. 
This has been achieved by updating the `pipelines_crawler` function in `src/databricks/labs/ucx/contexts/workflow_task.py`, specifically the `cached_property table_size_crawler`, to include an additional argument `self.config.include_databases`. The `TablesCrawler` class has also been modified to include a generic type parameter `Table`, enabling type hinting and more robust type checking. Furthermore, the unit test file `test_table_size.py` in the `hive_metastore` directory has been updated to handle corrupt tables and invalid delta format errors more effectively. Additionally, a new entry `databricks-pydabs` has been added to the "known.json" file, potentially enabling better integration with the `databricks-pydabs` library or providing necessary configuration information for parallel processing. Overall, these changes improve the efficiency and scalability of the codebase and optimize the assessment workflow for calculating DBFS root table size. * Updated databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 ([#2747](#2747)). In this update, the requirement for `databricks-labs-blueprint` has been updated to version `>=0.8,<0.10` in the `pyproject.toml` file. This change allows the project to utilize the latest features and bug fixes included in version 0.9.0 of the `databricks-labs-blueprint` library. Notable updates in version 0.9.0 consist of the addition of Databricks CLI version as part of routed command telemetry and support for Unicode Byte Order Mark (BOM) in file upload and download operations. Additionally, various bug fixes and improvements have been implemented for the `WorkspacePath` class, including the addition of `stat()` methods and improved compatibility with different versions of Python. * Updated databricks-labs-lsql requirement from <0.12,>=0.5 to >=0.5,<0.13 ([#2688](#2688)). 
In this update, the version requirement of the `databricks-labs-lsql` library has been changed from `>=0.5,<0.12` to `>=0.5,<0.13`. This allows the project to use the latest version of `databricks-labs-lsql`, which includes new methods in the `MockBackend` class for differentiating between a table that has never been written to and one with zero rows. Additionally, the update adds support for various filter types and improves testing coverage and reliability. The release notes and changelog for the updated library are provided in the commit message for reference.
* Updated documentation to explain the usage of collections and eligible commands ([#2738](#2738)). The latest update to the Databricks Labs UCX tool introduces the `join-collection` command, which joins two or more workspaces into a collection so that commands can be executed across multiple workspaces at once. This feature requires account admin rights on the Databricks account, workspace admin rights on the workspaces to be joined, and a UCX installation on the workspace. To run collection-eligible commands, users pass the `--run-as-collection=True` flag. This makes it easier to manage and execute commands across multiple workspaces.
* Updated sqlglot requirement from <25.22,>=25.5.0 to >=25.5.0,<25.23 ([#2687](#2687)). In this pull request, the version requirement for the `sqlglot` library in the `pyproject.toml` file has been changed from `>=25.5.0,<25.22` to `>=25.5.0,<25.23`. This change allows us to use the latest version of `sqlglot` while still ensuring compatibility with other dependencies.
Additionally, this pull request includes a detailed changelog from the `sqlglot` repository, which provides information on the features, bug fixes, and changes included in each version. This can help us understand the scope of the update and how it may impact our project.
* [DOCUMENTATION] Improve documentation on using account profile for `sync-workspace-info` cli command ([#2683](#2683)). The `sync-workspace-info` CLI command has been added to the Databricks Labs UCX package; it uploads the workspace configuration to all workspaces in the Databricks account where the `ucx` tool is installed. This feature requires Databricks Account Administrator privileges and is necessary to create an immutable default catalog mapping for the table migration process. It also serves as a prerequisite for the `create-table-mapping` command. To use this command, users must configure a Databricks CLI profile with access to the Databricks account console, available at `accounts.cloud.databricks.com` or `accounts.azuredatabricks.net`. Additionally, the documentation for using the account profile with the `sync-workspace-info` command has been enhanced, addressing issue [#1762](#1762).
* [DOCUMENTATION] Improve documentation when installing UCX from a machine with restricted internet access ([#2690](#2690)). A new section has been added to the `ADVANCED` installation section of the UCX library documentation, providing detailed instructions for installing UCX with a company-hosted PyPI mirror. This feature is intended for environments with restricted internet access, allowing users to bypass the public PyPI index and use a company-controlled mirror instead. Users will need to add all UCX dependencies to the company-hosted PyPI mirror and set the `PIP_INDEX_URL` environment variable to the mirror URL during installation. The solution also includes a prompt asking the user if their workspace blocks internet access. Additionally, the documentation has been updated to clarify that UCX requires internet access to connect to GitHub for downloading the tool, specifying the necessary URLs that need to be accessible. This update aims to improve the installation process for users with restricted internet access and provide clear instructions and prompts for installing UCX on machines with limited internet connectivity.

Dependency updates:
* Updated sqlglot requirement from <25.22,>=25.5.0 to >=25.5.0,<25.23 ([#2687](#2687)).
* Updated databricks-labs-lsql requirement from <0.12,>=0.5 to >=0.5,<0.13 ([#2688](#2688)).
* Updated databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 ([#2747](#2747)).
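Each of the three requirement changes listed above only raises the upper bound of the allowed range. As a rough illustration of what that means, the sketch below checks plain dotted versions against such ranges. The helpers `parse_version` and `satisfies` are hypothetical names written for this example; they are a simplification, not the full PEP 440 logic that pip applies, and they assume purely numeric releases and only the `>=`, `<=`, `==`, `>`, `<` operators.

```python
import operator

# Operators handled by this simplified sketch (a small subset of PEP 440).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn a plain dotted release such as '0.9.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(version, length):
    """Right-pad a version tuple with zeros so that 0.9 compares equal to 0.9.0."""
    return version + (0,) * (length - len(version))


def satisfies(version, specifier):
    """Return True if `version` matches every comma-separated clause in `specifier`."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                width = max(len(candidate), len(bound))
                if not _OPS[symbol](_pad(candidate, width), _pad(bound, width)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    for spec in ("<0.9,>=0.8", ">=0.8,<0.10"):
        print(spec, "admits 0.9.0:", satisfies("0.9.0", spec))
```

Under these assumptions, `satisfies("0.9.0", "<0.9,>=0.8")` is false while `satisfies("0.9.0", ">=0.8,<0.10")` is true, which is the practical effect of the `databricks-labs-blueprint` change in this pull request.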
nfx added a commit that referenced this pull request on Sep 26, 2024
* Added Py4j implementation of tables crawler to retrieve a list of HMS tables in the assessment workflow ([#2579](#2579)). In this release, we have added a Py4j implementation of a tables crawler to retrieve a list of Hive Metastore tables in the assessment workflow. A new `FasterTableScanCrawler` class has been introduced, which can be used in the Assessment Job based on a feature flag to replace the old Scala code, allowing for better logging during table scans. The existing `assessment.crawl_tables` workflow now utilizes the new py4j crawler instead of the scala one. Integration tests have been added to ensure the functionality works correctly. The commit also includes a new method for listing table names in the specified database and improvements to error handling and logging mechanisms. The new Py4j tables crawler enhances the functionality of the assessment workflow by improving error handling, resulting in better logging and faster table scanning during the assessment process. This change is part of addressing issue [#2190](#2190) and was co-authored by Serge Smertin. * Added `create-ucx-catalog` cli command ([#2694](#2694)). A new CLI command, `create-ucx-catalog`, has been added to create a catalog for migration tracking that can be used across multiple workspaces. The command creates a UCX catalog for tracking migration status and artifacts, and is created by running `databricks labs ucx create-ucx-catalog` and specifying the storage location for the catalog. Relevant user documentation, unit tests, and integration tests have been added for this command. The `assign-metastore` command has also been updated to allow for the selection of a metastore when multiple metastores are available in the workspace region. This change improves the migration tracking feature and enhances the user experience. * Added experimental `migration-progress-experimental` workflow ([#2658](#2658)). 
This commit introduces an experimental workflow, `migration-progress-experimental`, which refreshes the inventory for various resources such as clusters, grants, jobs, pipelines, policies, tables, TableMigrationStatus, and UDFs. The workflow can be triggered using the `databricks labs ucx migration-progress` CLI command and uses a new implementation of a Scala-based crawler, `TablesCrawler`, which will eventually replace the current implementation. The new workflow is a duplicate of most of the `assessment` pipeline's functionality but with some differences, such as the use of `TablesCrawler`. Relevant user documentation has been added, along with unit tests, integration tests, and a screenshot of a successful staging environment run. The new workflow is expected to run on a schedule in the future. This change resolves [#2574](#2574) and progresses [#2074](#2074). * Added handling for `InternalError` in `Listing.__iter__` ([#2697](#2697)). This release introduces improved error handling in the `Listing.__iter__` method of the `Generic` class, located in the `workspace_access/generic.py` file. Previously, only `NotFound` exceptions were handled, but now both `InternalError` and `NotFound` exceptions are caught and logged appropriately. This change enhances the robustness of the method, which is responsible for listing objects of a specific type and returning them as `GenericPermissionsInfo` objects. To ensure the correct functionality, we have added new unit tests and manual testing. The logging of the `InternalError` exception is properly handled in the `GenericPermissionsSupport` class when listing serving endpoints. This behavior is verified by the newly added test function `test_internal_error_in_serving_endpoints_raises_warning` and the updated `test_serving_endpoints_not_enabled_raises_warning`. * Added handling for `PermissionDenied` when listing accessible workspaces ([#2733](#2733)). 
A new `can_administer` method has been added to the `Workspaces` class in the `workspaces.py` file, which allows for more fine-grained control over which users can administer workspaces. This method checks if the user has access to a given workspace and is a member of the workspace's `admins` group, indicating that the user has administrative privileges for that workspace. If the user does not have access to the workspace or is not a member of the `admins` group, the method returns `False`. Additionally, error handling in the `get_accessible_workspaces` method has been improved by adding a `PermissionDenied` exception to the list of exceptions that are caught and logged. New unit tests have been added for the `AccountWorkspaces` class of the `databricks.labs.blueprint.account` module to ensure that the new method is functioning as intended, specifically checking if a user is a workspace administrator based on whether they belong to the `admins` group. The linked issue [#2732](#2732) is resolved by this change. All changes have been manually and unit tested. * Added static code analysis results to assessment dashboard ([#2696](#2696)). This commit introduces two new tasks, `assess_dashboards` and `assess_workflows`, to the existing assessment dashboard for identifying migration problems in dashboards and workflows. These tasks analyze embedded queries and notebooks for migration issues and collect direct filesystem access patterns requiring attention. Upon completion, the results are stored in the inventory database and displayed on the Migration dashboard. Additionally, two new widgets, job/query problem widgets and directfs access widgets, have been added to enhance the dashboard's functionality by providing additional information related to code compatibility and access control. Integration tests using mock data have been added and manually tested to ensure the proper functionality of these new features. 
This update improves the overall assessment and compatibility checking capabilities of the dashboard, making it easier for users to identify and address issues related to Unity Catalog compatibility in their workflows and dashboards. * Added unskip CLI command to undo a skip on schema or a table ([#2727](#2727)). This pull request introduces a new CLI command, "unskip", which allows users to reverse a previously applied `skip` on a schema or table. The `unskip` command accepts a required `--schema` parameter and an optional `--table` parameter. A new function, also named "unskip", has been added, which takes the same parameters as the `skip` command. The function checks for the required `--schema` parameter and creates a new WorkspaceContext object to call the appropriate method on the table_mapping object. Two new methods, `unskip_schema` and "unskip_table_or_view", have been added to the HiveMapping class. These methods remove the skip mark from a schema or table, respectively, and handle exceptions such as NotFound and BadRequest. The get_tables_to_migrate method has been updated to consider the unskipped tables or schemas. Currently, the feature is tested manually and has not been added to the user documentation. * Added unskip CLI command to undo a skip on schema or a table ([#2734](#2734)). A new `unskip` CLI command has been added to the project, which allows users to remove the `skip` mark set by the existing `skip` command on a specified schema or table. This command takes an optional `--table` flag, and if not provided, it will unskip the entire schema. The new functionality is accompanied by a unit test and relevant user documentation, and addresses issue [#1938](#1938). The implementation includes the addition of the `unskip_table_or_view` method, which generates the appropriate `ALTER TABLE/VIEW` statement to remove the skip marker, and updates to the `unskip_schema` method to include the schema name in the `ALTER SCHEMA` statement. 
Additionally, exception handling has been updated to include `NotFound` and `BadRequest` exceptions. This feature simplifies the process of undoing a skip on a schema, table, or view in the Hive metastore, which previously required manual editing of the Hive metastore properties. * Assess source code as part of the assessment ([#2678](#2678)). This commit introduces enhancements to the assessment workflow, including the addition of two new tasks for evaluating source code from SQL queries in dashboards and from notebooks/files in jobs and tasks. The existing `databricks labs install ucx` command has been modified to incorporate linting during the assessment. The `QueryLinter` class has been updated to accept an additional argument for linting source code. These changes have been thoroughly tested through integration tests to ensure proper functionality. Co-authored by Eric Vergnaud. * Bump astroid version, pylint version and drop our f-string workaround ([#2746](#2746)). In this update, we have bumped the versions of astroid and pylint to 3.3.1 and removed workarounds related to f-string inference limitations in previous versions of astroid (< 3.3). These workarounds were necessary for handling issues such as uninferrable sys.path values and the lack of f-string inference in loops. We have also updated corresponding tests to reflect these changes and improve the overall code quality and maintainability of the project. These changes are part of a larger effort to update dependencies and simplify the codebase by leveraging the latest features of updated tools and removing obsolete workarounds. * Delete temporary files when running solacc ([#2750](#2750)). This commit includes changes to the `solacc.py` script to improve the linting process for the `solacc` repository, specifically targeting the issue of excessive temporary files that were exceeding CI storage capacity. 
The modifications include linting the repository on a per-top-level `solution` basis, where each solution resides within the top folders and is independent of others. Post-linting, temporary files and directories registered in `PathLookup` are deleted to enhance storage efficiency. Additionally, this commit prepares for improving false positive detection and introduces a new `SolaccContext` class that tracks various aspects of the linting process, providing more detailed feedback on the linting results. This change does not introduce new functionality or modify existing functionality, but rather optimizes the linting process for the `solacc` repository, maintaining CI storage capacity levels within acceptable limits. * Don't report direct filesystem access for API calls ([#2689](#2689)). This release introduces enhancements to the Direct File System Access (DFSA) linter, resolving false positives in API call reporting. The `ws.api_client.do` call previously triggered inaccurate direct filesystem access alerts, which have been addressed by adding new methods to identify HTTP call parameters and specific API calls. The linter now disregards DFSA patterns within known API calls, eliminating false positives with relative URLs and duplicate advice from SparkSqlPyLinter. Additionally, improvements in the `python_ast.py` and `python_infer.py` files include the addition of `is_instance_of` and `is_from_module` methods, along with safer inference methods to prevent infinite recursion and enhance value inference. These changes significantly improve the DFSA linter's accuracy and effectiveness when analyzing code containing API calls. * Enables cli cmd `databricks labs ucx create-catalog-schemas` to apply catalog/schema acl from legacy hive_metastore ([#2676](#2676)). The new release introduces a `databricks labs ucx create-catalog-schemas` command, which applies catalog/schema Access Control List (ACL) from a legacy hive_metastore. 
This command modifies the existing `table_mapping` method to include a new `grants_crawler` parameter in the `CatalogSchema` constructor, enabling the application of ACLs from the legacy hive_metastore. A corresponding unit test is included to ensure proper functionality. The `CatalogSchema` class in the `databricks.labs.ucx.hive_metastore.catalog_schema` module has been updated with a new argument `hive_acl` and the integration of the `GrantsCrawler` class. The `GrantsCrawler` class is responsible for crawling the Hive metastore and retrieving grants for catalogs, schemas, and tables. The `prepare_test` function has been updated to include the `hive_acl` argument and the `test_catalog_schema_acl` function has been updated to test the new functionality, ensuring that the correct grant statements are generated for a wider range of principals and catalogs/schemas. These changes improve the functionality and usability of the `databricks labs ucx create-catalog-schemas` command, allowing for a more seamless transition from a legacy hive metastore.
* Fail `make test` on coverage below 90% ([#2682](#2682)). A new change has been introduced to the pyproject.toml file to enhance the codebase's quality and robustness by ensuring that the test coverage remains above 90%. This has been accomplished by adding the `--cov-fail-under=90` flag to the `test` and `coverage` scripts in the `[tool.hatch.envs.default.scripts]` section. This flag will cause the `make test` command to fail if the coverage percentage falls below the specified value of 90%, ensuring that all new changes are thoroughly tested and that the codebase maintains a minimum coverage threshold. This is a best practice for maintaining code coverage and improving the overall quality and reliability of the codebase.
* Fixed DFSA false positives from f-string fragments ([#2679](#2679)).
This commit addresses false positive Direct File System Access (DFSA) reports in Python code, specifically in f-string fragments containing forward slashes and curly braces. The linter has been updated to accurately detect DFSA paths while avoiding false positives, and it now checks for `JoinedStr` fragments in string constants. Additionally, the commit rectifies issues with duplicate advice reported by `SparkSqlPyLinter`. No new features or major functionality changes have been introduced; instead, the focus has been on improving the reliability and accuracy of DFSA detection. Co-authored by Eric Vergnaud, this commit includes new unit tests and refinements to the DFSA linter, specifically addressing false positive patterns like `f"/Repos/{thing1}/sdk-{thing2}-{thing3}"`. To review these changes, consult the updated tests in the `tests/unit/source_code/linters/test_directfs.py` file, such as the new test case for the f-string pattern causing false positives.
* Fixed failing integration tests that perform a real assessment ([#2736](#2736)). In this release, we have made significant improvements to the integration tests in the `assessment` workflow, by reducing the scope of the assessment and improving efficiency and reliability. We have removed several object creation functions and added a new function `populate_for_linting` for linting purposes. The `populate_for_linting` function adds necessary information to the installation context, and is used to ensure that the integration tests still have the required data for linting. We have also added a pytest fixture `populate_for_linting` to set up a minimal amount of data in the workspace for linting purposes. These changes have been implemented in the `test_workflows.py` file in the integration/assessment directory.
This will help to ensure that the tests are not unnecessarily extensive, and that they are able to accurately assess the functionality of the library.
* Fixed sqlglot crasher with 'drop schema ...' statement ([#2758](#2758)). In this release, we have addressed a crash issue in the `sqlglot` library caused by the `drop schema` statement. A new method, `_unsafe_lint_expression`, has been introduced to prevent the crash by checking if the current expression is a `Use`, `Create`, or `Drop` statement and updating the `schema` attribute accordingly. The library now correctly handles the `drop schema` statement and returns a `Deprecation` warning if the table being processed is in the `hive_metastore` catalog and has been migrated to the Unity Catalog. Unit tests have been added to ensure the correct behavior of this code, and the linter for `from table` SQL has been updated to parse and handle the `drop schema` statement without raising any errors. These changes improve the library's overall reliability and stability, allowing it to operate smoothly with the `drop schema` statement.
* Fixed test failure: `test_table_migration_job_refreshes_migration_status[regular-migrate-tables]` ([#2625](#2625)). In this release, we have addressed two issues ([#2621](#2621) and [#2537](#2537)) and fixed a test failure in `test_table_migration_job_refreshes_migration_status[regular-migrate-tables]`. The `index` and `index_full_refresh` methods in `table_migrate.py` have been updated to accept a new `force_refresh` flag. When set to `True`, these methods will ensure that the migration status is up-to-date. This change also affects the `ViewsMigrationSequencer` class, which now passes `force_refresh=True` to the `index` method. Additionally, we have fixed a test failure by reusing the `force_refresh` flag to ensure the migration status is up-to-date.
The `TableMigrationStatus` class in `table_migration_status.py` has been modified to accept an optional `force_refresh` parameter in the `index` method, and a unit test has been updated to assert the correct behavior when updating the migration status.
* Fixes error message ([#2759](#2759)). The `load` method of the `mapping.py` file in the `databricks/labs/ucx/hive_metastore` package has been updated to correct an error message displayed when a `NotFound` exception is raised. The previous message suggested running an incorrect command, which has been updated to the correct one: "Please run: databricks labs ucx create-table-mapping". This change does not add any new methods or alter existing functionality, but instead focuses on improving the user experience by providing accurate information when an error occurs. The scope of this change is limited to updating the error message, and no other modifications have been made.
* Fixes issue of circular dependency of migrate-location ACL ([#2741](#2741)). In this release, we have resolved two issues ([#274](#274)
* Fixes source table alias disappearance during migrate_views ([#2726](#2726)). This release introduces a fix to preserve the alias for the source table during the conversion of CREATE VIEW SQL from the legacy Hive metastore to the Unity Catalog. The issue was addressed by adding a new test case, `test_migrate_view_alias_test`, to verify the correct handling of table aliases during migration. The changes also include a fix for the SQL conversion and new test cases to ensure the correct handling of table aliases, reflected in accurate SQL conversion. A new parameter, `alias`, has been added to the Table class, and the `apply` method in the `from_table.py` file has been updated. The migration process has been updated to retain the original alias of the table.
Unit tests have been added to confirm the correctness of the changes, including the handling of potential intermittent failures caused by external dependencies.
* Py4j table crawler: suggestions/fixes for describing tables ([#2684](#2684)). This release introduces significant improvements and fixes to the Py4J-based table crawler, enhancing its capability to describe tables effectively. The code for fetching table properties over the bridge has been updated, and error tracing has been improved by fetching each table property individually and providing a Python backtrace on JVM-side errors. Issues with unboxing Scala `Option` values have been resolved, and a small optimization has been implemented to detect partitioned tables without materializing the collection. The table's `.viewText()` property is now properly handled as a Scala `Option`. The `catalog` argument is now explicitly verified to be `hive_metastore`, and a new static method `_option_as_python` has been introduced for safely extracting values from Scala `Option`. The `_describe` method has been refactored to handle exceptions more gracefully and to improve code readability. These changes result in better functionality, error handling, logging, and performance when describing tables within a specified catalog and database. The linked issues [#2658](#2658) and [#2579](#2579) are progressed through these updates, and appropriate testing has been conducted to confirm the improvements' effectiveness.
* Speedup assessment workflow by making DBFS root table size calculation parallel ([#2745](#2745)). In this release, the assessment workflow for calculating DBFS root table size has been optimized through the parallelization of the calculation process, resulting in improved performance.
This has been achieved by updating the `pipelines_crawler` function in `src/databricks/labs/ucx/contexts/workflow_task.py`, specifically the `cached_property table_size_crawler`, to include an additional argument `self.config.include_databases`. The `TablesCrawler` class has also been modified to include a generic type parameter `Table`, enabling type hinting and more robust type checking. Furthermore, the unit test file `test_table_size.py` in the `hive_metastore` directory has been updated to handle corrupt tables and invalid delta format errors more effectively. Additionally, a new entry `databricks-pydabs` has been added to the "known.json" file, potentially enabling better integration with the `databricks-pydabs` library or providing necessary configuration information for parallel processing. Overall, these changes improve the efficiency and scalability of the codebase and optimize the assessment workflow for calculating DBFS root table size.
* Updated databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 ([#2747](#2747)). In this update, the requirement for `databricks-labs-blueprint` has been updated to version `>=0.8,<0.10` in the `pyproject.toml` file. This change allows the project to utilize the latest features and bug fixes included in version 0.9.0 of the `databricks-labs-blueprint` library. Notable updates in version 0.9.0 consist of the addition of Databricks CLI version as part of routed command telemetry and support for Unicode Byte Order Mark (BOM) in file upload and download operations. Additionally, various bug fixes and improvements have been implemented for the `WorkspacePath` class, including the addition of `stat()` methods and improved compatibility with different versions of Python.
* Updated databricks-labs-lsql requirement from <0.12,>=0.5 to >=0.5,<0.13 ([#2688](#2688)).
In this update, the version requirement of the `databricks-labs-lsql` library has been changed from a version greater than or equal to 0.5 and less than 0.12 to a version greater than or equal to 0.5 and less than 0.13. This allows the project to utilize the latest version of `databricks-labs-lsql`, which includes new methods for differentiating between a table that has never been written to and one with zero rows in the MockBackend class. Additionally, the update adds support for various filter types and improves testing coverage and reliability. The release notes and changelog for the updated library are provided in the commit message for reference.
* Updated documentation to explain the usage of collections and eligible commands ([#2738](#2738)). The latest update to the UCX tool introduces the `join-collection` command, which enables users to join two or more workspaces into a collection, allowing for streamlined and consolidated command execution across multiple workspaces. This feature is available to Account admins on the Databricks account and Workspace admins on the workspaces to be joined, and it requires UCX to be installed on the workspace. To run collection-eligible commands, users can pass the `--run-as-collection=True` flag. This makes it easier to manage and execute commands on multiple workspaces.
* Updated sqlglot requirement from <25.22,>=25.5.0 to >=25.5.0,<25.23 ([#2687](#2687)). In this pull request, we have updated the version requirement for the `sqlglot` library in the pyproject.toml file. The previous requirement specified a version greater than or equal to 25.5.0 and less than 25.22, but we have updated it to allow for versions greater than or equal to 25.5.0 and less than 25.23. This change allows us to use the latest version of `sqlglot`, while still ensuring compatibility with other dependencies.
Additionally, this pull request includes a detailed changelog from the `sqlglot` repository, which provides information on the features, bug fixes, and changes included in each version. This can help us understand the scope of the update and how it may impact our project.
* [DOCUMENTATION] Improve documentation on using account profile for `sync-workspace-info` cli command ([#2683](#2683)). The `sync-workspace-info` CLI command has been added to the Databricks Labs UCX package, which uploads the workspace configuration to all workspaces in the Databricks account where the `ucx` tool is installed. This feature requires Databricks Account Administrator privileges and is necessary to create an immutable default catalog mapping for the table migration process. It also serves as a prerequisite for the `create-table-mapping` command. To utilize this command, users must configure the Databricks CLI profile with access to the Databricks account console, available at "accounts.cloud.databricks.com" or "accounts.azuredatabricks.net". Additionally, the documentation for using the account profile with the `sync-workspace-info` command has been enhanced, addressing issue [#1762](#1762).
* [DOCUMENTATION] Improve documentation when installing UCX from a machine with restricted internet access ([#2690](#2690)). A new section has been added to the `ADVANCED` installation section of the UCX library documentation, providing detailed instructions for installing UCX with a company-hosted PyPI mirror. This feature is intended for environments with restricted internet access, allowing users to bypass the public PyPI index and use a company-controlled mirror instead. Users will need to add all UCX dependencies to the company-hosted PyPI mirror and set the `PIP_INDEX_URL` environment variable to the mirror URL during installation. The solution also includes a prompt asking the user if their workspace blocks internet access. Additionally, the documentation has been updated to clarify that UCX requires internet access to connect to GitHub for downloading the tool, specifying the necessary URLs that need to be accessible. This update aims to improve the installation process for users with restricted internet access and provide clear instructions and prompts for installing UCX on machines with limited internet connectivity.

Dependency updates:

* Updated sqlglot requirement from <25.22,>=25.5.0 to >=25.5.0,<25.23 ([#2687](#2687)).
* Updated databricks-labs-lsql requirement from <0.12,>=0.5 to >=0.5,<0.13 ([#2688](#2688)).
* Updated databricks-labs-blueprint requirement from <0.9,>=0.8 to >=0.8,<0.10 ([#2747](#2747)).
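
The f-string false positives mentioned in the #2679 entry are easier to picture with a small example. The sketch below uses the standard library `ast` module as a stand-in (the linter itself is described as working with astroid, whose node types differ in detail), and the variable names `fragments` and `placeholders` are purely illustrative. It shows that an f-string parses into a `JoinedStr` whose literal pieces are separate string constants, some of which begin with a slash and could look like paths when inspected in isolation.

```python
import ast

# The pattern quoted in the changelog entry, as source text.
source = 'f"/Repos/{thing1}/sdk-{thing2}-{thing3}"'

# Parse as an expression; the body of the tree is the f-string node.
tree = ast.parse(source, mode="eval")
joined = tree.body  # an ast.JoinedStr instance

# Literal pieces are Constant nodes; "{...}" placeholders are FormattedValue nodes.
fragments = [v.value for v in joined.values if isinstance(v, ast.Constant)]
placeholders = [v for v in joined.values if isinstance(v, ast.FormattedValue)]

print(type(joined).__name__)  # JoinedStr
print(fragments)              # ['/Repos/', '/sdk-', '-']
print(len(placeholders))      # 3
```

A fragment such as `/sdk-` is only a piece of a larger string, so a check that treats each constant as a standalone candidate path would plausibly misreport it. That appears to be the kind of case the fix targets.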
Updates the requirements on databricks-labs-blueprint to permit the latest version.
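
As a rough illustration of what the widened range permits, the following sketch compares a few release numbers against the old bounds (`>=0.8,<0.9`) and the new bounds (`>=0.8,<0.10`). The helpers `parse_version` and `in_range` are hypothetical and handle only plain dotted numeric releases; they are not how pip or Dependabot evaluate specifiers.

```python
from __future__ import annotations


def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn a dotted release such as '0.9' into a zero-padded tuple like (0, 9, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Old requirement: >=0.8,<0.9
print(in_range("0.8.3", "0.8", "0.9"))    # True
print(in_range("0.9.0", "0.8", "0.9"))    # False

# New requirement: >=0.8,<0.10
print(in_range("0.9.0", "0.8", "0.10"))   # True
print(in_range("0.10.0", "0.8", "0.10"))  # False
```

Comparing integer tuples rather than raw strings matters here, because `"0.10"` sorts before `"0.9"` as text. For real projects, the `packaging` library's `SpecifierSet` implements the full version specifier rules and is the safer choice.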
Changelog
Sourced from databricks-labs-blueprint's changelog.
... (truncated)
Commits
- c3b53a4 Release v0.9.0 (#148)
- 98c5f30 Added Databricks CLI version as part of routed command telemetry (#147)
- 2bfbf18 Release v0.8.3 (#145)
- 36fc873 add missing stat() methods to DBFSPath and WorkspacePath (#144)
- c531c3f Release v0.8.2 (#139)
- 53b9463 support files with unicode BOM (#138)
- ec82326 Make hatch a prerequisite (#137)
- 98e75bc Release v0.8.1 (#136)
- 821bc0a Fixed py3.10 compatibility for `_parts` in pathlike (#135)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)