[BUG]: OSError: [Errno 95] Operation not supported: /Workspace/Users/.../path/to/notebook #2888

Closed
1 task done
nfx opened this issue Oct 9, 2024 · 5 comments · Fixed by #2923 or #2924
Labels
`bug` (Something isn't working) · `migrate/code` (Abstract Syntax Trees and other dark magic) · `step/assessment` (go/uc/upgrade - Assessment Step)

Comments

@nfx
Collaborator

nfx commented Oct 9, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Apparently, `pathlib.Path` is used in place of `WorkspacePath`, so we try to read the notebook as a local file, but WSFS doesn't support that.

```
13:03:42 ERROR [d.l.blueprint.parallel][linting_workflows_5] linting workflows(657323031729870) task failed: [Errno 95] Operation not supported: '/Workspace/Users/........./jobs/notebooks/landing/fetch_data': Traceback (most recent call last):
  File ".../site-packages/databricks/labs/blueprint/parallel.py", line 158, in inner
    return func(*args, **kwargs), None
           ^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/jobs.py", line 399, in lint_job
    problems, dfsas, tables = self._lint_job(job)
                              ^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/jobs.py", line 418, in _lint_job
    graph, advices, session_state = self._build_task_dependency_graph(task, job)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/jobs.py", line 467, in _build_task_dependency_graph
    problems = container.build_dependency_graph(graph)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/jobs.py", line 133, in build_dependency_graph
    return list(self._register_task_dependencies(parent))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/jobs.py", line 139, in _register_task_dependencies
    yield from self._register_notebook(graph)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/jobs.py", line 221, in _register_notebook
    return graph.register_notebook(path, False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/graph.py", line 58, in register_notebook
    maybe_graph = self.register_dependency(maybe.dependency)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/graph.py", line 89, in register_dependency
    container = dependency.load(self.path_lookup)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/graph.py", line 327, in load
    return self._loader.load_dependency(path_lookup, self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/databricks/labs/ucx/source_code/notebooks/loaders.py", line 60, in load_dependency
    content = absolute_path.read_text("utf-8")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/pathlib.py", line 1059, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/pathlib.py", line 1045, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 95] Operation not supported: '/Workspace/Users/........./jobs/notebooks/landing/fetch_data'
```
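The last frames fail inside `pathlib`, that is, on the local file system, and Errno 95 is `EOPNOTSUPP` on Linux. As a diagnostic aid only (not the fix adopted for this issue), here is a minimal sketch of recognizing that error and turning it into a more actionable message. `is_operation_not_supported`, `describe_read_failure`, and the example path are hypothetical, not UCX code:

```python
import errno


def is_operation_not_supported(exc: OSError) -> bool:
    # "[Errno 95] Operation not supported" is EOPNOTSUPP on Linux. ENOTSUP has the
    # same value there but a different one on some other platforms, so accept both.
    return exc.errno in {errno.EOPNOTSUPP, errno.ENOTSUP}


def describe_read_failure(path: str, exc: OSError) -> str:
    # Hypothetical helper: point at the likely cause instead of the raw errno.
    if is_operation_not_supported(exc):
        return (
            f"{path}: reading it as a local file is not supported; "
            "it is probably a workspace notebook opened via pathlib.Path"
        )
    return f"{path}: {exc.strerror}"


notebook = "/Workspace/Users/someone/fetch_data"  # hypothetical path
error = OSError(errno.EOPNOTSUPP, "Operation not supported", notebook)
print(error)  # on Linux: [Errno 95] Operation not supported: '/Workspace/Users/someone/fetch_data'
print(describe_read_failure(notebook, error))
```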
@nfx nfx added the `bug`, `step/assessment`, and `migrate/code` labels Oct 9, 2024
@asnare
Contributor

asnare commented Oct 10, 2024

Is there a backtrace, logs, or some reference for this?

@asnare asnare self-assigned this Oct 10, 2024
@asnare
Contributor

asnare commented Oct 10, 2024

This looks suspicious: it will indeed convert `WorkspacePath` and `DBFSPath` paths into local paths, which (I believe?) triggers this error.
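To illustrate the failure mode: re-wrapping a path object in a plain `pathlib` constructor discards its concrete type, while deriving new paths from the original object keeps it. A minimal sketch; `WorkspacePathStub` is a hypothetical stand-in built on `PurePosixPath` (the real `WorkspacePath` is not used here, and pure paths never touch the file system):

```python
from pathlib import PurePosixPath


class WorkspacePathStub(PurePosixPath):
    """Hypothetical stand-in for a remote path type (not the real WorkspacePath)."""


def to_local_path_lossy(path: PurePosixPath) -> PurePosixPath:
    # Re-wrapping via str() builds a plain PurePosixPath and drops the subclass,
    # so any remote-aware behaviour of the original object is lost.
    return PurePosixPath(str(path))


def keep_path_type(path: PurePosixPath, name: str) -> PurePosixPath:
    # Deriving from the original object (with_name, joinpath, parent, ...)
    # keeps type(path), because pathlib builds results from type(self).
    return path.with_name(name)


original = WorkspacePathStub("/Users/someone/notebooks/fetch_data")
print(type(to_local_path_lossy(original)).__name__)  # PurePosixPath
print(type(keep_path_type(original, "other")).__name__)  # WorkspacePathStub
```

A single such re-wrap anywhere along the call chain is enough to end up with a local path that `read_text()` then tries to open on the local file system.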

@nfx
Collaborator Author

nfx commented Oct 10, 2024

@asnare if a file has no extension and its path starts with `/Workspace`, then it needs to be converted into a `WorkspacePath` (stripping the `/Workspace` prefix).
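A minimal sketch of that rule on plain strings. `workspace_api_path` is a hypothetical helper; its result would then be used to build a `WorkspacePath` rather than a `pathlib.Path` (that construction is omitted here):

```python
from pathlib import PurePosixPath
from typing import Optional

# Prefix under which workspace objects appear on the local file system.
WORKSPACE_PREFIX = PurePosixPath("/Workspace")


def workspace_api_path(raw: str) -> Optional[str]:
    """Return the workspace-API form of an extension-less /Workspace path, else None."""
    path = PurePosixPath(raw)
    if path.suffix or path == WORKSPACE_PREFIX or not path.is_relative_to(WORKSPACE_PREFIX):
        # Files with an extension, the prefix itself and non-workspace paths
        # are left to the regular local-path handling.
        return None
    # Strip the "/Workspace" prefix: the workspace API addresses the same
    # object as "/Users/...".
    return "/" + path.relative_to(WORKSPACE_PREFIX).as_posix()


print(workspace_api_path("/Workspace/Users/someone/jobs/notebooks/landing/fetch_data"))
# /Users/someone/jobs/notebooks/landing/fetch_data
```

The extension check is a heuristic: a notebook whose name contains a dot (say `fetch.data`) would be mistaken for a file.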

We also have another issue there: paths like `/foo/bar/baz/../../x` have to be converted to `/foo/x`, because the Workspace API doesn't understand relative path segments.
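Collapsing those segments can be done lexically, without asking the workspace. A minimal sketch using the standard library's `posixpath.normpath` (`pathlib` pure paths keep `..` segments as-is, so they do not help here); `normalize_workspace_path` is a hypothetical helper:

```python
import posixpath


def normalize_workspace_path(path: str) -> str:
    # Only absolute paths are meaningful for the workspace API.
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute workspace path, got {path!r}")
    # normpath works on the string alone: it drops "." segments and trailing
    # slashes and folds each ".." into the preceding name.
    return posixpath.normpath(path)


print(normalize_workspace_path("/foo/bar/baz/../../x"))  # /foo/x
```

Because the folding is purely lexical, it assumes each `..` simply cancels the name before it; it never consults the workspace.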

@asnare
Contributor

asnare commented Oct 10, 2024

Here is another place where it could be going wrong.

@nfx
Collaborator Author

nfx commented Oct 10, 2024

@asnare got you the stack trace

@nfx nfx assigned nfx and unassigned asnare Oct 10, 2024
nfx added a commit that referenced this issue Oct 10, 2024
…g GIT-sourced workflows from static code analysis

Fix #2888
@nfx nfx closed this as completed in #2924 Oct 10, 2024
@nfx nfx closed this as completed in 752cbec Oct 10, 2024
nfx pushed a commit that referenced this issue Oct 11, 2024
## Changes

This PR fixes one particular situation where a `WorkspacePath` instance
can be converted into a generic `Path` instance, leading to downstream
errors.

### Linked issues

Potentially fixes #2888.
_Note: I haven't been able to reproduce the issue myself!_

### Tests

- [ ] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)
nfx added a commit that referenced this issue Oct 11, 2024
* Added `imageio` to known list ([#2942](#2942)). In this release, we have added `imageio` to our library's known list, including all its modules, sub-modules, testing, and typing packages. This change progresses issue [#1931](#1931). The `imageio` library offers I/O functionality for scientific imaging data.
* Added `ipyflow-core` to known list ([#2945](#2945)). In this release, we have added two open-source libraries to the known list contained in a JSON file. The first library, `ipyflow-core`, brings a range of modules for the data model, experimental features, frontend, kernel, patches, shell, slicing, tracing, types, and utils. The second library, `pyccolo`, offers fast and adaptable code transformation using abstract syntax trees, with functionality including code rewriting, import hooks, syntax augmentation, and tracing, along with various utility functions.
* Added `isodate` to known list ([#2946](#2946)). In this release, we have added the `isodate` package to our library's known package list, which resolves part of issue [#1931](#1931). The `isodate` package provides several modules for parsing and manipulating ISO 8601 date strings, including `isodate`, `isodate.duration`, `isodate.isodates`, `isodate.isodatetime`, `isodate.isoduration`, `isodate.isoerror`, `isodate.isostrf`, `isodate.isotime`, `isodate.isotzinfo`, and `isodate.tzinfo`.
* Experimental command for enabling HMS federation ([#2939](#2939)). In this release, we have introduced an experimental feature for enabling HMS (Hive Metastore) federation through a new `enable-hms-federation` command in the `labs.yml` file. When run, this command creates a federated HMS catalog synced with the workspace HMS, facilitating migration. Additionally, we have added an optional `enable_hms_federation` constructor argument to the `Locations` class in the `locations.py` file. Setting this flag to `True` enables a fallback mode for AWS resources to use HMS for data access. The new `HiveMetastoreFederationEnabler` class provides an `enable()` method that modifies the workspace configuration to enable HMS federation. As this feature is experimental, careful testing and feedback are encouraged.
* Experimental support for HMS federation ([#2283](#2283)). In this release, we introduce experimental support for Hive Metastore (HMS) federation in our open-source library. A new `HiveMetastoreFederation` class has been implemented, enabling the registration of an internal HMS as a federated catalog. This class utilizes the `WorkspaceClient` object from the `databricks.sdk` library to create necessary connections and handles permissions for successful federation. Additionally, a new file `test_federation.py` has been added, containing unit tests to demonstrate the functionality of HMS federation, including the creation of federated catalogs and handling of existing connections. As this is an experimental feature, users should expect potential issues and are encouraged to provide feedback to help improve its functionality.
* Fixed `InvalidParameterValue` failure for scanning jobs running on interactive clusters that got deleted ([#2935](#2935)). In this release, we have addressed an issue where an `InvalidParameterValue` error was not handled when scanning jobs that run on interactive clusters that have since been deleted. This error is now among the exceptions handled in the `_register_existing_cluster_id` and `_register_cluster_info` methods. These methods retrieve information about an existing cluster or its ID; if the cluster is not found or an invalid parameter value is provided, they now yield a `DependencyProblem` object with an appropriate error message, indicating a problem with the dependencies required for the job. As a result, the scan reports a clear, informative problem for the affected job instead of failing.
* Improve logging when skipping legacy grant in `create-catalogs-schemas` ([#2933](#2933)). In this update, the `create-catalogs-schemas` process has been improved with enhanced logging for skipped legacy grants. This change is a follow-up to issue [#2917](#2917) and progresses issue [#2932](#2932). The `_apply_from_legacy_table_acls` and `_update_principal_acl` methods now log more descriptive messages when a legacy grant is skipped, stating the type of grant being skipped and that it is not supported in Unity Catalog (UC). Additionally, a new method `get_interactive_cluster_grants` has been added to the `principal_acl` object, returning a list of grants specific to the interactive cluster. The `hive_acl` object is now autospec'd after the `principal_acl.get_interactive_cluster_grants` call. The `test_catalog_schema_acl` function has been updated to reflect these changes. New grants have been added to the `hive_grants` list, including grants for `user1` with `USE` action type on the `hive_metastore` catalog and grants for `user2` with `USAGE` action type on the `schema3` database. A new grant for `user4` with `DENY` action type on the `schema3` database has also been added; it is skipped, and logged as such, because it is not supported in UC. Skipped legacy grants for `DENY` action type on the `catalog2` catalog and the `catalog2.schema2` database are also included in the commit. These updates make it easier for users to understand what happens during the migration of grants to UC and ensure that unsupported grants are not inadvertently carried over to UC.
* Notebook linting: ensure path-type is preserved during linting ([#2923](#2923)). In this release, we have enhanced the type safety of the `NotebookResolver` class in the `loaders.py` module by introducing a new type variable `PathT`. This change includes an update to the `_adjust_path` method, which preserves the original file suffix when adding the `.py` suffix for Python notebooks. This addresses a potential issue where a `WorkspacePath` instance could be incorrectly converted to a generic `Path` instance, causing downstream errors. This change may resolve issue [#2888](#2888), although its author was unable to reproduce that issue. Note that the change is not marked as manually tested and does not include any new unit tests, integration tests, or staging environment verification.
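The type-variable approach described in the last entry can be sketched as follows. This is a hypothetical illustration, not the actual `NotebookResolver._adjust_path` code: `WorkspacePathStub` stands in for `WorkspacePath`, and bounding `PathT` by `PurePosixPath` is an assumption chosen so the example runs without a workspace:

```python
from pathlib import PurePosixPath
from typing import TypeVar

# Any PurePosixPath subclass is accepted, and the same concrete type is
# promised back to the caller (and to a type checker).
PathT = TypeVar("PathT", bound=PurePosixPath)


class WorkspacePathStub(PurePosixPath):
    """Hypothetical stand-in for a remote path type such as WorkspacePath."""


def adjust_path(path: PathT) -> PathT:
    # Leave Python files alone.
    if path.suffix == ".py":
        return path
    # Append ".py" after the full name so an existing suffix is kept. with_name()
    # builds its result from type(path), so a workspace path stays a workspace path.
    return path.with_name(path.name + ".py")


notebook = WorkspacePathStub("/Users/someone/fetch_data")
adjusted = adjust_path(notebook)
print(type(adjusted).__name__, adjusted)  # WorkspacePathStub /Users/someone/fetch_data.py
```

Tying input and output to one type variable also makes a downgrade such as returning `Path(str(path))` a type error, which a plain `Path` return annotation would accept silently.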
@nfx nfx mentioned this issue Oct 11, 2024
nfx added a commit that referenced this issue Oct 11, 2024
2 participants