Tool is failing creating new groups #353

Closed
mwojtyczka opened this issue Oct 2, 2023 · 1 comment · Fixed by #362 or #375
Labels
migrate/groups Corresponds to Migrate Groups Step of go/uc/upgrade

Comments

@mwojtyczka
Contributor

The tool sometimes fails when creating groups.

`DatabricksError: The service at /api/2.0/preview/scim/v2/Groups is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-8bbf887439cb887433f92a1197f8bf36-4d79ab1b4df6ed88-00]

DatabricksError Traceback (most recent call last)
File ~/.ipykernel/1041/command--1-1857924431:18
15 entry = [ep for ep in metadata.distribution("databricks_labs_ucx").entry_points if ep.name == "runtime"]
16 if entry:
17 # Load and execute the entrypoint, assumes no parameters
---> 18 entry[0].load()()
19 else:
20 import databricks_labs_ucx

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:213, in main()
212 def main():
--> 213 trigger(*sys.argv)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/tasks.py:93, in trigger(*argv)
90 cfg = WorkspaceConfig.from_file(Path(args["config"]))
91 logging.getLogger("databricks").setLevel(cfg.log_level)
---> 93 current_task.fn(cfg)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:192, in migrate_permissions(cfg)
168 """As we embark on the complex journey of migrating from Hive Metastore to the Databricks Unity Catalog,
169 a crucial phase in this transition involves the careful management of permissions.
170 This intricate process entails several key steps: first, applying permissions to designated backup groups;
(...)
189
190 See interactive tutorial here."""
191 toolkit = GroupMigrationToolkit(cfg)
--> 192 toolkit.prepare_environment()
193 toolkit.apply_permissions_to_backup_groups()
194 toolkit.replace_workspace_groups_with_account_groups()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/migration.py:118, in GroupMigrationToolkit.prepare_environment(self)
117 def prepare_environment(self):
--> 118 self._group_manager.prepare_groups_in_environment()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:181, in GroupManager.prepare_groups_in_environment(self)
178 ac_group_names = {_.display_name for _ in self._account_groups}
179 group_names = list(ws_group_names.intersection(ac_group_names))
--> 181 self._set_migration_groups(group_names)
182 logger.info("Environment prepared successfully")

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:123, in GroupManager._set_migration_groups(self, groups_names)
120 backup_group = self._get_or_create_backup_group(source_group_name=name, source_group=ws_group)
121 return MigrationGroupInfo(workspace=ws_group, backup=backup_group, account=acc_group)
--> 123 collected_groups = ThreadedExecution.gather(
124 "get group info", [partial(get_group_info, group_name) for group_name in groups_names]
125 )
126 for g in collected_groups:
127 self._migration_state.add(g)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:48, in ThreadedExecution.gather(cls, name, tasks)
45 @classmethod
46 def gather(cls, name: str, tasks: list[ExecutableFunction]) -> list[ExecutableResult]:
47 reporter = ProgressReporter(len(tasks), f"{name}: ")
---> 48 return cls(tasks, num_threads=4, progress_reporter=reporter).run()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in ThreadedExecution.run(self)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in (.0)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected

File /usr/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File /usr/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None

File /usr/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:120, in GroupManager._set_migration_groups..get_group_info(name)
118 acc_group = self._get_group(name, "account")
119 assert acc_group, f"Group {name} not found on the account level"
--> 120 backup_group = self._get_or_create_backup_group(source_group_name=name, source_group=ws_group)
121 return MigrationGroupInfo(workspace=ws_group, backup=backup_group, account=acc_group)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:102, in GroupManager._get_or_create_backup_group(self, source_group_name, source_group)
99 return backup_group
101 logger.info(f"Creating backup group {backup_group_name}")
--> 102 backup_group = self._ws.groups.create(
103 display_name=backup_group_name,
104 meta=source_group.meta,
105 entitlements=source_group.entitlements,
106 roles=source_group.roles,
107 members=source_group.members,
108 )
109 self._workspace_groups.append(backup_group)
110 logger.info(f"Backup group {backup_group_name} successfully created")

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/service/iam.py:1713, in GroupsAPI.create(self, display_name, entitlements, external_id, groups, id, members, meta, roles)
1711 if roles is not None: body['roles'] = [v.as_dict() for v in roles]
1712 headers = {'Accept': 'application/json', 'Content-Type': 'application/json', }
-> 1713 res = self._api.do('POST', '/api/2.0/preview/scim/v2/Groups', body=body, headers=headers)
1714 return Group.from_dict(res)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py:1084, in ApiClient.do(self, method, path, query, headers, body, raw, files, data)
1080 if not response.ok:
1081 # TODO: experiment with traceback pruning for better readability
1082 # See https://stackoverflow.com/a/58821552/277035
1083 payload = response.json()
-> 1084 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
1085 if raw:
1086 return StreamingResponse(response)

DatabricksError: The service at /api/2.0/preview/scim/v2/Groups is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-8bbf887439cb887433f92a1197f8bf36-4d79ab1b4df6ed88-00]`

@mwojtyczka mwojtyczka changed the title from "Creating groups is not retried." to "Tool is failing creating new groups" Oct 2, 2023
@william-conti william-conti added bug step/assessment go/uc/upgrade - Assessment Step labels Oct 2, 2023
@nfx nfx added migrate/groups Corresponds to Migrate Groups Step of go/uc/upgrade and removed step/assessment go/uc/upgrade - Assessment Step labels Oct 2, 2023
@nfx
Collaborator

nfx commented Oct 2, 2023

we're blocked on databricks/databricks-sdk-py#337, which is adding retries on HTTP 504 (Gateway Timeout)
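As a rough illustration of what retrying on a gateway timeout could look like on the caller's side, here is a minimal sketch. It is not the SDK's implementation: `GatewayTimeout`, `retry_on`, and `create_backup_group` are hypothetical names invented for this example, and the sleep function is injected so the backoff delays can be inspected without waiting. Note that retrying a create call is only safe if the caller first checks whether the group already exists, since a timed-out request may still have succeeded on the server.

```python
import time
from functools import wraps


class GatewayTimeout(Exception):
    """Hypothetical stand-in for an HTTP 504 error raised by an API client."""


def retry_on(exc_type, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call when exc_type is raised, doubling the delay each time."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the original error
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


calls = []   # records every attempt made against the fake service
delays = []  # records the backoff delays instead of actually sleeping


@retry_on(GatewayTimeout, attempts=3, base_delay=0.01, sleep=delays.append)
def create_backup_group(name):
    # Simulated service: times out twice, then succeeds.
    calls.append(name)
    if len(calls) < 3:
        raise GatewayTimeout("taking too long to process your request")
    return {"display_name": name}


group = create_backup_group("db-temp-analysts")
```

With three attempts allowed, the two simulated timeouts are absorbed and the third call returns the group; a fourth consecutive timeout would have been re-raised to the caller.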

@nfx nfx added this to the 1 week milestone Oct 2, 2023
@nfx nfx self-assigned this Oct 2, 2023
@nfx nfx linked a pull request Oct 3, 2023 that will close this issue
@nfx nfx closed this as completed Oct 3, 2023
github-merge-queue bot pushed a commit that referenced this issue Oct 4, 2023
…n errors (#375)

Fixed deletion of backup groups [issue #374].
Added rate limits and retries to group operations [issue #353].
Temp fix for issue #359
Added log messages for better visibility.
Added useful troubleshooting snippets to the docs.
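The commit message above mentions rate limits on group operations without showing them. One way such a limit could be sketched, assuming a simple fixed-interval limiter shared by worker threads, is shown below. `RateLimiter` is a hypothetical helper written for this example and is not taken from the ucx code base; the clock and sleep functions are injected so the behaviour can be checked deterministically.

```python
import threading
import time


class RateLimiter:
    """Hypothetical helper: allow at most max_per_second calls across threads."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay


# Demonstration with a frozen clock: three back-to-back calls at 10 calls/second.
slept = []
limiter = RateLimiter(max_per_second=10, clock=lambda: 100.0, sleep=slept.append)
delays = [limiter.wait() for _ in range(3)]
```

Each worker would call `limiter.wait()` immediately before issuing a group API request, so that four threads together stay under the configured rate.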
zpappa pushed a commit that referenced this issue Oct 6, 2023
…n errors (#375)
github-merge-queue bot pushed a commit to databricks/databricks-sdk-py that referenced this issue Nov 13, 2023
… `BadRequest`, `PermissionDenied`, `InternalError`, and others (#376)

Improve the ergonomics of the SDK, where instead of `except DatabricksError
as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()` we
could do `except NotFound: do_stuff()`, like in [this
example](https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/workspace_access/generic.py#L71-L84).
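To make the before/after concrete, here is a small self-contained sketch of the two styles. The classes below are simplified stand-ins defined only for this example, not the SDK's real classes, and `get_group` is a hypothetical function that always fails with a not-found error.

```python
class DatabricksError(Exception):
    """Simplified stand-in for the SDK's base error (not the real class)."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code


class NotFound(DatabricksError):
    """Stand-in for a specific exception tied to the NOT_FOUND error code."""

    def __init__(self, message):
        super().__init__(message, error_code="NOT_FOUND")


def get_group(name):
    # Hypothetical lookup that always reports a missing group.
    raise NotFound(f"group {name} does not exist")


def lookup_old_style(name):
    # Before: catch the generic error and inspect error_code by hand.
    try:
        return get_group(name)
    except DatabricksError as err:
        if err.error_code != "NOT_FOUND":
            raise
        return None


def lookup_new_style(name):
    # After: catch the specific exception class directly.
    try:
        return get_group(name)
    except NotFound:
        return None
```

Both functions return `None` for a missing group; the second one simply has less to get wrong, and the class name `NotFound` shows up in any stack trace.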

Additionally, it'll make stack traces easier to read, as they will
contain the specific exception class name. Examples of unclear stack traces
are databrickslabs/ucx#359,
databrickslabs/ucx#353, and
databrickslabs/ucx#347.

# First principles
- ~~do not override `builtins.NotImplemented` for `NOT_IMPLEMENTED`
error code~~
- assume that the platform error_code/HTTP status code mapping is not
perfect and is in a state of transition
- we do represent a reasonable subset of error codes as specific
exceptions
- it's still possible to access `error_code` from specific exceptions
like `NotFound` or `AlreadyExists`.

# Proposal
- have hierarchical exceptions, also inheriting from Python's built-in
exceptions
- more specific error codes override more generic HTTP status codes
- more generic HTTP status codes are matched after more specific error
codes; there's a default exception class per HTTP status code, and
we rely on the Databricks platform exception mapper to do the right
thing.
- have backward-compatible error creation for cases like using older
versions of the SDK against newer releases of the platform.
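The matching order proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the SDK's code: the class definitions are simplified stand-ins, the choice of built-in bases (`LookupError`, `ValueError`) is illustrative, the two lookup tables contain only example entries, and `error_from_response` is a hypothetical helper name.

```python
class DatabricksError(Exception):
    """Simplified stand-in for the base error; keeps error_code accessible."""

    def __init__(self, message="", error_code=None, status_code=None):
        super().__init__(message)
        self.error_code = error_code
        self.status_code = status_code


# Specific exceptions, some also inheriting from Python built-ins.
class BadRequest(DatabricksError, ValueError): ...
class NotFound(DatabricksError, LookupError): ...
class AlreadyExists(DatabricksError): ...
class InternalError(DatabricksError): ...


# Illustrative tables only; the real mappings are larger and may differ.
ERROR_CODE_MAPPING = {"NOT_FOUND": NotFound, "ALREADY_EXISTS": AlreadyExists}
STATUS_CODE_MAPPING = {400: BadRequest, 404: NotFound, 500: InternalError}


def error_from_response(status_code, error_code=None, message=""):
    """Pick the most specific class: error code first, then HTTP status."""
    cls = (
        ERROR_CODE_MAPPING.get(error_code)
        or STATUS_CODE_MAPPING.get(status_code)
        or DatabricksError  # backward-compatible fallback for unknown codes
    )
    return cls(message, error_code=error_code, status_code=status_code)


specific = error_from_response(400, "ALREADY_EXISTS")    # error code wins
by_status = error_from_response(404, "SOME_NEWER_CODE")  # falls back to status
unknown = error_from_response(418, "SOME_NEWER_CODE")    # falls back to base
```

The fallback to the base class is what keeps an older client usable when the platform starts returning an error code it has never seen, and `error_code` stays readable on every instance.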


![image](https://github.com/databricks/databricks-sdk-py/assets/259697/a4519f76-0778-468c-9bf5-6133984b5af7)

### Naming conflict resolution

We have four sources of naming, and this is the proposed order of naming
conflict resolution:
1. Databricks `error_code`, which is surfaced in our API documentation and
known by Databricks users
2. HTTP status codes, known by some developers
3. Python built-in exceptions, known by some developers
4. gRPC error codes
(https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto#L38-L185),
known by some developers

---------

Co-authored-by: Miles Yucht <miles@databricks.com>