Tool is failing creating new groups #353
Labels: migrate/groups (Corresponds to Migrate Groups Step of go/uc/upgrade)
Comments
mwojtyczka changed the title from "Creating groups is not retried." to "Tool is failing creating new groups" on Oct 2, 2023
nfx added the migrate/groups label (Corresponds to Migrate Groups Step of go/uc/upgrade) and removed the step/assessment label (go/uc/upgrade - Assessment Step) on Oct 2, 2023
we're blocked on databricks/databricks-sdk-py#337, which adds retries on HTTP 504 (Gateway Timeout)
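Until that SDK change lands, one possible client-side workaround is to retry the group-creation call. The following is a minimal sketch only (not the tool's actual code), assuming a `WorkspaceClient` is available and catching the generic `DatabricksError`, since the SDK does not yet expose a dedicated timeout exception:

```python
import logging
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.core import DatabricksError

logger = logging.getLogger(__name__)


def create_group_with_retry(ws: WorkspaceClient, display_name: str,
                            attempts: int = 5, base_sleep: float = 10.0):
    """Retry SCIM group creation when /api/2.0/preview/scim/v2/Groups times out (HTTP 504)."""
    last_err: DatabricksError | None = None
    for attempt in range(1, attempts + 1):
        try:
            return ws.groups.create(display_name=display_name)
        except DatabricksError as err:
            last_err = err
            logger.warning(f"create group {display_name!r}: attempt {attempt}/{attempts} failed: {err}")
            time.sleep(base_sleep * attempt)  # simple linear backoff between attempts
    raise last_err
```

In practice the proper fix belongs in the SDK itself (hence the blocking issue above), so retry logic like this would only be a stopgap.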
github-project-automation bot moved this to Todo in UCX (weekly) - DO NOT USE THIS BOARD on Oct 3, 2023
github-project-automation bot moved this from Todo to Done in UCX (weekly) - DO NOT USE THIS BOARD on Oct 3, 2023
github-merge-queue bot pushed a commit to databricks/databricks-sdk-py that referenced this issue on Nov 13, 2023:
… `BadRequest`, `PermissionDenied`, `InternalError`, and others (#376)

Improve the ergonomics of the SDK: instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()` we could do `except NotFound: do_stuff()`, like in [this example](https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/workspace_access/generic.py#L71-L84). Additionally, it makes stack traces easier to read, as they will contain the specific exception class name. Examples of unclear stack traces are databrickslabs/ucx#359, databrickslabs/ucx#353, and databrickslabs/ucx#347.

# First principles

- ~~do not override `builtins.NotImplemented` for the `NOT_IMPLEMENTED` error code~~
- assume that the platform's error_code/HTTP status code mapping is not perfect and is in a state of transition
- we represent a reasonable subset of error codes as specific exceptions
- it is still possible to access `error_code` from specific exceptions like `NotFound` or `AlreadyExists`

# Proposal

- have hierarchical exceptions that also inherit from Python's built-in exceptions
- more specific error codes override more generic HTTP status codes
- more generic HTTP status codes are matched after more specific error codes: there is a default exception class per HTTP status code, and we rely on the Databricks platform exception mapper to do the right thing
- have backward-compatible error creation for cases like using older versions of the SDK with newer releases of the platform

![image](https://github.com/databricks/databricks-sdk-py/assets/259697/a4519f76-0778-468c-9bf5-6133984b5af7)

### Naming conflict resolution

We have four sources of naming, and this is the proposed order of naming conflict resolution:

1. Databricks `error_code`, surfaced in our API documentation and known by Databricks users
2. HTTP status codes, known by some developers
3. Python built-in exceptions, known by some developers
4. gRPC error codes (https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto#L38-L185), known by some developers

---------

Co-authored-by: Miles Yucht <miles@databricks.com>
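As an illustration of the ergonomics described in that commit message, here is a hedged before/after sketch. The `databricks.sdk.errors.NotFound` import assumes an SDK version that includes #376; older versions only expose `DatabricksError` with an `error_code` attribute:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.core import DatabricksError
from databricks.sdk.errors import NotFound  # available after databricks/databricks-sdk-py#376

ws = WorkspaceClient()

# Before: inspect error_code on the generic exception
try:
    ws.groups.delete(id="123")
except DatabricksError as err:
    if err.error_code != "NOT_FOUND":
        raise
    # the group is already gone, nothing to do

# After: catch the specific exception class
try:
    ws.groups.delete(id="123")
except NotFound:
    pass  # the group is already gone, nothing to do
```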
The tool sometimes fails when creating new groups.
`DatabricksError: The service at /api/2.0/preview/scim/v2/Groups is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-8bbf887439cb887433f92a1197f8bf36-4d79ab1b4df6ed88-00]
DatabricksError Traceback (most recent call last)
File ~/.ipykernel/1041/command--1-1857924431:18
15 entry = [ep for ep in metadata.distribution("databricks_labs_ucx").entry_points if ep.name == "runtime"]
16 if entry:
17 # Load and execute the entrypoint, assumes no parameters
---> 18 entry[0].load()()
19 else:
20 import databricks_labs_ucx
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:213, in main()
212 def main():
--> 213 trigger(*sys.argv)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/tasks.py:93, in trigger(*argv)
90 cfg = WorkspaceConfig.from_file(Path(args["config"]))
91 logging.getLogger("databricks").setLevel(cfg.log_level)
---> 93 current_task.fn(cfg)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/runtime.py:192, in migrate_permissions(cfg)
168 """As we embark on the complex journey of migrating from Hive Metastore to the Databricks Unity Catalog,
169 a crucial phase in this transition involves the careful management of permissions.
170 This intricate process entails several key steps: first, applying permissions to designated backup groups;
(...)
189
190 See interactive tutorial here."""
191 toolkit = GroupMigrationToolkit(cfg)
--> 192 toolkit.prepare_environment()
193 toolkit.apply_permissions_to_backup_groups()
194 toolkit.replace_workspace_groups_with_account_groups()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/migration.py:118, in GroupMigrationToolkit.prepare_environment(self)
117 def prepare_environment(self):
--> 118 self._group_manager.prepare_groups_in_environment()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:181, in GroupManager.prepare_groups_in_environment(self)
178 ac_group_names = {_.display_name for _ in self._account_groups}
179 group_names = list(ws_group_names.intersection(ac_group_names))
--> 181 self._set_migration_groups(group_names)
182 logger.info("Environment prepared successfully")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:123, in GroupManager._set_migration_groups(self, groups_names)
120 backup_group = self._get_or_create_backup_group(source_group_name=name, source_group=ws_group)
121 return MigrationGroupInfo(workspace=ws_group, backup=backup_group, account=acc_group)
--> 123 collected_groups = ThreadedExecution.gather(
124 "get group info", [partial(get_group_info, group_name) for group_name in groups_names]
125 )
126 for g in collected_groups:
127 self._migration_state.add(g)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:48, in ThreadedExecution.gather(cls, name, tasks)
45 @classmethod
46 def gather(cls, name: str, tasks: list[ExecutableFunction]) -> list[ExecutableResult]:
47 reporter = ProgressReporter(len(tasks), f"{name}: ")
---> 48 return cls(tasks, num_threads=4, progress_reporter=reporter).run()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in ThreadedExecution.run(self)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py:63, in <listcomp>(.0)
60 results = concurrent.futures.wait(self._futures, return_when=ALL_COMPLETED)
62 logger.debug("Collecting the results from threaded execution")
---> 63 collected = [future.result() for future in results.done]
64 return collected
File /usr/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
File /usr/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None
File /usr/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:120, in GroupManager._set_migration_groups.<locals>.get_group_info(name)
118 acc_group = self._get_group(name, "account")
119 assert acc_group, f"Group {name} not found on the account level"
--> 120 backup_group = self._get_or_create_backup_group(source_group_name=name, source_group=ws_group)
121 return MigrationGroupInfo(workspace=ws_group, backup=backup_group, account=acc_group)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/groups.py:102, in GroupManager._get_or_create_backup_group(self, source_group_name, source_group)
99 return backup_group
101 logger.info(f"Creating backup group {backup_group_name}")
--> 102 backup_group = self._ws.groups.create(
103 display_name=backup_group_name,
104 meta=source_group.meta,
105 entitlements=source_group.entitlements,
106 roles=source_group.roles,
107 members=source_group.members,
108 )
109 self._workspace_groups.append(backup_group)
110 logger.info(f"Backup group {backup_group_name} successfully created")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/service/iam.py:1713, in GroupsAPI.create(self, display_name, entitlements, external_id, groups, id, members, meta, roles)
1711 if roles is not None: body['roles'] = [v.as_dict() for v in roles]
1712 headers = {'Accept': 'application/json', 'Content-Type': 'application/json', }
-> 1713 res = self._api.do('POST', '/api/2.0/preview/scim/v2/Groups', body=body, headers=headers)
1714 return Group.from_dict(res)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py:1084, in ApiClient.do(self, method, path, query, headers, body, raw, files, data)
1080 if not response.ok:
1081 # TODO: experiment with traceback pruning for better readability
1082 # See https://stackoverflow.com/a/58821552/277035
1083 payload = response.json()
-> 1084 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
1085 if raw:
1086 return StreamingResponse(response)
DatabricksError: The service at /api/2.0/preview/scim/v2/Groups is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-8bbf887439cb887433f92a1197f8bf36-4d79ab1b4df6ed88-00]`