Crawl dashboards, queries, and alerts #144

Merged
merged 80 commits into from
Sep 11, 2023
Changes from 72 commits
af09334
init
renardeinside Sep 3, 2023
5efc5b7
add basic impl
renardeinside Sep 3, 2023
52de7aa
add initial code for sql applicator
renardeinside Sep 3, 2023
8b351c0
Update src/databricks/labs/ucx/inventory/inventorizer.py
renardeinside Sep 4, 2023
6244fe2
factor-out the listing functions
renardeinside Sep 4, 2023
7d91f88
remove old usages of listing
renardeinside Sep 4, 2023
b218fe1
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 4, 2023
9a82ce2
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 5, 2023
e64e80b
factor-out various inventorizers into separate methods
renardeinside Sep 5, 2023
b8f5959
apply linting
renardeinside Sep 5, 2023
15f7128
add class-based SQL inventorizer
renardeinside Sep 5, 2023
dcc6707
fix generics passing to standard inventorizer
renardeinside Sep 5, 2023
2f7565e
apply formatting
renardeinside Sep 5, 2023
cce9e8e
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 5, 2023
627df05
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 6, 2023
7226eea
refactor relevance identification logic
renardeinside Sep 6, 2023
88f92c3
refactor applicator logic and add sql applicator
renardeinside Sep 6, 2023
6f00c8c
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 6, 2023
a9e6c7c
fix imports
renardeinside Sep 6, 2023
29463ec
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 7, 2023
88b1b8a
introduce func-based applicators
renardeinside Sep 7, 2023
aded05b
backport the secret scope logic into applicator
renardeinside Sep 7, 2023
ce0d755
fix abstract property methods
renardeinside Sep 7, 2023
569cca9
fix first chunk of tests
renardeinside Sep 7, 2023
51e953b
apply replace instead of deepcopy
renardeinside Sep 7, 2023
bd241f8
refactor tests
renardeinside Sep 7, 2023
7b8c95a
improve coverage
renardeinside Sep 7, 2023
5a4b4f4
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 8, 2023
315ce48
add logical objects doc
renardeinside Sep 8, 2023
1b6386b
add logical objects doc
renardeinside Sep 8, 2023
9a039ec
add basic impl
renardeinside Sep 8, 2023
f978d73
add permissions support
renardeinside Sep 8, 2023
3524a99
add sql permissions support'
renardeinside Sep 8, 2023
afe8cb0
add relevance method check
renardeinside Sep 8, 2023
0fa1f6b
add relevance check impls
renardeinside Sep 8, 2023
75ba5d1
add comments
renardeinside Sep 8, 2023
fab8992
add passwords and token support
renardeinside Sep 8, 2023
8a19eb5
finish impls
renardeinside Sep 10, 2023
b6938e5
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 10, 2023
0a5d9ed
split supports into a package
renardeinside Sep 10, 2023
2e78d5b
remove unused functions from permissions manager
renardeinside Sep 11, 2023
e45b3f3
change table schema
renardeinside Sep 11, 2023
02bf874
remove unused tests
renardeinside Sep 11, 2023
cef9804
add test-cov to make
renardeinside Sep 11, 2023
4bf302a
remove unused types
renardeinside Sep 11, 2023
aa6344e
apply fmt
renardeinside Sep 11, 2023
8c181dc
add tests for types
renardeinside Sep 11, 2023
f7b349c
add test for utils
renardeinside Sep 11, 2023
d9a236a
add tests for permissions manager
renardeinside Sep 11, 2023
1a80b64
add explicit naming for supports
renardeinside Sep 11, 2023
13dc63c
fix usages of logical types
renardeinside Sep 11, 2023
f4d7de0
add full coverage for passwords
renardeinside Sep 11, 2023
0fb3bc2
add tests coverage for tokens
renardeinside Sep 11, 2023
73025aa
switch towards listing-based logic
renardeinside Sep 11, 2023
5bed8a6
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 11, 2023
fa8a125
merge with main
renardeinside Sep 11, 2023
75d87b6
improve docs
renardeinside Sep 11, 2023
18c2ca7
rename the package
renardeinside Sep 11, 2023
d616171
remove support from ignore
renardeinside Sep 11, 2023
042431a
refactor supports logic in permissions manager
renardeinside Sep 11, 2023
d472c18
type the verification code
renardeinside Sep 11, 2023
d335751
normalize tests
renardeinside Sep 11, 2023
2a0a15c
add tests for relevance checker
renardeinside Sep 11, 2023
15ca3ad
improve tests
renardeinside Sep 11, 2023
3b1faf1
Merge remote-tracking branch 'origin/main' into feature/sql-object-pe…
renardeinside Sep 11, 2023
ac5540d
fix issues with .gitignore
renardeinside Sep 11, 2023
966609e
address comments
renardeinside Sep 11, 2023
3396dff
add tests for secret scope
renardeinside Sep 11, 2023
f6fc978
full test coverage for secrets
renardeinside Sep 11, 2023
a86e628
full test coverage for permissions
renardeinside Sep 11, 2023
7661562
add basic tests for scim
renardeinside Sep 11, 2023
6beab23
improve tests for scim
renardeinside Sep 11, 2023
e8e95fd
fully cover scim
renardeinside Sep 11, 2023
d3185af
align package name in test
renardeinside Sep 11, 2023
53f396d
add full coverage for sql permissions
renardeinside Sep 11, 2023
3a58e4a
add logging test
renardeinside Sep 11, 2023
5676574
add model listing tests
renardeinside Sep 11, 2023
610a42b
full coverage for support submodule
renardeinside Sep 11, 2023
21f3f7f
fix typing
renardeinside Sep 11, 2023
192e142
improve docs
renardeinside Sep 11, 2023
6 changes: 3 additions & 3 deletions .gitignore
@@ -145,7 +145,7 @@ cython_debug/
# dev files and scratches
dev/cleanup.py

Support

.databricks
.vscode
.vscode

.python-version
5 changes: 4 additions & 1 deletion Makefile
@@ -5,4 +5,7 @@ fmt:
hatch run lint:fmt

test:
hatch run unit:test
hatch run unit:test

test-cov:
hatch run unit:test-cov-report
184 changes: 184 additions & 0 deletions docs/logic.md
@@ -0,0 +1,184 @@
# Permissions migration logic and data structures

At a high level, the permissions migration process is split into two steps:

1. Collect all existing permissions into persistent storage.
2. Apply the collected permissions to the target resources.

The first step is performed by the `Crawler` and the second by the `Applier`.

The crawler and applier are tightly coupled through SerDe (serialization/deserialization) logic: the applier must deserialize exactly what the crawler serialized.

We implement a separate crawler and applier for each supported resource type.

Please note that `table ACLs` logic is currently handled separately from the logic described in this document.

## Logical objects and relevant APIs


### Group level properties (uses SCIM API)

- [x] Entitlements (One of `workspace-access`, `databricks-sql-access`, `allow-cluster-create`, `allow-instance-pool-create`)
- [x] Roles (AWS only)

These are workspace-level properties that are not associated with any specific resource.

Additional info:

- object ID: `group_id`
- listing method: `ws.groups.list`
- get method: `ws.groups.get(group_id)`
- put method: `ws.groups.patch(group_id)`

### Compute infrastructure (uses Permissions API)

- [x] Clusters
- [x] Cluster policies
- [x] Instance pools
- [x] SQL warehouses

These are compute infrastructure resources that are associated with a specific workspace.

Additional info:

- object ID: `cluster_id`, `policy_id`, `instance_pool_id`, `id` (SQL warehouses)
- listing method: `ws.clusters.list`, `ws.cluster_policies.list`, `ws.instance_pools.list`, `ws.warehouses.list`
- get method: `ws.permissions.get(object_id, object_type)`
- put method: `ws.permissions.update(object_id, object_type)`
- get response object type: `databricks.sdk.service.iam.ObjectPermissions`


### Workflows (uses Permissions API)

- [x] Jobs
- [x] Delta Live Tables

These are workflow resources that are associated with a specific workspace.

Additional info:

- object ID: `job_id`, `pipeline_id`
- listing method: `ws.jobs.list`, `ws.pipelines.list`
- get method: `ws.permissions.get(object_id, object_type)`
- put method: `ws.permissions.update(object_id, object_type)`
- get response object type: `databricks.sdk.service.iam.ObjectPermissions`

### ML (uses Permissions API)

- [x] MLflow experiments
- [x] MLflow models

These are ML resources that are associated with a specific workspace.

Additional info:

- object ID: `experiment_id`, `id` (models)
- listing method: custom listing
- get method: `ws.permissions.get(object_id, object_type)`
- put method: `ws.permissions.update(object_id, object_type)`
- get response object type: `databricks.sdk.service.iam.ObjectPermissions`


### SQL (uses SQL Permissions API)

- [x] Alerts
- [x] Dashboards
- [x] Queries

These are SQL resources that are associated with a specific workspace.

Additional info:

- object ID: `id`
- listing method: `ws.alerts.list`, `ws.dashboards.list`, `ws.queries.list`
- get method: `ws.dbsql_permissions.get`
- put method: `ws.dbsql_permissions.set`
- get response object type: `databricks.sdk.service.sql.GetResponse`
- Note that the API does not support an UPDATE operation; only PUT (overwrite) is supported.
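
Because only overwrite is available, changing a single entry means sending the complete access control list. A minimal sketch of that read-modify-write pattern, using plain dictionaries as a stand-in for the API payload. The `principal` and `level` keys, the level values, and the `merge_acl` helper are hypothetical and are not SDK names:

```python
def merge_acl(current: list[dict], change: dict) -> list[dict]:
    """Return the full ACL to send in a PUT: existing entries plus one change."""
    # Drop any existing entry for the same principal, then append the new one.
    merged = [entry for entry in current if entry["principal"] != change["principal"]]
    merged.append(change)
    return merged


current_acl = [
    {"principal": "analysts", "level": "CAN_RUN"},
    {"principal": "admins", "level": "CAN_MANAGE"},
]

# Sending only the changed entry would drop "admins"; send the merged list instead.
full_acl = merge_acl(current_acl, {"principal": "analysts", "level": "CAN_EDIT"})
```

The original list is left untouched, so the caller can still compare the old and new ACL before overwriting.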

### Security (uses Permissions API)

- [x] Tokens
- [x] Passwords

These are security resources that are associated with a specific workspace.

Additional info:

- object ID: `tokens` (static value), `passwords` (static value)
- listing method: N/A
- get method: `ws.permissions.get(object_id, object_type)`
- put method: `ws.permissions.update(object_id, object_type)`
- get response object type: `databricks.sdk.service.iam.ObjectPermissions`

### Workspace (uses Permissions API)

- [x] Notebooks
- [x] Directories
- [x] Repos
- [x] Files

These are workspace resources that are associated with a specific workspace.

Additional info:

- object ID: `object_id`
- listing method: custom listing
- get method: `ws.permissions.get(object_id, object_type)`
- put method: `ws.permissions.update(object_id, object_type)`
- get response object type: `databricks.sdk.service.iam.ObjectPermissions`

### Secrets (uses Secrets API)

- [x] Secrets

These are secrets resources that are associated with a specific workspace.

Additional info:

- object ID: `scope_name`
- listing method: `ws.secrets.list_scopes()`
- get method: `ws.secrets.list_acls(scope_name)`
- put method: `ws.secrets.put_acl`


## Crawler and serialization logic

Crawlers are expected to return a list of callables that are later invoked to fetch the permissions.
Each callable returns a `PermissionInventoryItem` that must be serializable into a Delta table.
Because the permission payload differs between crawlers, each crawler implements its own serialization
method.
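
A minimal sketch of this contract, assuming a JSON string as the serialized form. The field names of `PermissionInventoryItem`, the `fake_get_permissions` stand-in, and the helper names are assumptions made for illustration and may not match the code in this PR:

```python
import json
from dataclasses import dataclass
from functools import partial
from typing import Callable


@dataclass
class PermissionInventoryItem:
    # Field names are assumptions for this sketch.
    object_id: str
    crawler: str  # which crawler produced the item
    raw: str  # JSON-serialized payload, so the row fits a flat table schema


def fake_get_permissions(object_id: str) -> dict:
    # Stand-in for an API call such as ws.permissions.get(...).
    return {"object_id": object_id, "acl": [{"group_name": "analysts", "level": "CAN_USE"}]}


def crawl_one(object_id: str) -> PermissionInventoryItem:
    # Crawler-specific serialization: here the payload is simply dumped to JSON.
    payload = fake_get_permissions(object_id)
    return PermissionInventoryItem(object_id=object_id, crawler="clusters", raw=json.dumps(payload))


def get_crawler_tasks(object_ids: list[str]) -> list[Callable[[], PermissionInventoryItem]]:
    # One deferred call per object; nothing is fetched until a task is invoked.
    return [partial(crawl_one, object_id) for object_id in object_ids]


tasks = get_crawler_tasks(["c-1", "c-2"])
items = [task() for task in tasks]
```

Returning callables instead of results lets the caller decide how to schedule the API calls, for example in a thread pool.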

## Applier and deserialization logic

Appliers are expected to accept a list of `PermissionInventoryItem` objects and generate a list of callables that apply the
given permissions.
Each applier should implement a deserialization method that converts the raw payload into a typed one.
Each permission item should carry the type of the crawler that produced it, so that the applier can use the correct
deserialization method.
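
The applier side can be sketched in the same spirit. The `AclEntry` type, the JSON payload layout, and the `applied` list that stands in for real API calls are all assumptions for this example:

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class PermissionInventoryItem:
    # Field names are assumptions for this sketch.
    object_id: str
    crawler: str
    raw: str


@dataclass
class AclEntry:
    principal: str
    level: str


def deserialize_acl(raw: str) -> list[AclEntry]:
    # Convert the raw JSON payload back into typed entries.
    return [AclEntry(principal=e["principal"], level=e["level"]) for e in json.loads(raw)]


# Records what would have been sent to the API.
applied: list[tuple[str, str, str]] = []


def get_apply_task(item: PermissionInventoryItem) -> Callable[[], None]:
    acl = deserialize_acl(item.raw)

    def apply() -> None:
        for entry in acl:
            applied.append((item.object_id, entry.principal, entry.level))

    return apply


item = PermissionInventoryItem(
    object_id="c-1",
    crawler="clusters",
    raw=json.dumps([{"principal": "analysts", "level": "CAN_USE"}]),
)
task = get_apply_task(item)
task()
```

Deserialization happens when the task is built, so a malformed payload fails early, before any permission is applied.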

## Relevance identification

Since we save all objects into the permission table, we need to filter out the objects that are not relevant to the
current migration.
We do this inside the `applier`, by returning a `noop` callable if the object is not relevant to the current migration.
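
A minimal sketch of the `noop` pattern. The relevance rule used here (an object is relevant when at least one of its principals is a migrated group) is an assumption chosen for illustration, as are the helper names:

```python
from functools import partial
from typing import Callable


def noop() -> None:
    """Do nothing; returned for objects outside the current migration."""


def is_relevant(principals: list[str], migrated_groups: set[str]) -> bool:
    # Assumed rule: relevant if any principal belongs to the migrated groups.
    return any(principal in migrated_groups for principal in principals)


# Records the objects whose permissions would have been applied.
applied: list[str] = []


def get_apply_task(object_id: str, principals: list[str], migrated_groups: set[str]) -> Callable[[], None]:
    if not is_relevant(principals, migrated_groups):
        return noop
    return partial(applied.append, object_id)


migrated = {"analysts"}
tasks = [
    get_apply_task("c-1", ["analysts", "admins"], migrated),
    get_apply_task("c-2", ["admins"], migrated),
]
for task in tasks:
    task()
```

Returning `noop` keeps the list of callables uniform, so the executor does not need a special case for skipped objects.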

## Crawling the permissions

To crawl the permissions, we use the following logic:
1. Go through the list of all crawlers.
2. Get the list of all objects of the given type.
3. For each object, generate a callable that will return a `PermissionInventoryItem`.
4. Execute the callables in parallel.
5. Collect the results into a list of `PermissionInventoryItem`.
6. Save the list of `PermissionInventoryItem` into a Delta Table.
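
The steps above can be sketched with a thread pool from the standard library. The crawler registry, the `fetch` stand-in, and the in-memory `saved_rows` list (in place of a Delta table) are assumptions for this example:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class PermissionInventoryItem:
    # Field names are assumptions for this sketch.
    object_id: str
    crawler: str
    raw: str


def fetch(crawler: str, object_id: str) -> PermissionInventoryItem:
    # Stand-in for the per-object API call.
    return PermissionInventoryItem(object_id=object_id, crawler=crawler, raw="{}")


# Hypothetical crawlers: name -> listing function returning object IDs.
crawlers = {
    "clusters": lambda: ["c-1", "c-2"],
    "jobs": lambda: ["j-1"],
}

# Steps 1-3: one callable per object, across all crawlers.
tasks = [
    partial(fetch, name, object_id)
    for name, listing in crawlers.items()
    for object_id in listing()
]

# Steps 4-5: run the callables in parallel and collect the results.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task) for task in tasks]
    inventory = [future.result() for future in futures]

# Step 6: a plain list stands in for the Delta table write.
saved_rows = [(item.crawler, item.object_id, item.raw) for item in inventory]
```

Calling `future.result()` re-raises any exception from a task, so a failed API call surfaces in the crawl instead of being silently dropped.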

## Applying the permissions

To apply the permissions, we use the following logic:

1. Read the Delta Table with raw permissions.
2. Map the items to the relevant applier. If no relevant applier is found, we raise an exception.
3. Deserialize the items using the relevant applier.
4. Generate a list of callables that will apply the permissions.
5. Execute the callables in parallel.
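
A minimal sketch of the apply flow under the same assumptions: rows are plain dictionaries standing in for the Delta table, the applier registry is hypothetical, and the `applied` list records what would have been sent to the API:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable

# Step 1: rows as they might be read from the permissions table.
rows = [
    {"object_id": "c-1", "crawler": "clusters", "raw": json.dumps({"group": "analysts"})},
    {"object_id": "j-1", "crawler": "jobs", "raw": json.dumps({"group": "engineers"})},
]

# Records what would have been applied.
applied: list[tuple[str, str, str]] = []


def record(kind: str, object_id: str, payload: dict) -> None:
    applied.append((kind, object_id, payload["group"]))


# Hypothetical applier registry: crawler type -> apply function.
appliers = {
    "clusters": partial(record, "clusters"),
    "jobs": partial(record, "jobs"),
}


def build_tasks(rows: list[dict], appliers: dict) -> list[Callable[[], None]]:
    tasks = []
    for row in rows:
        # Step 2: map the item to its applier, failing loudly if none exists.
        applier = appliers.get(row["crawler"])
        if applier is None:
            raise ValueError(f"No applier found for {row['crawler']!r}")
        # Step 3: deserialize the raw payload.
        payload = json.loads(row["raw"])
        # Step 4: defer the actual call.
        tasks.append(partial(applier, row["object_id"], payload))
    return tasks


tasks = build_tasks(rows, appliers)

# Step 5: execute the callables in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda task: task(), tasks))
```

Building every task before executing any of them means an unknown crawler type aborts the run before a single permission is changed.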
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -173,8 +173,12 @@ known-first-party = ["databricks.labs.ucx"]
ban-relative-imports = "all"

[tool.ruff.per-file-ignores]
# Tests can use magic values, assertions, and relative imports
"tests/**/*" = ["PLR2004", "S101", "TID252"]

"tests/**/*" = [
"PLR2004", "S101", "TID252", # tests can use magic values, assertions, and relative imports
"ARG001" # tests may not use the provided fixtures
]

"src/databricks/labs/ucx/providers/mixins/redash.py" = ["A002", "A003", "N815"]

[tool.coverage.run]