Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document preparing external locations when creating catalogs #2915

Merged
merged 3 commits into from
Oct 10, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 13 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Databricks Labs UCX
===
![UCX by Databricks Labs](docs/logo-no-background.png)

The companion for upgrading to Unity Catalog.
The companion for upgrading to Unity Catalog (UC).

After [installation](#install-ucx), ensure to [trigger](#ensure-assessment-run-command) the [assessment workflow](#assessment-workflow),
so that you'll be able to [scope the migration](docs/assessment.md) and execute the [group migration workflow](#group-migration-workflow).
Expand Down Expand Up @@ -563,10 +563,18 @@ Once the upgrade is completed, these principals can (and should) be deleted.
Use the `create-uber-principal` [UCX Command](#create-uber-principal-command) to create the Uber Principal.

##### Step 2.5: Create Catalogs and Schemas
In this step we will create the UC catalogs and schemas required for the target tables.
The `create-catalogs-schemas` [UCX command](#create-catalogs-schemas-command) can be used to create the UC catalogs and schemas.

The command will create the UC catalogs and schemas based on the mapping file created in the previous step.
In this step, we will create the UC catalogs and schemas required for the target tables using the
[`create-catalogs-schemas` command](#create-catalogs-schemas-command). The command will create the UC catalogs and
schemas based on the mapping file created in the previous step.

This step requires considering how to [physically separate data in storage](https://docs.databricks.com/en/data-governance/unity-catalog/best-practices.html#data-is-physically-separated-in-storage)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biswadeepupadhyay-db: Does this clarify what you were missing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so as long as we are not going ahead with the default metastore path (used at the time of UC setup) it is recommended that we create external storage and storage credential for the metastore paths we aim to use for our catalogs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The later is generally advised by Databricks (according to the linked docs).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please confirm if step 2.1 till step 2.3 are related to finding the external locations from the assessment related to hive external tables and creating corresponding external location and storage credentials related to those external paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is correct

within UC. As [Databricks recommends storing managed data at the catalog level](https://docs.databricks.com/en/data-governance/unity-catalog/best-practices.html#configure-a-unity-catalog-metastore),
we advise to prepare the external locations for the to-be created catalogs before running the `create-catalogs-schemas`
command. Either, reuse [previously created external locations](#step-23-create-external-locations) or create additional
external locations outside of UCX if data separation restrictions requires that. Note that external locations can be
reused when using subpaths, for example, a folder in a cloud storage
(`abfss://container@storage.dfs.core.windows.net/folder`) can reuse the external location of the cloud storage
(`abfss://container@storage.dfs.core.windows.net/`). (The previous example also holds for other clouds.)

#### Step 3: Upgrade the Metastore
Upgrading the metastore is done in steps.
Expand Down