feat: Add functionality to create and delete catalogs, tables and schemas to Unity catalog client #20956

Merged
10 commits merged into pola-rs:main
Jan 29, 2025

@nameexhaustion (Collaborator) commented Jan 28, 2025

Adds functionality to create and delete catalogs, tables and schemas, along with other drive-by improvements. Note that this does not include functionality to write data to a table.

Changes

  • Updates type parsing to use the type_json field instead of type_text, as type_text was observed to have format inconsistencies across different data sources.
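To illustrate why structured type_json is easier to consume than free-form type_text, here is a minimal Python sketch. It assumes that a column's type_json holds a Spark-style StructField JSON object such as {"name": "id", "type": "long", "nullable": true, "metadata": {}}, with nested types encoded as objects tagged by a "type" key. The helpers parse_type and parse_column and the mapping table are hypothetical, cover only a few types, and return Polars dtype names as strings; this is not the Rust implementation in polars-io.

```python
import json
import re
from typing import Any, Tuple

# Hypothetical, non-exhaustive mapping from Spark-style primitive type names
# (as assumed to appear in type_json) to Polars dtype names.
PRIMITIVE_TYPES = {
    "boolean": "Boolean",
    "byte": "Int8",
    "short": "Int16",
    "integer": "Int32",
    "long": "Int64",
    "float": "Float32",
    "double": "Float64",
    "string": "String",
    "binary": "Binary",
    "date": "Date",
}

# Decimals are assumed to be encoded as strings like "decimal(10,2)".
DECIMAL_RE = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def parse_type(node: Any) -> str:
    """Recursively convert a type_json type node into a Polars dtype name."""
    if isinstance(node, str):
        if node in PRIMITIVE_TYPES:
            return PRIMITIVE_TYPES[node]
        match = DECIMAL_RE.fullmatch(node)
        if match:
            precision, scale = match.groups()
            return f"Decimal({precision}, {scale})"
        raise ValueError(f"unsupported primitive type: {node!r}")

    # Nested types are JSON objects tagged with a "type" key.
    kind = node["type"]
    if kind == "array":
        return f"List({parse_type(node['elementType'])})"
    if kind == "struct":
        fields = ", ".join(
            f"{field['name']}: {parse_type(field['type'])}" for field in node["fields"]
        )
        return f"Struct({{{fields}}})"
    raise ValueError(f"unsupported nested type: {kind!r}")


def parse_column(type_json: str) -> Tuple[str, str]:
    """Parse one column's type_json string into a (name, dtype name) pair."""
    field = json.loads(type_json)
    return field["name"], parse_type(field["type"])


print(parse_column('{"name": "id", "type": "long", "nullable": true, "metadata": {}}'))
# ('id', 'Int64')
```

Because nesting is explicit in the JSON, the parser only recurses over objects; it never has to tokenize nested type strings such as struct<a:int>, whose exact formatting is what the PR observed to vary.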

Breaking change from TypedDict to dataclass in function returns

This PR also changes some Python-side functions to return dataclasses instead of (typed) dictionaries. This was done mainly so that I could add a get_polars_schema() method to TableInfo, since it isn't possible to add methods to a TypedDict. It is a breaking change relative to the API in the latest release (1.21.0): existing code has to change from catalog_info['name'] to catalog_info.name, etc. Because this functionality is unstable, that should be acceptable. I also think it is better to make this change now, because it lets us add methods later if we need to without another breaking change.
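A minimal sketch of the motivation, using hypothetical classes rather than the actual Polars definitions: a TypedDict is a plain dict at runtime and is read with key access, so it has no place for helper methods, whereas a dataclass is read with attribute access and can define methods. The get_polars_schema() name mirrors the method mentioned above, but the fields and the return type (a plain dict of column name to type name) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, TypedDict


class CatalogInfoDict(TypedDict):
    """Old style (hypothetical): a TypedDict, read as catalog_info["name"]."""

    name: str
    comment: Optional[str]


@dataclass
class ColumnInfo:
    name: str
    type_name: str


@dataclass
class TableInfo:
    """New style (hypothetical): a dataclass, read as table_info.name, with methods."""

    name: str
    columns: List[ColumnInfo] = field(default_factory=list)

    def get_polars_schema(self) -> Dict[str, str]:
        # In this sketch: column name -> type name, in column order.
        return {col.name: col.type_name for col in self.columns}


old = CatalogInfoDict(name="main", comment=None)  # just a dict at runtime
new = TableInfo(name="t", columns=[ColumnInfo("id", "Int64")])

print(old["name"], new.name, new.get_polars_schema())
# main t {'id': 'Int64'}
```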

Other notes

It is a fairly large PR. To help with reviewing, here is a rough outline of the changed code, in the order the files appear when scrolling from top to bottom:

  • polars-io/src/catalog/schema.rs
    • Newly introduced type_json parsing
    • Newly added functions to convert from polars-native DataTypes to the Unity REST API format (used for creating tables)
  • polars-io/src/catalog/unity/client.rs
    • Newly added functions to the Rust-side catalog client (create_catalog etc.)
  • polars-io/src/catalog/unity/models.rs
    • Updated data models
  • polars-python/src/catalog/mod.rs
    • Updated to instantiate dataclasses as return values in some places
    • Newly added functions that serve to dispatch to the Rust-side catalog client (create_catalog etc.)
    • Factored out code into functions
  • py-polars/polars/catalog.py
    • Most of the changes are new function definitions and their docstrings, and updated data models (dataclasses)
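The schema.rs item about converting polars-native DataTypes to the Unity REST API format (used when creating tables) can be illustrated with a small Python sketch. It assumes that each column in a create-table request carries a name, a Unity type_name (e.g. LONG, STRING), a Spark-style type_json encoded as a string, a position and a nullable flag. The helpers to_column_info and to_columns and the mapping table are hypothetical and cover only a handful of types; this is not the Rust code in schema.rs.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical mapping: Polars dtype name -> (assumed Unity type_name, Spark JSON type).
TO_UNITY = {
    "Boolean": ("BOOLEAN", "boolean"),
    "Int32": ("INT", "integer"),
    "Int64": ("LONG", "long"),
    "Float64": ("DOUBLE", "double"),
    "String": ("STRING", "string"),
}


def to_column_info(name: str, dtype: str, position: int, nullable: bool = True) -> Dict[str, Any]:
    """Build one column entry for an assumed create-table request body."""
    type_name, json_type = TO_UNITY[dtype]
    type_json = {"name": name, "type": json_type, "nullable": nullable, "metadata": {}}
    return {
        "name": name,
        "type_name": type_name,
        "type_json": json.dumps(type_json),  # type_json is itself a JSON-encoded string
        "position": position,
        "nullable": nullable,
    }


def to_columns(schema: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    # Column positions follow the order of the input schema, starting at 0.
    return [to_column_info(name, dtype, pos) for pos, (name, dtype) in enumerate(schema)]


cols = to_columns([("id", "Int64"), ("label", "String")])
print(json.dumps(cols, indent=2))
```

Keeping the mapping in one table makes the forward and reverse conversions easy to keep in sync; a real implementation would also need nested types (struct, array) and parameterized ones such as decimals.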

@github-actions bot added the labels enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars) and rust (Related to Rust Polars) on Jan 28, 2025
```
@@ -46,7 +53,7 @@ def __init__(
* "databricks-sdk": Use the Databricks SDK to retrieve and use the
bearer token from the environment.
"""
from polars.polars import PyCatalogClient
issue_unstable_warning("`Catalog` functionality is considered unstable.")
```

@nameexhaustion (Collaborator, Author), Jan 28, 2025

I missed calling issue_unstable_warning() before; currently, on 1.21.0, the unstable warning only appears in the docstring.

Member

I think that's fine. I don't think we need to warn per se.
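For context, an opt-in unstable warning like the one discussed here can be sketched with the standard warnings module. The sketch assumes the warning is only emitted when an environment variable is set to a non-zero integer (named POLARS_WARN_UNSTABLE here, as an assumption); UnstableWarning, issue_unstable_warning, the simplified Catalog constructor and count_unstable_warnings are illustrative stand-ins, not the Polars source.

```python
import os
import warnings
from typing import Optional

ENV_VAR = "POLARS_WARN_UNSTABLE"  # assumed name of the opt-in environment variable


class UnstableWarning(UserWarning):
    """Illustrative warning category for unstable functionality."""


def issue_unstable_warning(message: Optional[str] = None) -> None:
    # Opt-in: do nothing unless the environment variable is a non-zero integer.
    if not int(os.environ.get(ENV_VAR, "0")):
        return
    if message is None:
        message = "This functionality is considered unstable."
    warnings.warn(message, UnstableWarning, stacklevel=2)


class Catalog:
    """Simplified stand-in; the real constructor takes more parameters."""

    def __init__(self, workspace_url: str) -> None:
        issue_unstable_warning("`Catalog` functionality is considered unstable.")
        self.workspace_url = workspace_url


def count_unstable_warnings(env_value: Optional[str]) -> int:
    """Construct a Catalog with ENV_VAR set to env_value (None = unset); count warnings."""
    previous = os.environ.pop(ENV_VAR, None)
    if env_value is not None:
        os.environ[ENV_VAR] = env_value
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            Catalog("https://example.com")
    finally:
        # Restore the caller's environment.
        os.environ.pop(ENV_VAR, None)
        if previous is not None:
            os.environ[ENV_VAR] = previous
    return sum(issubclass(w.category, UnstableWarning) for w in caught)


print(count_unstable_warnings(None), count_unstable_warnings("1"))
# 0 1
```

With this kind of gating, calling the helper in the constructor is harmless by default, which is consistent with the view above that an unconditional warning isn't needed.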

@nameexhaustion changed the title from "feat: Add functions for creating catalogs / schemas / tables to Catalog" to "feat: Add functionality to create and delete catalogs, tables and schemas to Unity catalog client" on Jan 28, 2025
codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 7.16019% with 765 lines in your changes missing coverage. Please review.

Project coverage is 79.08%. Comparing base (dadad0d) to head (105f709).
Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-python/src/catalog/mod.rs | 3.93% | 293 Missing |
| crates/polars-io/src/catalog/schema.rs | 0.00% | 246 Missing |
| crates/polars-io/src/catalog/unity/client.rs | 0.00% | 140 Missing |
| py-polars/polars/catalog.py | 59.74% | 31 Missing |
| crates/polars-io/src/catalog/unity/utils.rs | 0.00% | 18 Missing |
| crates/polars-io/src/catalog/unity/models.rs | 0.00% | 16 Missing |
| crates/polars-utils/src/error.rs | 0.00% | 13 Missing |
| crates/polars-utils/src/pl_str.rs | 0.00% | 6 Missing |
| crates/polars-io/src/utils/other.rs | 0.00% | 2 Missing |

Additional details and impacted files
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #20956      +/-   ##
==========================================
- Coverage   79.28%   79.08%   -0.20%     
==========================================
  Files        1578     1580       +2     
  Lines      224195   224938     +743     
  Branches     2576     2576              
==========================================
+ Hits       177756   177898     +142     
- Misses      45848    46452     +604     
+ Partials      591      588       -3     
```
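The report's figures are internally consistent, which a few lines of integer arithmetic can confirm. This sketch assumes Codecov defines coverage as hits / (hits + misses + partials) and that the displayed percentages are truncated rather than rounded to two decimals (rounding would give 79.29% and 79.09%, so truncation is what matches the report). The MISSING dictionary restates the per-file table; coverage_pct is a hypothetical helper.

```python
# Missing lines per file, restated from the Codecov table.
MISSING = {
    "crates/polars-python/src/catalog/mod.rs": 293,
    "crates/polars-io/src/catalog/schema.rs": 246,
    "crates/polars-io/src/catalog/unity/client.rs": 140,
    "py-polars/polars/catalog.py": 31,
    "crates/polars-io/src/catalog/unity/utils.rs": 18,
    "crates/polars-io/src/catalog/unity/models.rs": 16,
    "crates/polars-utils/src/error.rs": 13,
    "crates/polars-utils/src/pl_str.rs": 6,
    "crates/polars-io/src/utils/other.rs": 2,
}


def coverage_pct(hits: int, misses: int, partials: int) -> str:
    """Coverage as hits / lines, truncated to two decimals using integer math only."""
    lines = hits + misses + partials
    basis_points = hits * 10_000 // lines  # hundredths of a percent, truncated
    return f"{basis_points // 100}.{basis_points % 100:02d}"


base = (177756, 45848, 591)  # (hits, misses, partials) on main
head = (177898, 46452, 588)  # (hits, misses, partials) on the PR head

print(sum(MISSING.values()))  # 765
print(coverage_pct(*base), coverage_pct(*head))  # 79.28 79.08
print(sum(base), sum(head))  # 224195 224938
```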


@nameexhaustion marked this pull request as ready for review January 28, 2025 13:16
@nameexhaustion marked this pull request as draft January 29, 2025 07:31
@nameexhaustion marked this pull request as ready for review January 29, 2025 08:27
@ritchie46 (Member) left a comment

My only comment is about the name "schema". I'd like to propose "namespace" instead.

```rust
Ok(())
}

pub async fn create_schema(
```
Member

Our "schema" conflicts with the catalog's definition of a schema. Shall we name it create_namespace and mention in the docstrings that it refers to catalog schemas?

@nameexhaustion (Collaborator, Author), Jan 29, 2025

I can update it. Do you also want it changed in the Python API, or just in the Rust-side functions?

The word "schema" is also used in multiple places (e.g. struct SchemaInfo, or schema_info: &str in function arguments); to be consistent, we would need to rename all of those as well.

@ritchie46 merged commit a2eb8aa into pola-rs:main on Jan 29, 2025
27 of 28 checks passed
@nameexhaustion added and then removed the "breaking python" label (Change that breaks backwards compatibility for the Python package) on Jan 31, 2025