Create a utility to convert old --> new metadata format #966

Closed
npatki opened this issue Aug 19, 2022 · 0 comments
npatki commented Aug 19, 2022

Problem Description

SDV 1.0 will use a slightly modified metadata format, so users will need to convert their preexisting metadata files. Doing this manually can be time-consuming.

Expected behavior

Create utility methods that would convert metadata in the old format to valid metadata in the new format. Add a method called upgrade_metadata to each class that takes in the old metadata file and writes a new one.

from sdv.metadata import SingleTableMetadata

SingleTableMetadata.upgrade_metadata(
  old_filepath='old_metadata.json',
  new_filepath='new_metadata.json'
)

# same method would exist for MultiTableMetadata

Cases

Filepaths: The file in old_filepath must be found and there must be no file already existing in new_filepath.

SingleTableMetadata.upgrade_metadata(
  old_filepath='data/incorrect_path.json',
  new_filepath='new_metadata.json'
)
Error: No metadata file found at old filepath 'data/incorrect_path.json'

SingleTableMetadata.upgrade_metadata(
  old_filepath='old_metadata.json',
  new_filepath='old_metadata.json'
)
Error: A file already exists at path 'old_metadata.json'. Please specify a new filepath.
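The two filepath checks could be sketched as follows. This is only an illustration, not the SDV implementation: `check_upgrade_paths` is a hypothetical helper name, and the choice of `FileNotFoundError` and `FileExistsError` is an assumption, since the issue only specifies the messages.

```python
from pathlib import Path


def check_upgrade_paths(old_filepath, new_filepath):
    """Hypothetical helper: check both paths before any conversion happens."""
    # The old metadata file must exist.
    if not Path(old_filepath).is_file():
        raise FileNotFoundError(
            f"No metadata file found at old filepath '{old_filepath}'"
        )
    # The new path must be free, so nothing is ever overwritten.
    # Exception types are an assumption; the issue only gives the messages.
    if Path(new_filepath).exists():
        raise FileExistsError(
            f"A file already exists at path '{new_filepath}'. "
            "Please specify a new filepath."
        )
```

Running both checks up front means a bad path is reported before any parsing or conversion work starts.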

Metadata Validity: The user's old metadata may not be valid. After writing the new file, run the validate() check on the converted metadata and, if any problems are found, raise a warning with the details.

from sdv.metadata import SingleTableMetadata

SingleTableMetadata.upgrade_metadata(
  old_filepath='old_metadata.json',
  new_filepath='new_metadata.json'
)

Warning: Successfully converted the old metadata, but the metadata was not valid.
To use this with the SDV, please fix the following errors.

InvalidMetadataError: The metadata is not valid

Error: Invalid values ("pii") for datetime column "start_date".
Error: Invalid regex format string "[A-{6}" for text column "user_id"
...
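One way to turn a validation failure into a warning is sketched below. `warn_if_invalid` is a hypothetical helper, and it accepts any zero-argument callable in place of the real validate() method so that the sketch runs without the SDV installed. The broad `except Exception` stands in for InvalidMetadataError, which is an assumption about how the real code would catch it.

```python
import warnings


def warn_if_invalid(validate):
    """Hypothetical helper: run `validate` and downgrade a failure to a warning.

    Returns True if the metadata validated cleanly, False otherwise.
    """
    try:
        validate()
    except Exception as error:  # stand-in for InvalidMetadataError (assumption)
        warnings.warn(
            "Successfully converted the old metadata, but the metadata was not valid. "
            "To use this with the SDV, please fix the following errors.\n"
            f"{error}"
        )
        return False
    return True
```

Because the file has already been written at this point, the user still gets the converted metadata and can fix the reported problems by hand.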

Single & Sequential Metadata: Some users may have written a single-table or sequential metadata file using the multi-table format. That is, they have nested the (single) table under the "tables" keyword. We should properly convert the metadata in this case.

{
    "tables": {
        "my_table": { <table metadata> }
    }
}

However, if multiple tables are specified, raise an error.

Error: There are multiple tables specified in the JSON.
Try using the MultiTableMetadata class to upgrade this file.
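The unwrapping step might look like the sketch below. `extract_single_table` is a hypothetical name, `ValueError` is an assumed exception type, and the handling of an empty "tables" entry is not covered by the issue, so that branch is an assumption too.

```python
def extract_single_table(old_metadata):
    """Hypothetical helper: return the single table's metadata dict.

    Accepts either a plain single-table dict or one nested under "tables".
    """
    tables = old_metadata.get("tables")
    if tables is None:
        # Already in single-table form; nothing to unwrap.
        return old_metadata
    if len(tables) > 1:
        raise ValueError(
            "There are multiple tables specified in the JSON. "
            "Try using the MultiTableMetadata class to upgrade this file."
        )
    if not tables:
        # Not specified in the issue; raising here is an assumption.
        raise ValueError("The 'tables' entry is empty.")
    # Exactly one table: the table name is dropped and its metadata is returned.
    return next(iter(tables.values()))
```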

Conversion Error: Do not write the file if any other error occurs while parsing or converting the old metadata. Let whatever error occurs be raised.
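A simple way to guarantee that no file is written on failure is to finish all parsing, conversion and serialization in memory before opening the output file. The sketch below assumes a hypothetical `upgrade_file` function and takes the conversion logic as a `convert` callable, since the actual old-to-new mapping is outside the scope of this illustration.

```python
import json


def upgrade_file(old_filepath, new_filepath, convert):
    """Hypothetical sketch: convert first, write only if everything succeeded."""
    with open(old_filepath) as old_file:
        old_metadata = json.load(old_file)  # parse errors propagate to the caller

    # Any exception raised here propagates before the output file is touched.
    new_metadata = convert(old_metadata)
    serialized = json.dumps(new_metadata, indent=4)

    # Mode "x" creates the file and fails if it already exists.
    with open(new_filepath, "x") as new_file:
        new_file.write(serialized)
```

Serializing to a string before opening the file also avoids leaving a half-written file behind if the converted metadata turns out not to be JSON-serializable.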

@npatki npatki added the feature request Request for a new feature label Aug 19, 2022
@npatki npatki added this to the 1.0.0 milestone Aug 19, 2022
@amontanez24 amontanez24 self-assigned this Mar 24, 2023