Log if any numerical_distributions will not be applied #1212

Closed
npatki opened this issue Jan 27, 2023 · 0 comments
Labels: feature request

npatki commented Jan 27, 2023

Nice to have. We can remove this from the milestone if needed.

Problem Description

In SDV 1.0, I will be able to specify numerical distributions for any original column name:

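from sdv.single_table import GaussianCopulaSynthesizer

# `metadata` is assumed to be defined already for the table being modeled.
synthesizer = GaussianCopulaSynthesizer(
  metadata,
  numerical_distributions={
    'age': 'uniform',
    'start_date': 'beta',
    'end_date': 'beta',
  }
)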

However, because of constraints and transformations, a column may no longer exist by the time we reach the fitting stage, in which case the numerical distribution I specified for it won't be applied. For example, if I apply an Inequality constraint between start_date and end_date, then start_date may be dropped.

Expected behavior

It would be nice to log when this happens, using logger.info.

After preprocessing, we have a new list of column names. If any user-provided column name in numerical_distributions is no longer in that list, log it during the fit_processed_data method:

INFO: Requested distribution ('beta') cannot be applied to column ('start_date')
because it no longer exists after preprocessing.
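
The check described above could be sketched roughly as follows. This is only an illustration, not SDV code: the helper name `log_unapplied_distributions` and its parameters are hypothetical, and it assumes the post-preprocessing column names are available as a plain list when fit_processed_data runs.

```python
import logging

LOGGER = logging.getLogger(__name__)


def log_unapplied_distributions(numerical_distributions, processed_columns, logger=LOGGER):
    """Log every requested distribution whose column is gone after preprocessing.

    Hypothetical helper for illustration only; it is not part of SDV.
    Returns the list of column names that were requested but are missing.
    """
    available = set(processed_columns)
    # Preserve the order in which the user listed the columns.
    missing = [column for column in numerical_distributions if column not in available]
    for column in missing:
        logger.info(
            "Requested distribution ('%s') cannot be applied to column ('%s') "
            "because it no longer exists after preprocessing.",
            numerical_distributions[column],
            column,
        )
    return missing


# Example: 'start_date' was dropped by a constraint, so it is reported.
# 'derived_column' is a made-up stand-in for whatever replaces it.
requested = {'age': 'uniform', 'start_date': 'beta', 'end_date': 'beta'}
columns_after_preprocessing = ['age', 'end_date', 'derived_column']
dropped = log_unapplied_distributions(requested, columns_after_preprocessing)
```

Returning the missing names in addition to logging them keeps the helper easy to test. The message only becomes visible once the logger is configured at INFO level.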