Create PARSynthesizer #1055

Closed
amontanez24 opened this issue Oct 5, 2022 · 0 comments

amontanez24 commented Oct 5, 2022

Problem Description

The PAR model needs to be migrated to the new synthesizer structure.

Expected behavior

  • There should be a new module called sequential in sdv (the same level as single_table)
  • There should be a class called PARSynthesizer
  • The PARSynthesizer should have the following init parameters:
    • metadata: A SingleTableMetadata object
    • enforce_min_max_values: A boolean describing whether or not to strictly enforce the observed min/max values in numerical columns
    • enforce_rounding: A boolean describing whether or not to round the synthetic data based on the input data
    • context_columns: A list of strings, representing the columns that do not vary in a sequence
    • context_synthesizer: A string with the name of the model to use for the context values
      • (default) 'GaussianCopulaSynthesizer'
      • Available options: 'GaussianCopulaSynthesizer', 'CTGANSynthesizer', 'CopulaGANSynthesizer', 'TVAESynthesizer'
    • context_synthesizer_parameters: A dictionary that maps each parameter name to its value. Refer to the chosen context synthesizer for the parameters that are allowed.
    • segment_size
    • epochs
    • cuda
    • sample_size
    • verbose
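
A minimal sketch of what this constructor signature could look like, for discussion only. It does not import sdv and contains no fitting or sampling logic. The issue does not state defaults for most parameters, so every default marked "assumed" below is a placeholder, and `metadata` is typed loosely where the proposal calls for a SingleTableMetadata object.

```python
from typing import Any, Dict, List, Optional

# Names copied from the "Available options" list in this issue.
AVAILABLE_CONTEXT_SYNTHESIZERS = (
    'GaussianCopulaSynthesizer',
    'CTGANSynthesizer',
    'CopulaGANSynthesizer',
    'TVAESynthesizer',
)


class PARSynthesizer:
    """Signature sketch only: stores its arguments, nothing else."""

    def __init__(
        self,
        metadata: Any,  # a SingleTableMetadata object in the proposal
        enforce_min_max_values: bool = True,  # assumed default
        enforce_rounding: bool = True,  # assumed default
        context_columns: Optional[List[str]] = None,
        context_synthesizer: str = 'GaussianCopulaSynthesizer',  # default per the issue
        context_synthesizer_parameters: Optional[Dict[str, Any]] = None,
        segment_size: Optional[int] = None,  # assumed default
        epochs: int = 128,  # assumed default
        cuda: bool = True,  # assumed default
        sample_size: int = 1,  # assumed default
        verbose: bool = False,  # assumed default
    ):
        # The proposal accepts only a string naming a single-table synthesizer.
        if context_synthesizer not in AVAILABLE_CONTEXT_SYNTHESIZERS:
            raise ValueError(
                f"Unknown context_synthesizer '{context_synthesizer}'. "
                f'Available options: {AVAILABLE_CONTEXT_SYNTHESIZERS}'
            )

        self.metadata = metadata
        self.enforce_min_max_values = enforce_min_max_values
        self.enforce_rounding = enforce_rounding
        self.context_columns = list(context_columns or [])
        self.context_synthesizer = context_synthesizer
        self.context_synthesizer_parameters = dict(context_synthesizer_parameters or {})
        self.segment_size = segment_size
        self.epochs = epochs
        self.cuda = cuda
        self.sample_size = sample_size
        self.verbose = verbose
```

Validating the string eagerly in `__init__` is one possible design choice; it surfaces a misspelled synthesizer name before any fitting starts.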

Additional context

There are a few main changes from the current PAR model implementation:

  1. The context_synthesizer now only accepts a string, but will work with any single-table synthesizer. It is now accompanied by the new context_synthesizer_parameters parameter, which allows users to specify the configuration for the context synthesizer.
  2. The PARSynthesizer inherits directly from the BaseSynthesizer class.
    • If there end up being too many differences, another option could be to make PARSynthesizer not inherit from anything. It will have different sampling methods from the other synthesizers. Configuring the DataProcessor for it might also be challenging, since it must transform only the columns for the context model.
    • A third option could be to move the sampling methods to a BaseSingleTableSynthesizer class that inherits from the BaseSynthesizer
  3. The current BaseTimeseriesModel and PAR model can be combined into one class now since there aren't any other sequential models.
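
A rough sketch of the third option from point 2, combined with the string-based lookup from point 1. All class bodies here are stand-ins written for illustration, not the existing sdv classes; the `sample` method names and signatures, the `_CONTEXT_SYNTHESIZERS` registry, and the constructor defaults are assumptions.

```python
class BaseSynthesizer:
    """Shared configuration only; deliberately has no sampling methods."""

    def __init__(self, metadata, enforce_min_max_values=True, enforce_rounding=True):
        self.metadata = metadata
        self.enforce_min_max_values = enforce_min_max_values
        self.enforce_rounding = enforce_rounding


class BaseSingleTableSynthesizer(BaseSynthesizer):
    """Hypothetical home for the row-based sampling methods."""

    def sample(self, num_rows):
        raise NotImplementedError


class GaussianCopulaSynthesizer(BaseSingleTableSynthesizer):
    """Stand-in for a real single-table synthesizer; it only records its parameters."""

    def __init__(self, metadata, **parameters):
        super().__init__(metadata)
        self.parameters = parameters


# Hypothetical registry mapping the accepted strings to classes.
_CONTEXT_SYNTHESIZERS = {
    'GaussianCopulaSynthesizer': GaussianCopulaSynthesizer,
}


class PARSynthesizer(BaseSynthesizer):
    """Inherits the shared setup but defines its own sequence-based sampling."""

    def __init__(self, metadata, context_synthesizer='GaussianCopulaSynthesizer',
                 context_synthesizer_parameters=None, **kwargs):
        super().__init__(metadata, **kwargs)
        synthesizer_class = _CONTEXT_SYNTHESIZERS[context_synthesizer]
        # Forward the user-supplied configuration to the context synthesizer.
        self._context_synthesizer = synthesizer_class(
            metadata, **(context_synthesizer_parameters or {})
        )

    def sample(self, num_sequences, sequence_length=None):
        raise NotImplementedError
```

With this layout, PARSynthesizer shares the common setup without inheriting sampling methods that do not apply to sequences.
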

amontanez24 added the "feature request" label on Oct 5, 2022
amontanez24 added this to the 1.0.0 milestone on Oct 5, 2022
amontanez24 self-assigned this on Mar 24, 2023