Add ability to reset random sampling #1130

Closed
amontanez24 opened this issue Dec 2, 2022 · 0 comments
Labels
feature request Request for a new feature


Problem Description

As a user, I would like to be able to reproduce the results of a previous sample call. I would also like to control when this happens, since sometimes I want newly sampled data instead. On top of this, the consistency should hold for all columns.

Acceptance criteria

  • We want to change the way we handle randomness to make the above request possible. This means doing the following:
    • Remove the randomize_samples parameter from all sample calls.
    • Set the seed for the underlying model on the initial sample call.
    • Add a method called reset_sampling that resets the seed for the model back to its original state and also resets the random state of the HyperTransformer.
    • Add the same method to the MultiTableSynthesizers as well. It would just loop through each SingleTableSynthesizer and call its reset_sampling method.
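
The criteria above can be illustrated with a minimal, standard-library-only sketch. It does not use SDV: the class names, the `seed` argument, and the two `random.Random` objects (standing in for the model's and the HyperTransformer's random state) are all assumptions made for illustration, not the actual implementation.

```python
import random


class SketchSingleTableSynthesizer:
    """Hypothetical stand-in for a single-table synthesizer (not the SDV class)."""

    def __init__(self, seed=0):
        self._seed = seed  # assumed fixed "original" seed
        # Two independent random sources: one stands in for the underlying
        # model, the other for the HyperTransformer.
        self._model_rng = random.Random(seed)
        self._transformer_rng = random.Random(seed)

    def sample(self, num_rows):
        # No randomize_samples flag: each call simply continues the streams,
        # so consecutive calls return different rows.
        return [
            (self._model_rng.random(), self._transformer_rng.randint(0, 99))
            for _ in range(num_rows)
        ]

    def reset_sampling(self):
        # Rewind both random sources to their original state.
        self._model_rng.seed(self._seed)
        self._transformer_rng.seed(self._seed)


class SketchMultiTableSynthesizer:
    """Hypothetical multi-table wrapper holding one synthesizer per table."""

    def __init__(self, table_names):
        self._table_synthesizers = {
            name: SketchSingleTableSynthesizer(seed=index)
            for index, name in enumerate(table_names)
        }

    def sample(self, num_rows):
        return {
            name: synthesizer.sample(num_rows)
            for name, synthesizer in self._table_synthesizers.items()
        }

    def reset_sampling(self):
        # Delegate to every single-table synthesizer.
        for synthesizer in self._table_synthesizers.values():
            synthesizer.reset_sampling()
```

In this sketch, re-seeding is enough because `random.Random.seed` restores the generator to the same state it had at construction. A real model may need its own mechanism for restoring state.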

Expected behavior

from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data_1 = synthesizer.sample(10)

synthesizer.reset_sampling()
synthetic_data_2 = synthesizer.sample(10)

After the reset, synthetic_data_2 should be identical to synthetic_data_1.

Additional context

  • We may need to add some logic to track the state of randomization for the underlying model. Currently, the models have a method for setting their random state. We should set the random state once at the beginning and then let the model continue to use that state until reset_sampling is called.
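
One possible way to track that state is a flag that records whether the seed has already been applied. The sketch below uses only the standard library. `TrackedModel`, `set_random_state`, and the `_random_state_set` flag are hypothetical names chosen for illustration and are not taken from SDV or its underlying models.

```python
import random


class TrackedModel:
    """Hypothetical model wrapper; all names here are illustrative only."""

    def __init__(self, seed=0):
        self._seed = seed
        self._rng = random.Random()
        # Tracks whether the seed has been applied since fit or the last reset.
        self._random_state_set = False

    def set_random_state(self, seed):
        self._rng.seed(seed)

    def sample(self, num_rows):
        if not self._random_state_set:
            # Seed only on the first call after a reset...
            self.set_random_state(self._seed)
            self._random_state_set = True
        # ...then keep consuming the same stream on later calls.
        return [self._rng.gauss(0.0, 1.0) for _ in range(num_rows)]

    def reset_sampling(self):
        # The next sample call will re-apply the original seed.
        self._random_state_set = False
```

With this design, repeated sample calls yield new data, while calling reset_sampling replays the same sequence of samples from the start.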