Add BaseIndependentSampler Mixin #1395
Labels
feature:sampling: Related to generating synthetic data after a model is built
internal: The issue doesn't change the API or functionality
Problem Description
There is currently no unified abstraction for multi-table synthesizers. This has led to diverging implementations for HMASynthesizer and HSASynthesizer. Additionally, it leads to the following issues:
As a solution, we propose creating mixins that unify multi-table sampling based on two strategies:
This issue focuses on the Independent strategy.
Expected behavior
Add a new mixin called IndependentSampler to the sdv.sampling module.
Methods
These methods should all have an implementation in the base mixin itself.
_sample(self, scale=1.0)
pandas.DataFrame.
_sample_table(self, table_name, num_rows, sampled_data=None)
_connect_tables(sampled_data)
_finalize(sampled_data)
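The method list above can be sketched as a small Python mixin. This is only an illustration of how the four methods might fit together, not the SDV implementation. The attributes `_table_sizes` and `_relationships`, the helper `_sample_rows`, the return types, and the choice to have `_sample` return a dictionary of table name to pandas.DataFrame are all assumptions made for the example.

```python
import pandas as pd


class IndependentSampler:
    """Illustrative sketch only; not the actual SDV mixin.

    Assumed (hypothetical) members supplied by the class using the mixin:
      _table_sizes:   dict mapping table name -> row count seen at fit time
      _relationships: list of (parent_name, child_name) tuples
      _sample_rows(table_name, num_rows) -> pandas.DataFrame
    """

    def _sample_table(self, table_name, num_rows, sampled_data=None):
        # Sample one table on its own and store it in the shared dict.
        sampled_data = {} if sampled_data is None else sampled_data
        sampled_data[table_name] = self._sample_rows(table_name, num_rows)
        return sampled_data

    def _connect_tables(self, sampled_data):
        # Delegate foreign key creation to the class using the mixin.
        for parent_name, child_name in self._relationships:
            sampled_data[child_name] = self._add_foreign_key_column(
                sampled_data[child_name],
                sampled_data[parent_name],
                child_name,
                parent_name,
            )
        return sampled_data

    def _finalize(self, sampled_data):
        # Assumption: finalizing only tidies the index of each table.
        return {
            name: table.reset_index(drop=True)
            for name, table in sampled_data.items()
        }

    def _sample(self, scale=1.0):
        # Sample every table independently, then connect and finalize.
        sampled_data = {}
        for table_name, size in self._table_sizes.items():
            num_rows = int(round(size * scale))
            self._sample_table(table_name, num_rows, sampled_data)
        sampled_data = self._connect_tables(sampled_data)
        return self._finalize(sampled_data)

    def _add_foreign_key_column(self, child_table_rows, parent_table_rows,
                                child_name, parent_name):
        # Left to the class that uses the mixin.
        raise NotImplementedError()
```

In this sketch the tables are sampled without looking at each other, and the parent/child links are added afterwards in `_connect_tables`, which is one reading of what "Independent" means here.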
Abstract methods
These methods do not need to be implemented here, but any class that uses this mixin must implement them.
_add_foreign_key_column(child_table_rows, parent_table_rows, child_name, parent_name): This method should add a column for the foreign key that connects the child to the parent. It can use whatever logic it needs to figure out which value to use for that parent row.
Additional context
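As one possible illustration of the _add_foreign_key_column contract, the sketch below assigns each child row a parent key chosen uniformly at random. It is written as a standalone function so it can run on its own. The keyword arguments `primary_key`, `foreign_key`, and `seed` are hypothetical additions, since the issue does not say how a real implementation would look up key column names, and returning a new DataFrame rather than modifying the input is likewise an assumption.

```python
import numpy as np
import pandas as pd


def add_foreign_key_column(child_table_rows, parent_table_rows,
                           child_name, parent_name,
                           primary_key='id', foreign_key=None, seed=None):
    """Attach a foreign key column to the child rows (illustrative only).

    primary_key, foreign_key and seed are hypothetical parameters;
    child_name is accepted to mirror the proposed signature but unused.
    """
    # Assumed naming convention for the foreign key column.
    foreign_key = foreign_key or f'{parent_name}_id'
    rng = np.random.default_rng(seed)

    # Pick one existing parent key per child row, with replacement.
    parent_ids = parent_table_rows[primary_key].to_numpy()
    child = child_table_rows.copy()
    child[foreign_key] = rng.choice(parent_ids, size=len(child), replace=True)
    return child


parents = pd.DataFrame({'id': [10, 20, 30]})
children = pd.DataFrame({'amount': [1.0, 2.0, 3.0, 4.0]})
linked = add_foreign_key_column(children, parents, 'orders', 'users', seed=0)
print(linked)
```

Uniform random assignment is only the simplest choice. A synthesizer could instead weight parents by a learned distribution of children per parent, which is the kind of logic the abstract method leaves open.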