Add BaseHierarchicalSampler Mixin #1394
Labels
feature:sampling
Related to generating synthetic data after a model is built
internal
The issue doesn't change the API or functionality
Problem Description
There is not currently a unified abstraction for multi table synthesizers. This has led to diverging implementations for HMASynthesizer and HSASynthesizer. Additionally it leads to the following issues:
As a solution, we propose creating mixins that unify multi table sampling based on two strategies:
This issue focuses on the Hierarchical strategy.
Expected behavior
Add a new mixin called HierarchicalSampler to the sdv.sampling module.
Methods
These methods should all have an implementation in the base mixin itself.
_sample(self, scale=1.0)
pandas.DataFrame.
_sample_table(self, synthesizer, table_name, num_rows, sampled_data=None)
_add_child_rows(parent_row, child_name, sampled_data)
_finalize(sampled_data)
Abstract methods
These methods do not need to be implemented here but any class that uses this mixin must implement them.
_recreate_child_synthesizer(child_name, parent_row): This method should create a SingleTableSynthesizer for the child based on the values in its parent's row.
_add_foreign_key_column(child_table, parent_table, child_name, parent_name): This method should add a column for the foreign key that connects the child to the parent. It can use whatever logic it needs to figure out which values to use for each child row.
Additional context
This mixin follows the current HMA sampling process pretty closely. The main difference will be breaking the steps into the appropriate methods.
The end goal is that adding a new hierarchical multi table synthesizer requires implementing only two methods: one to link child rows to parent rows, and one to create a child synthesizer from a parent row.
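To make the split between shared and abstract methods concrete, here is a rough sketch of how such a mixin could be laid out. It is not SDV code. The method names and signatures come from this issue, but the method bodies are guesses, tables are plain lists of dict rows rather than pandas.DataFrame, and the following are all hypothetical names invented for illustration: HierarchicalSamplerSketch, ConstantSynthesizer, ToyHierarchicalSynthesizer, the attributes _children, _root_synthesizers and _root_sizes, and the `__<child>__num_rows` key.

```python
from abc import ABC, abstractmethod


class HierarchicalSamplerSketch(ABC):
    """Toy stand-in for the proposed mixin (assumed behavior, not SDV code).

    Assumes the host class provides three hypothetical attributes:
      _children:          parent table name -> list of child table names
      _root_synthesizers: root table name -> object with sample(num_rows)
      _root_sizes:        root table name -> number of rows at scale 1.0
    """

    @abstractmethod
    def _recreate_child_synthesizer(self, child_name, parent_row):
        """Return an object with sample(num_rows) built from the parent row."""

    @abstractmethod
    def _add_foreign_key_column(self, child_table, parent_table, child_name, parent_name):
        """Return child_table with a foreign key to parent_table added."""

    def _sample(self, scale=1.0):
        sampled_data = {}
        for table_name, synthesizer in self._root_synthesizers.items():
            num_rows = round(self._root_sizes[table_name] * scale)
            self._sample_table(synthesizer, table_name, num_rows, sampled_data)
        return self._finalize(sampled_data)

    def _sample_table(self, synthesizer, table_name, num_rows, sampled_data=None):
        sampled_data = {} if sampled_data is None else sampled_data
        rows = synthesizer.sample(num_rows)
        sampled_data.setdefault(table_name, []).extend(rows)
        # Walk down the hierarchy: every new row may spawn rows in child tables.
        for child_name in self._children.get(table_name, []):
            for parent_row in rows:
                self._add_child_rows(parent_row, child_name, sampled_data)
        return sampled_data

    def _add_child_rows(self, parent_row, child_name, sampled_data):
        # Hypothetical convention: the parent row stores how many children it has.
        num_rows = parent_row[f"__{child_name}__num_rows"]
        synthesizer = self._recreate_child_synthesizer(child_name, parent_row)
        self._sample_table(synthesizer, child_name, num_rows, sampled_data)

    def _finalize(self, sampled_data):
        # Link every child table to its parent once all rows exist.
        for parent_name, child_names in self._children.items():
            for child_name in child_names:
                sampled_data[child_name] = self._add_foreign_key_column(
                    sampled_data.get(child_name, []),
                    sampled_data.get(parent_name, []),
                    child_name,
                    parent_name,
                )
        return sampled_data


class ConstantSynthesizer:
    """Hypothetical single-table synthesizer that repeats one template row."""

    def __init__(self, template):
        self._template = template

    def sample(self, num_rows):
        return [dict(self._template) for _ in range(num_rows)]


class ToyHierarchicalSynthesizer(HierarchicalSamplerSketch):
    """A new synthesizer only supplies the two abstract methods."""

    def __init__(self):
        self._children = {"users": ["sessions"]}
        self._root_synthesizers = {
            "users": ConstantSynthesizer({"country": "FR", "__sessions__num_rows": 2})
        }
        self._root_sizes = {"users": 3}

    def _recreate_child_synthesizer(self, child_name, parent_row):
        # The child's values depend on the values in its parent's row.
        return ConstantSynthesizer({"locale": parent_row["country"].lower()})

    def _add_foreign_key_column(self, child_table, parent_table, child_name, parent_name):
        # Child rows were appended in parent order, so each parent owns
        # the next consecutive block of child rows.
        key = f"__{child_name}__num_rows"
        owners = [i for i, parent in enumerate(parent_table) for _ in range(parent[key])]
        return [
            dict(row, **{f"{parent_name}_id": owner})
            for row, owner in zip(child_table, owners)
        ]


data = ToyHierarchicalSynthesizer()._sample(scale=1.0)
```

In this sketch the subclass contains no traversal logic at all. It only decides how a child synthesizer is derived from a parent row and how child rows are linked back to parents, which is the division of labor described above.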