Problem Description
As a user, it would be useful to preprocess my data in a separate step from modeling, and to be able to do this at the multi-table level.
Acceptance criteria
Add the following methods:
preprocess(data)
data is a dictionary mapping each table name to a pandas.DataFrame
This method should loop through the single-table synthesizer for each table and call preprocess on it with that table's data
It should return a dictionary mapping each table name to the transformed data
This method can be added to the BaseMultiTableSynthesizer
It should raise a single warning, not one per table, if any of the synthesizers have already been fit. The warning should read: "Warning: This synthesizer has already been fit. To use the new preprocessed data, please refit the synthesizer using 'fit' or 'fit_processed_data'."
fit_processed_data(processed_data)
processed_data is a dictionary mapping each table name to a pandas.DataFrame. This data should have already been run through the data processor.
This method will be specific to each MultiTableSynthesizer, so for now it only needs to be implemented in the HMASynthesizer.
fit(data)
data is a dictionary mapping each table name to a pandas.DataFrame
This method should call preprocess and then fit_processed_data
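The three methods above could fit together roughly as follows. This is only a minimal sketch: ToySingleTableSynthesizer and ToyMultiTableSynthesizer are hypothetical stand-ins, not the real SDV classes, the transform is a placeholder copy, and dropping the leading "Warning:" from the message (on the assumption that Python's warning machinery supplies its own prefix) is a guess.

```python
import warnings

import pandas as pd


class ToySingleTableSynthesizer:
    """Hypothetical stand-in for a single-table synthesizer."""

    def __init__(self):
        self.fitted = False

    def preprocess(self, data):
        # Placeholder transform; a real synthesizer would run its data processor here.
        return data.copy()


class ToyMultiTableSynthesizer:
    """Hypothetical stand-in showing the preprocess / fit_processed_data / fit split."""

    def __init__(self, table_names):
        self._table_synthesizers = {
            name: ToySingleTableSynthesizer() for name in table_names
        }

    def preprocess(self, data):
        # Warn once in total, not once per table.
        if any(synth.fitted for synth in self._table_synthesizers.values()):
            warnings.warn(
                "This synthesizer has already been fit. To use the new "
                "preprocessed data, please refit the synthesizer using "
                "'fit' or 'fit_processed_data'."
            )
        return {
            name: self._table_synthesizers[name].preprocess(table)
            for name, table in data.items()
        }

    def fit_processed_data(self, processed_data):
        # Algorithm-specific modeling would go here; the toy only records the state.
        for name in processed_data:
            self._table_synthesizers[name].fitted = True

    def fit(self, data):
        processed_data = self.preprocess(data)
        self.fit_processed_data(processed_data)
```

Calling preprocess a second time after fit would then emit exactly one warning, regardless of how many tables are involved.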
Expected behavior
preprocess
This method should loop through each table and call SingleTableSynthesizer.preprocess with that table's data
fit_processed_data(processed_data)
This is where the current HMA algorithm should take place. Each child table should be modeled, and the parameters of that model should then be used to extend the parent table. This repeats up the hierarchy until the parent itself is modeled. The existing code in hma should be reviewed as a reference.
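To make "extend the table of the parent" concrete, here is a deliberately simplified sketch. It is not the HMA implementation: per-parent means, standard deviations and row counts stand in for fitted model parameters, and the function name and the `__child__column__stat` naming scheme are hypothetical. It assumes the parent table is indexed by its primary key.

```python
import pandas as pd


def extend_parent_with_child_parameters(parent, child, foreign_key, child_name):
    """Append a per-parent summary of the child rows to the parent table.

    Toy illustration only: summary statistics stand in for model parameters.
    """
    # Consider only the numeric child columns, excluding the foreign key itself.
    numeric = child.drop(columns=[foreign_key]).select_dtypes("number")
    grouped = numeric.groupby(child[foreign_key])

    # One row of "parameters" per parent key.
    params = grouped.agg(["mean", "std"])
    params.columns = [
        f"__{child_name}__{column}__{stat}" for column, stat in params.columns
    ]
    params[f"__{child_name}__num_rows"] = grouped.size()

    # The parent is indexed by its primary key, so a plain index join lines up.
    return parent.join(params)
```

Parents without child rows end up with missing values in the added columns, which a real implementation would need to handle before modeling the extended parent.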
Additional context
The primary keys must be available to the MultiTableSynthesizer before it fits the models. This should already be satisfied, since the DataProcessor now makes the primary key the index during transform.
The workflow changes slightly from what happens in hma. We now transform every table first, and only then call the fit method for each model and extend the parent tables with the model parameters of their child tables.
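As an illustration of why keeping the primary key as the index is sufficient, the sketch below uses plain pandas only. The helper move_primary_key_to_index and the example tables are hypothetical, and the DataProcessor's actual transform is not reproduced here.

```python
import pandas as pd


def move_primary_key_to_index(table, primary_key):
    # The key leaves the modeled columns but remains available as the index.
    return table.set_index(primary_key)


users = pd.DataFrame({"user_id": [1, 2], "age": [30, 40]})
sessions = pd.DataFrame({"session_id": [100, 101, 102], "user_id": [1, 1, 2]})

# Transform every table first ...
processed_users = move_primary_key_to_index(users, "user_id")
processed_sessions = move_primary_key_to_index(sessions, "session_id")

# ... then child foreign keys can still be matched against the parent's index.
orphaned = ~processed_sessions["user_id"].isin(processed_users.index)
```

Here orphaned flags any child row whose foreign key has no matching parent, which is the kind of lookup the fitting step depends on.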