Remove handling_strategy parameter #843
Merged
7e36919 changing logic for handling constraints (amontanez24)
e2af95a fixing a lot of tests (amontanez24)
4718be5 removing handling_strategy attribute (amontanez24)
94445ca removing handling_strategy from docs and code (amontanez24)
a6163eb fixing tests, docs and tutorials (amontanez24)
9744cc3 adding unit tests (amontanez24)
82f8380 only calling reverse_transform if custom constraint (amontanez24)
6539c67 ignoring sre_parse during isort
78bd7cd pr comments
90b6003 raising all errors except missing column errors (amontanez24)
1ff251b adding integration test and tracking if reverse transform should use … (amontanez24)
7f3f74b refactoring how constraint transforms happen (amontanez24)
e163b7d adding unit tests (amontanez24)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -91,61 +91,19 @@ class Constraint(metaclass=ConstraintMeta): | |
This class is not intended to be used directly and should rather be | ||
subclassed to create different types of constraints. | ||
|
||
If ``handling_strategy`` is passed with the value ``transform`` | ||
or ``reject_sampling``, the ``filter_valid`` or ``transform`` and | ||
``reverse_transform`` methods will be replaced respectively by a simple | ||
identity function. | ||
|
||
Attributes: | ||
constraint_columns (tuple[str]): | ||
The names of the columns used by this constraint. | ||
rebuild_columns (tuple[str]): | ||
The names of the columns that this constraint will rebuild during | ||
``reverse_transform``. | ||
Args: | ||
handling_strategy (str): | ||
How this Constraint should be handled, which can be ``transform``, | ||
``reject_sampling`` or ``all``. | ||
""" | ||
|
||
constraint_columns = () | ||
rebuild_columns = () | ||
_hyper_transformer = None | ||
|
||
def _identity(self, table_data): | ||
return table_data | ||
|
||
def _identity_with_validation(self, table_data): | ||
self._validate_data_on_constraint(table_data) | ||
return table_data | ||
|
||
def __init__(self, handling_strategy): | ||
if handling_strategy == 'transform': | ||
self.filter_valid = self._identity | ||
elif handling_strategy == 'reject_sampling': | ||
self.rebuild_columns = () | ||
self.transform = self._identity_with_validation | ||
self.reverse_transform = self._identity | ||
elif handling_strategy != 'all': | ||
raise ValueError('Unknown handling strategy: {}'.format(handling_strategy)) | ||
|
||
def _fit(self, table_data): | ||
del table_data | ||
|
||
def fit(self, table_data): | ||
"""Fit ``Constraint`` class to data. | ||
|
||
Args: | ||
table_data (pandas.DataFrame): | ||
Table data. | ||
""" | ||
self._fit(table_data) | ||
|
||
def _transform(self, table_data): | ||
return table_data | ||
|
||
def _validate_data_on_constraint(self, table_data): | ||
"""Make sure the given data is valid for the given constraints. | ||
def _validate_data_meets_constraint(self, table_data): | ||
"""Make sure the given data is valid for the constraint. | ||
|
||
Args: | ||
data (pandas.DataFrame): | ||
|
@@ -169,16 +127,37 @@ def _validate_data_on_constraint(self, table_data): | |
|
||
raise ConstraintsNotMetError(err_msg) | ||
|
||
def check_missing_columns(self, table_data): | ||
"""Check ``table_data`` for missing columns. | ||
def _fit(self, table_data): | ||
del table_data | ||
|
||
def fit(self, table_data): | ||
"""Fit ``Constraint`` class to data. | ||
|
||
Args: | ||
table_data (pandas.DataFrame): | ||
Table data. | ||
""" | ||
self._fit(table_data) | ||
self._validate_data_meets_constraint(table_data) | ||
|
||
def _transform(self, table_data): | ||
return table_data | ||
|
||
def _validate_all_columns_present(self, table_data): | ||
"""Validate that all required columns are in ``table_data``. | ||
|
||
Args: | ||
table_data (pandas.DataFrame): | ||
Table data. | ||
|
||
Raises: | ||
MissingConstraintColumnError: | ||
If the data is missing any columns needed for the constraint transformation, | ||
a ``MissingConstraintColumnError`` is raised. | ||
""" | ||
missing_columns = [col for col in self.constraint_columns if col not in table_data.columns] | ||
if missing_columns: | ||
raise MissingConstraintColumnError() | ||
raise MissingConstraintColumnError(missing_columns=missing_columns) | ||
|
||
def transform(self, table_data): | ||
"""Perform necessary transformations needed by constraint. | ||
|
@@ -188,7 +167,8 @@ def transform(self, table_data): | |
should overwrite the ``_transform`` method instead. This method raises a | ||
``MissingConstraintColumnError`` if the ``table_data`` is missing any columns | ||
needed to do the transformation. If columns are present, this method will call | ||
the ``_transform`` method. | ||
the ``_transform`` method. If ``_transform`` fails, the data will be returned | ||
unchanged. | ||
|
||
Args: | ||
table_data (pandas.DataFrame): | ||
|
@@ -198,8 +178,7 @@ def transform(self, table_data): | |
pandas.DataFrame: | ||
Input data unmodified. | ||
""" | ||
self._validate_data_on_constraint(table_data) | ||
self.check_missing_columns(table_data) | ||
self._validate_all_columns_present(table_data) | ||
Review comment: I think we can just copy-paste the function logic here instead of creating it separately. The docstrings already describe that method either way, and it doesn't really have any test cases.
||
return self._transform(table_data) | ||
|
||
def fit_transform(self, table_data): | ||
|
@@ -216,8 +195,15 @@ def fit_transform(self, table_data): | |
self.fit(table_data) | ||
return self.transform(table_data) | ||
|
||
def _reverse_transform(self, table_data): | ||
return table_data | ||
|
||
def reverse_transform(self, table_data): | ||
"""Identity method for completion. To be optionally overwritten by subclasses. | ||
"""Handle logic around reverse transforming constraints. | ||
|
||
If the ``transform`` method was skipped, then this method should be too. | ||
Otherwise attempt to reverse transform and if that fails, return the data | ||
unchanged to fall back on reject sampling. | ||
|
||
Args: | ||
table_data (pandas.DataFrame): | ||
|
@@ -227,7 +213,7 @@ def reverse_transform(self, table_data): | |
pandas.DataFrame: | ||
Input data unmodified. | ||
""" | ||
return table_data | ||
return self._reverse_transform(table_data) | ||
|
||
def is_valid(self, table_data): | ||
"""Say whether the given table rows are valid. | ||
|
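The control flow that the diff above leaves in the base class can be sketched in isolation. This is a simplified illustration, not the library's code: `SketchConstraint` and the exception class are stand-ins, `DiffConstraint` is a hypothetical subclass, and plain dicts of lists replace `pandas.DataFrame` so the snippet runs without extra dependencies. The fallback on a failing `_transform` or `_reverse_transform` that the docstrings mention is not modeled here.

```python
class MissingConstraintColumnError(Exception):
    """Stand-in for the library's exception; records the missing column names."""

    def __init__(self, missing_columns):
        self.missing_columns = missing_columns
        super().__init__(f'Missing constraint columns: {missing_columns}')


class SketchConstraint:
    """Simplified stand-in for the base class: public methods wrap private hooks."""

    constraint_columns = ()

    def _validate_all_columns_present(self, table_data):
        # Membership on a dict checks its keys, mirroring a column-name check.
        missing_columns = [col for col in self.constraint_columns if col not in table_data]
        if missing_columns:
            raise MissingConstraintColumnError(missing_columns=missing_columns)

    def _transform(self, table_data):
        return table_data

    def transform(self, table_data):
        self._validate_all_columns_present(table_data)
        return self._transform(table_data)

    def _reverse_transform(self, table_data):
        return table_data

    def reverse_transform(self, table_data):
        return self._reverse_transform(table_data)


class DiffConstraint(SketchConstraint):
    """Hypothetical subclass: stores ``high - low`` in place of ``high``."""

    constraint_columns = ('low', 'high')

    def _transform(self, table_data):
        diffs = [high - low for high, low in zip(table_data['high'], table_data['low'])]
        return {**table_data, 'high': diffs}

    def _reverse_transform(self, table_data):
        highs = [diff + low for diff, low in zip(table_data['high'], table_data['low'])]
        return {**table_data, 'high': highs}


data = {'low': [1, 2], 'high': [5, 9]}
constraint = DiffConstraint()
transformed = constraint.transform(data)
round_trip = constraint.reverse_transform(transformed)

# A table without the ``high`` column is rejected before ``_transform`` runs.
try:
    constraint.transform({'low': [1, 2]})
    missing = []
except MissingConstraintColumnError as error:
    missing = error.missing_columns
```

Subclasses in this sketch override only the underscore-prefixed hooks, so the column check in `transform` applies to every constraint without being repeated.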
I'm wondering if we need this. Couldn't we just use `pass`, or am I missing something?
we could, this is just from before
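The question in this thread can be checked with a small standalone snippet. The two functions below are hypothetical stand-ins for a no-op `_fit`; they are not taken from the library. `del table_data` only unbinds the local name inside the function, so from the caller's side it behaves the same as `pass`.

```python
def fit_with_del(table_data):
    # Unbinds the local name only; the caller's object is untouched.
    del table_data


def fit_with_pass(table_data):
    # Does nothing with the argument.
    pass


rows = [1, 2, 3]
result_del = fit_with_del(rows)
result_pass = fit_with_pass(rows)
```

One practical difference, as far as this sketch goes, is stylistic: `del` marks the argument as deliberately unused, which some linters expect, while `pass` is the more common idiom for an empty body.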