Split GreaterThan into Inequality and ScalarInequality (#823)
* Add Inequality + update docstring + remove as_list

* Set up __init__

* Finished _fit

* Finish Inequality class

* Finish ScalarInequality

* Update all occurrences of GreaterThan, except the boss file...

* Update a lot of test cases

* Another batch done. Only 1 remaining :)

* Finished Inequality tests, only ScalarInequality remaining

* Finished :), now let's start a 30hr debugging session :)

* Positive/Negative completely done, onto Inequality now

* Only transform/reverse remaining

* ONLY 2 REMAINING :)

* fix lint

* Fix integration tests

* Add handling and column_model back

* .

* Fix rounding

* Change tutorial

* add ~ in docs

* fixed feedback

* fix docs

* add test cases

* Fix rebase

* Fix rebase

* Fix tests

* Fix rebase

* Fix bug where ScalarInequality didn't reverse_transform properly

* Fix typo
fealho authored Jun 9, 2022
1 parent 9932451 commit e4da71e
Showing 13 changed files with 887 additions and 2,477 deletions.
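For code that still uses the old class, the split described in the commit message can be sketched as a mapping from ``GreaterThan``-style arguments to the two new classes. This is a hypothetical helper for illustration, not part of SDV. It assumes single column names (no lists), and the ``strict`` flag is an assumption of this sketch: it defaults to the strict operators because the updated tutorial in this commit uses ``relation='>'``.

```python
def migrate_greater_than(low, high, scalar=None, strict=True):
    """Map old GreaterThan-style arguments to (new class name, kwargs).

    Hypothetical helper for illustration only; it is not part of SDV.
    """
    if scalar is None:
        # Column-vs-column comparison maps to Inequality.
        return 'Inequality', {
            'low_column_name': low,
            'high_column_name': high,
        }
    if scalar == 'low':
        # ``low`` is the scalar, so the column in ``high`` must stay above it.
        return 'ScalarInequality', {
            'column_name': high,
            'value': low,
            'relation': '>' if strict else '>=',
        }
    if scalar == 'high':
        # ``high`` is the scalar, so the column in ``low`` must stay below it.
        return 'ScalarInequality', {
            'column_name': low,
            'value': high,
            'relation': '<' if strict else '<=',
        }
    raise ValueError("scalar must be None, 'low' or 'high'")


# The salary example from the tutorial diff in this commit.
class_name, kwargs = migrate_greater_than(30000, 'salary', scalar='low')
print(class_name, kwargs)
```

The returned keyword arguments use the parameter names that appear in the updated tutorial (``low_column_name``, ``high_column_name``, ``column_name``, ``value``, ``relation``); ``handling_strategy`` is left out because the new examples no longer pass it.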
48 changes: 32 additions & 16 deletions docs/api_reference/constraints/tabular.rst
@@ -22,7 +22,7 @@ CustomConstraint
CustomConstraint.to_dict

FixedCombinations
~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/
@@ -37,24 +37,40 @@ FixedCombinations
FixedCombinations.from_dict
FixedCombinations.to_dict

GreaterThan
Inequality
~~~~~~~~~~

.. autosummary::
:toctree: api/

Inequality
Inequality.fit
Inequality.transform
Inequality.fit_transform
Inequality.reverse_transform
Inequality.is_valid
Inequality.filter_valid
Inequality.from_dict
Inequality.to_dict

ScalarInequality
~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/

GreaterThan
GreaterThan.fit
GreaterThan.transform
GreaterThan.fit_transform
GreaterThan.reverse_transform
GreaterThan.is_valid
GreaterThan.filter_valid
GreaterThan.from_dict
GreaterThan.to_dict
ScalarInequality
ScalarInequality.fit
ScalarInequality.transform
ScalarInequality.fit_transform
ScalarInequality.reverse_transform
ScalarInequality.is_valid
ScalarInequality.filter_valid
ScalarInequality.from_dict
ScalarInequality.to_dict

Positive
~~~~~~~~~~~~~~~~
~~~~~~~~

.. autosummary::
:toctree: api/
@@ -70,7 +86,7 @@ Positive
Positive.to_dict

Negative
~~~~~~~~~~~~~~~~
~~~~~~~~

.. autosummary::
:toctree: api/
@@ -86,7 +102,7 @@ Negative
Negative.to_dict

ColumnFormula
~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~

.. autosummary::
:toctree: api/
@@ -118,7 +134,7 @@ Between
Between.to_dict

OneHotEncoding
~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/
@@ -134,7 +150,7 @@ OneHotEncoding
OneHotEncoding.to_dict

Unique
~~~~~~~~~~~~~~~~
~~~~~~

.. autosummary::
:toctree: api/
78 changes: 22 additions & 56 deletions docs/user_guides/single_table/handling_constraints.rst
@@ -138,7 +138,7 @@ To use this constraint, we must make an instance and provide:
)
FixedCombinations Constraint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The next constraint that we will explore is the ``FixedCombinations``
constraint.
@@ -162,97 +162,63 @@ order to use this constraint we will need to import it from the
handling_strategy='transform'
)
GreaterThan Constraint
~~~~~~~~~~~~~~~~~~~~~~
Inequality and ScalarInequality Constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second constraint that we need for our data is the ``GreaterThan``
Another constraint that we need for our data is the ``Inequality``
constraint. This constraint guarantees that one column is always greater
than the other one. In order to use it, we need to create an instance
passing:

- the name of the ``low`` column
- the name of the ``high`` column
- the handling strategy that we want to use
- the ``low_column_name``
- the ``high_column_name``

.. ipython:: python
:okwarning:
from sdv.constraints import GreaterThan
from sdv.constraints import Inequality
age_gt_age_when_joined_constraint = GreaterThan(
low='age_when_joined',
high='age',
handling_strategy='reject_sampling'
age_gt_age_when_joined_constraint = Inequality(
low_column_name='age_when_joined',
high_column_name='age',
)
The ``GreaterThan`` constraint can also be used to guarantee a column is greater
The ``ScalarInequality`` constraint can be used to guarantee a column is greater
or lower than a scalar value or specific datetime value instead of another column.
To use this functionality, we can pass:

- the scalar value for ``low`` or the scalar value for ``high``
- a flag indicating whether ``low`` or ``high`` is a scalar
- the ``column_name``
- the scalar ``value``
- a flag indicating what the ``relation`` between them is (choose between ``'>'``, ``'>='``, ``'<'``, ``'<='``)

.. ipython:: python
:okwarning:
salary_gt_30000_constraint = GreaterThan(
low=30000,
high='salary',
scalar='low',
handling_strategy='reject_sampling'
salary_gt_30000_constraint = ScalarInequality(
column_name='salary',
value=30000,
relation='>',
)
.. note::
If you want to indicate that the column must be *lower than* a scalar value,
all you need to do is invert the arguments, pass the scalar value as the ``high``
argument, the column name as the ``low`` argument, and set the `scalar` flag to ``"high"``.

Optionally, when constructing ``GreaterThan`` constraint we can specify
more than a single column in either the ``high`` or ``low`` arguments.
For example, we can create a ``GreaterThan`` constraint that that ensures
that both the years in the company and prior years of experience is more
than one year.

.. ipython:: python
:okwarning:
experience_years_gt_one_constraint = GreaterThan(
low=1,
high=['years_in_the_company', 'prior_years_experience'],
scalar='low',
handling_strategy='reject_sampling'
)
.. warning::

Warning! Passing a list of columns to the `high` or `low` arguments is only possible
when the other one has been passed as a single column name or scalar value! If you need
to compare multiple ``high`` columns against multiple ``low`` columns (or vice versa),
you need to decompose one of the ends, ``high`` or ``low``, into multiple single column
names and define one ``GreaterThan`` constraint for each one of them.


Positive and Negative Constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similar to the ``GreaterThan`` constraint, we can use the ``Positive``
Similar to the ``ScalarInequality`` constraint, we can use the ``Positive``
or ``Negative`` constraints. These constraints enforce that the specified
column(s) are always positive or negative. We can create an instance passing:
column are always positive or negative. We can create an instance passing:

- the name of the column(s) for ``Negative`` or ``Positive`` constraints
- the name of the column for ``Negative`` or ``Positive`` constraints
- a boolean specifying whether to make the data strictly above or below 0,
or include 0 as a possible value
- the handling strategy that we want to use

.. ipython:: python
:okwarning:
from sdv.constraints import Positive
positive_age_constraint = Positive(
columns='age',
column_name='age',
strict=False,
handling_strategy='reject_sampling'
)
ColumnFormula Constraint
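The ``Inequality`` and ``ScalarInequality`` descriptions in the guide diff above can be illustrated with a minimal sketch of the row-level checks they describe. It is written in plain Python and does not call SDV: the function names, the list-of-dicts row format, and the ``strict`` flag are assumptions of this sketch, not the library's implementation. Only the argument names and the four ``relation`` values come from the updated tutorial.

```python
import operator

# The four relations listed in the updated tutorial text.
RELATIONS = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
}


def inequality_is_valid(rows, low_column_name, high_column_name, strict=False):
    """Return one boolean per row: is the high column above the low column?

    ``strict`` is an assumption of this sketch; with strict=False, equal
    values count as valid.
    """
    compare = operator.gt if strict else operator.ge
    return [compare(row[high_column_name], row[low_column_name]) for row in rows]


def scalar_inequality_is_valid(rows, column_name, value, relation):
    """Return one boolean per row: does the column satisfy ``relation`` vs ``value``?"""
    if relation not in RELATIONS:
        raise ValueError("relation must be one of '>', '>=', '<', '<='")
    compare = RELATIONS[relation]
    return [compare(row[column_name], value) for row in rows]


# Made-up rows using the column names from the tutorial examples.
rows = [
    {'age': 40, 'age_when_joined': 25, 'salary': 45000},
    {'age': 30, 'age_when_joined': 35, 'salary': 28000},
]

print(inequality_is_valid(rows, 'age_when_joined', 'age'))
print(scalar_inequality_is_valid(rows, 'salary', 30000, '>'))
```

In this made-up data the second row fails both checks, since ``age`` is below ``age_when_joined`` and ``salary`` is not above 30000.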
7 changes: 4 additions & 3 deletions sdv/constraints/__init__.py
@@ -2,14 +2,15 @@

from sdv.constraints.base import Constraint
from sdv.constraints.tabular import (
Between, ColumnFormula, CustomConstraint, FixedCombinations, FixedIncrements, GreaterThan,
Negative, OneHotEncoding, Positive, Unique)
Between, ColumnFormula, CustomConstraint, FixedCombinations, FixedIncrements, Inequality,
Negative, OneHotEncoding, Positive, ScalarInequality, Unique)

__all__ = [
'Constraint',
'ColumnFormula',
'CustomConstraint',
'GreaterThan',
'Inequality',
'ScalarInequality',
'FixedCombinations',
'FixedIncrements',
'Between',
2 changes: 1 addition & 1 deletion sdv/constraints/base.py
@@ -100,7 +100,7 @@ class Constraint(metaclass=ConstraintMeta):
Attributes:
constraint_columns (tuple[str]):
The names of the columns used by this constraint.
rebuild_columns (typle[str]):
rebuild_columns (tuple[str]):
The names of the columns that this constraint will rebuild during
``reverse_transform``.
Args: