v0.3.3 changes into main (#64)
-added support for prior mean encoding (regularised encodings)
-added support for weights to mean, median and mode imputers
-added classname() method to BaseTransformer and prefixed all errors with classname call for easier debugging
-added DatetimeInfoExtractor transformer
-added DatetimeSinusoidCalculator
-added TwoColumnOperatorTransformer
-added StringConcatenator
-added SetColumnDtype
-added warning to MappingTransformer for unexpected changes in dtype
-added new module tubular/comparison.py containing EqualityChecker
-added PCATransformer
-updated black version to 22.3.0 and flake8 version to 5.0.4 to fix compatibility issues
-removed kwargs argument from BaseTransformer
-updated version to 0.3.3
-updated changelog and docs

Co-authored-by: Claire_Fromholz <33227678+ClaireF57@users.noreply.github.com>
Co-authored-by: Fromholz <claire.fromholz@allianz.com>
3 people authored Jan 19, 2023
1 parent d122842 commit 0098637
Showing 61 changed files with 7,886 additions and 366 deletions.
34 changes: 34 additions & 0 deletions CHANGELOG.rst
@@ -16,6 +16,40 @@ Subsections for each version can be one of the following;

Each individual change should have a link to the pull request after the description of the change.

0.3.3 (2023-01-19)
------------------

Added
^^^^^
- added support for prior mean encoding (regularised encodings) `#46 <https://github.com/lvgig/tubular/pull/46>`_

- added support for weights to mean, median and mode imputers `#47 <https://github.com/lvgig/tubular/pull/47>`_

- added classname() method to BaseTransformer and prefixed all errors with classname call for easier debugging `#48 <https://github.com/lvgig/tubular/pull/48>`_

- added DatetimeInfoExtractor transformer in ``tubular/dates.py`` with associated tests in ``tests/dates/test_DatetimeInfoExtractor.py`` and examples in ``examples/dates/DatetimeInfoExtractor.ipynb`` `#49 <https://github.com/lvgig/tubular/pull/49>`_

- added DatetimeSinusoidCalculator in ``tubular/dates.py`` with associated tests in ``tests/dates/test_DatetimeSinusoidCalculator.py`` and examples in ``examples/dates/DatetimeSinusoidCalculator.ipynb`` `#50 <https://github.com/lvgig/tubular/pull/50>`_

- added TwoColumnOperatorTransformer in ``tubular/numeric.py`` with associated tests in ``tests/numeric/test_TwoColumnOperatorTransformer.py`` and examples in ``examples/dates/TwoColumnOperatorTransformer.ipynb`` `#51 <https://github.com/lvgig/tubular/pull/51>`_

- added StringConcatenator in ``tubular/strings.py`` with associated tests in ``tests/strings/test_StringConcatenator.py`` and examples in ``examples/strings/StringConcatenator.ipynb`` `#52 <https://github.com/lvgig/tubular/pull/52>`_

- added SetColumnDtype in ``tubular/misc.py`` with associated tests in ``tests/misc/test_StringConcatenator.py`` and examples in ``examples/strings/StringConcatenator.ipynb`` `#53 <https://github.com/lvgig/tubular/pull/53>`_

- added warning to MappingTransformer in ``tubular/mapping.py`` for unexpected changes in dtype `#54 <https://github.com/lvgig/tubular/pull/54>`_

- added new module ``tubular/comparison.py`` containing EqualityChecker. Also added associated tests with ``tests/comparison/test_EqualityChecker.py`` and examples with ``examples/comparison/EqualityChecker.ipynb`` `#55 <https://github.com/lvgig/tubular/pull/55>`_

- added PCATransformer in ``tubular/numeric.py`` with associated tests in ``tests/misc/test_PCATransformer.py`` and examples in ``examples/numeric/PCATransformer.ipynb`` `#57 <https://github.com/lvgig/tubular/pull/57>`_

Fixed
^^^^^
- updated black version to 22.3.0 and flake8 version to 5.0.4 to fix compatibility issues `#45 <https://github.com/lvgig/tubular/pull/45>`_

- removed ``**kwargs`` argument from BaseTransformer in ``tubular/base.py`` to avoid silent errors when incorrect arguments are passed to transformers. Also fixed a few tests which this change revealed were passing incorrect arguments `#56 <https://github.com/lvgig/tubular/pull/56>`_
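To illustrate why removing ``**kwargs`` helps, here is a minimal sketch using two hypothetical classes (``WithKwargs`` and ``WithoutKwargs``, not part of tubular): a misspelt keyword argument is silently swallowed when the constructor accepts ``**kwargs``, but raises a ``TypeError`` once it does not.

```python
class WithKwargs:
    """Hypothetical transformer whose constructor accepts arbitrary keyword arguments."""

    def __init__(self, columns, verbose=False, **kwargs):
        self.columns = columns
        self.verbose = verbose


class WithoutKwargs:
    """Hypothetical transformer with an explicit constructor signature."""

    def __init__(self, columns, verbose=False):
        self.columns = columns
        self.verbose = verbose


# The typo "verbos" is absorbed by **kwargs, so verbose silently stays False.
silent = WithKwargs(columns=["a"], verbos=True)
print(silent.verbose)  # False

# Without **kwargs, the same typo fails loudly at construction time.
try:
    WithoutKwargs(columns=["a"], verbos=True)
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")
```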

0.3.2 (2022-01-13)
------------------

2 changes: 1 addition & 1 deletion CONTRIBUTING.rst
@@ -43,7 +43,7 @@ General
^^^^^^^

- Please try and keep each pull request to one change or feature only
- Make sure to update the `changelog <https://github.com/lvgig/test-aide/blob/main/CHANGELOG.rst>`_ with details of your change
- Make sure to update the `changelog <https://github.com/lvgig/tubular/blob/main/CHANGELOG.rst>`_ with details of your change

Code formatting
^^^^^^^^^^^^^^^
16 changes: 15 additions & 1 deletion docs/source/api.rst
@@ -20,6 +20,14 @@ capping module

capping.CappingTransformer
capping.OutOfRangeNullTransformer

comparison module
------------------

.. autosummary::
:toctree: api/

comparison.EqualityChecker

dates module
------------------
@@ -32,6 +40,8 @@ dates module
dates.DateDiffLeapYearTransformer
dates.SeriesDtMethodTransformer
dates.ToDatetimeTransformer
dates.DatetimeInfoExtractor
dates.DatetimeSinusoidCalculator

imputers module
------------------
@@ -67,6 +77,7 @@ misc module
:toctree: api/

misc.SetValueTransformer
misc.SetColumnDtype

nominal module
------------------
@@ -88,9 +99,11 @@ numeric module
:toctree: api/

numeric.LogTransformer
numeric.CutTransformer
numeric.CutTransformer
numeric.TwoColumnOperatorTransformer
numeric.ScalingTransformer
numeric.InteractionTransformer
numeric.PCATransformer

strings module
------------------
@@ -99,3 +112,4 @@ strings module
:toctree: api/

strings.SeriesStrMethodTransformer
strings.StringConcatenator
38 changes: 27 additions & 11 deletions docs/source/quick-start.rst
@@ -1,10 +1,11 @@
Quick Start
====================
|logo|

Welcome to the quick start guide for |logo| !
Welcome to the quick start guide for tubular!

.. |logo| image:: ../../logo.png
:height: 50px
:height: 200px

Installation
--------------------
@@ -15,7 +16,6 @@ The easiest way to get ``tubular`` is to install directly from ``pypi``;
pip install tubular
.. important::
Thanks for installing tubular! We hope you find it useful!
@@ -54,20 +54,24 @@ The standard `OutOfRangeNullTransformer <https://tubular.readthedocs.io/en/lates
Dates
^^^^^

This module contains transformers to deal with date columns.
This module contains transformers to deal with datetime columns.

Date differencing is available - accounting for leap years (`DateDiffLeapYearTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.DateDiffLeapYearTransformer.html>`_) or not (`DateDifferenceTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.DateDifferenceTransformer.html>`_).
Date differencing is available - accounting for leap years `DateDiffLeapYearTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.DateDiffLeapYearTransformer.html>`_ or not `DateDifferenceTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.DateDifferenceTransformer.html>`_.

The `BetweenDatesTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.BetweenDatesTransformer.html>`_ calculates if one date falls between two others.

The `ToDatetimeTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.ToDatetimeTransformer.html>`_ converts columns to datetime type.

Finally the `SeriesDtMethodTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.SeriesDtMethodTransformer.html>`_ allows the user to use `pandas.Series.dt <https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html>`_ methods in a similar way to `base.DataFrameMethodTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.base.DataFrameMethodTransformer.html>`_.
The `SeriesDtMethodTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.dates.SeriesDtMethodTransformer.html>`_ allows the user to use `pandas.Series.dt <https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html>`_ methods in a similar way to `base.DataFrameMethodTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.base.DataFrameMethodTransformer.html>`_.

The `DatetimeInfoExtractor <https://tubular.readthedocs.io/en/latest/api/tubular.dates.DatetimeInfoExtractor.html>`_ allows the user to extract datetime info such as the time of day or month from a datetime field.
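As a rough sketch of this kind of extraction in plain pandas (not tubular's implementation), the example below derives a time-of-day bucket and a month name from a datetime column. The column names and the hour boundaries of the buckets are assumptions made for illustration; tubular's defaults may differ.

```python
import pandas as pd

# Hypothetical example data with a single datetime column.
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2023-01-19 02:30", "2023-01-19 09:15", "2023-05-01 19:45"])}
)

# Assumed time-of-day buckets over the hour of the day, as left-closed intervals:
# [0, 6) night, [6, 12) morning, [12, 18) afternoon, [18, 24) evening.
df["timestamp_timeofday"] = pd.cut(
    df["timestamp"].dt.hour,
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,
)

# Month as a name, e.g. "January".
df["timestamp_month"] = df["timestamp"].dt.month_name()

print(df)
```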

The `DatetimeSinusoidCalculator <https://tubular.readthedocs.io/en/latest/api/tubular.dates.DatetimeSinusoidCalculator.html>`_ derives a feature in a dataframe by calculating the sine or cosine of a datetime column.
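One common way to build such features is to map a periodic component of the datetime, here the hour, onto the unit circle. The sketch below does this with pandas and NumPy; the column names and the 24-hour period are assumptions for the example, and it is not tubular's own implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a single datetime column.
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2023-01-19 00:00", "2023-01-19 06:00", "2023-01-19 12:00"])}
)

# Assumed period of 24 hours for the hour-of-day component.
period = 24
hour = df["timestamp"].dt.hour

# Project the hour onto the unit circle; sine and cosine together fix its position.
df["timestamp_hour_sin"] = np.sin(2 * np.pi * hour / period)
df["timestamp_hour_cos"] = np.cos(2 * np.pi * hour / period)

print(df.round(3))
```

Using both sine and cosine keeps the encoding unambiguous: 03:00 and 09:00 share the same sine value but have opposite cosine values, and 23:00 sits next to 00:00 rather than at the far end of the scale.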

Imputers
^^^^^^^^

This module contains standard imputation techniques - mean, median mode as well as `NearestMeanResponseImputer <https://tubular.readthedocs.io/en/feature-version_0_3_0/api/tubular.imputers.NearestMeanResponseImputer.html>`_ which imputes with the value which is closest to the ``null`` values in terms of average response.
This module contains standard imputation techniques - mean, median and mode, as well as `NearestMeanResponseImputer <https://tubular.readthedocs.io/en/feature-version_0_3_0/api/tubular.imputers.NearestMeanResponseImputer.html>`_ which imputes ``null`` values with the value whose average response is closest to the average response of the ``null`` rows. The mean, median and mode imputers support weights.
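The sketch below shows what weighted mean, median and mode imputation values can look like in plain pandas. The ``w`` weights column and the tie-breaking rule for the median (smallest value whose cumulative weight reaches half the total) are assumptions for illustration, not necessarily tubular's exact definitions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1.0, 2.0, None, 4.0, 2.0],
        "w": [1.0, 3.0, 2.0, 1.0, 1.0],  # hypothetical weights column
    }
)

# Only rows with an observed value contribute to the imputation statistics.
observed = df.dropna(subset=["x"])

# Weighted mean: sum(w * x) / sum(w).
weighted_mean = (observed["x"] * observed["w"]).sum() / observed["w"].sum()

# Weighted median: smallest x whose cumulative weight reaches half the total weight.
ordered = observed.sort_values("x")
cum_w = ordered["w"].cumsum()
weighted_median = ordered.loc[cum_w >= ordered["w"].sum() / 2, "x"].iloc[0]

# Weighted mode: the value carrying the largest total weight.
weighted_mode = observed.groupby("x")["w"].sum().idxmax()

# Fill the missing value with, for example, the weighted mean.
df["x_mean_imputed"] = df["x"].fillna(weighted_mean)

print(weighted_mean, weighted_median, weighted_mode)
```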

The `NullIndicator <https://tubular.readthedocs.io/en/feature-version_0_3_0/api/tubular.imputers.NullIndicator.html>`_ is used to create binary indicators of where ``null`` values are present in a column.

@@ -83,24 +87,36 @@ The `CrossColumnMappingTransformer <https://tubular.readthedocs.io/en/latest/api
Misc
^^^^

The misc module currently contains only one transformer, `SetValueTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.misc.SetValueTransformer.html>`_, which creates a constant column with arbitrary value.
The misc module contains transformers which do not fit into other categories.

`SetValueTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.misc.SetValueTransformer.html>`_ creates a constant column with an arbitrary value.

`SetColumnDtype <https://tubular.readthedocs.io/en/latest/api/tubular.misc.SetColumnDtype.html>`_ allows the user to set the dtype of a column.

Nominal
^^^^^^^

This module contains categorical encoding techniques.

There are respone encoding techniques such as `MeanResponseTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.MeanResponseTransformer.html>`_, one hot encoding (`OneHotEncodingTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.OneHotEncodingTransformer.html>`_) and grouping of infrequently occuring levels (`GroupRareLevelsTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.GroupRareLevelsTransformer.html>`_).
There are response encoding techniques such as `MeanResponseTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.MeanResponseTransformer.html>`_, one hot encoding `OneHotEncodingTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.OneHotEncodingTransformer.html>`_ and grouping of infrequently occurring levels `GroupRareLevelsTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.GroupRareLevelsTransformer.html>`_.

`MeanResponseTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.nominal.MeanResponseTransformer.html>`_ also supports regularisation of encodings using a prior.
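A common form of prior regularisation shrinks each level's mean response towards the global mean, encoding a level as (sum of responses in the level + prior × global mean) / (number of rows in the level + prior), so levels with few rows are pulled furthest towards the global mean. The sketch below assumes that formula and a hypothetical ``prior`` of 2; treat it as an illustration of the technique rather than a reference for tubular's exact behaviour.

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "a", "b"], "y": [1.0, 0.0, 1.0, 1.0]})

prior = 2  # hypothetical regularisation strength, in "pseudo-rows"
global_mean = df["y"].mean()

# Per-level response sum and row count.
stats = df.groupby("cat")["y"].agg(["sum", "count"])

# Blend each level's responses with `prior` pseudo-rows at the global mean.
encoding = (stats["sum"] + prior * global_mean) / (stats["count"] + prior)

df["cat_encoded"] = df["cat"].map(encoding)
print(encoding)
```

Here level ``a`` (3 rows, raw mean 0.667) is encoded as 0.7, while level ``b`` (1 row, raw mean 1.0) is pulled further towards the global mean of 0.75, to about 0.833.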

Numeric
^^^^^^^

This module contains numeric transformations - cut (`CutTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.CutTransformer.html>`_), log (`LogTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.LogTransformer.html>`_) and scaling (`ScalingTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.ScalingTransformer.html>`_).
This module contains numeric transformations - cut `CutTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.CutTransformer.html>`_, log `LogTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.LogTransformer.html>`_, and scaling `ScalingTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.ScalingTransformer.html>`_.

`TwoColumnOperatorTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.TwoColumnOperatorTransformer.html>`_ allows a user to apply operations to two columns using `pandas.DataFrame methods <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_ which take multiple columns (e.g. add, subtract, multiply).
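The underlying idea can be sketched in plain pandas: look up a binary pandas method such as ``add``, ``sub`` or ``mul`` by name on one column and pass the other column as its operand. The variable names here are hypothetical and are not tubular's constructor arguments.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Hypothetical configuration: the pandas method name and the two input columns.
pd_method_name = "mul"
column1, column2 = "a", "b"

# Fetch the binary method (add, sub, mul, div, ...) from the first column
# and call it with the second column as the other operand.
df["a_mul_b"] = getattr(df[column1], pd_method_name)(df[column2])

print(df)
```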

It also contains `InteractionTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.InteractionTransformer.html>`_ and `PCATransformer <https://tubular.readthedocs.io/en/latest/api/tubular.numeric.PCATransformer.html>`_, which create interaction terms and PCA components respectively.
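For intuition about what the PCA components are, the sketch below computes them directly with NumPy's SVD on centred data. It illustrates the technique only and does not use or reproduce ``PCATransformer``'s API; the random data and the ``n_components`` value are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * X[:, 2]  # make the third column nearly redundant

n_components = 2

# Centre each column; the right singular vectors of the centred data are the principal axes.
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
components = Vt[:n_components]

# Project the data onto the first n_components axes.
scores = X_centred @ components.T

# Share of total variance captured by each kept component (singular values are sorted descending).
explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]

print(scores.shape, explained_variance_ratio)
```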

Strings
^^^^^^^

The strings module contains a single transformer, `SeriesStrMethodTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.strings.SeriesStrMethodTransformer.html>`_, that allows the user to access `pandas.Series.str <https://pandas.pydata.org/docs/reference/api/pandas.Series.str.html>`_ methods within ``tubular``.
The strings module contains useful transformers for working with strings. `SeriesStrMethodTransformer <https://tubular.readthedocs.io/en/latest/api/tubular.strings.SeriesStrMethodTransformer.html>`_ allows the user to access `pandas.Series.str <https://pandas.pydata.org/docs/reference/api/pandas.Series.str.html>`_ methods within ``tubular``. `StringConcatenator <https://tubular.readthedocs.io/en/latest/api/tubular.strings.StringConcatenator.html>`_ allows a user to concatenate multiple columns of varied dtypes into a single string column.
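A plain-pandas sketch of the same idea: cast each selected column to ``str`` and join the values row by row. The column list and the single-space separator are assumptions for this example, not necessarily tubular's defaults.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2], "c": [True, False]})

columns = ["a", "b", "c"]
separator = " "  # assumed separator for this example

# Cast every selected column to str so mixed dtypes can be joined, then join each row.
df["merged"] = df[columns].astype(str).apply(separator.join, axis=1)

print(df)
```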



Reporting an issue
---------------------------------