Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Unexpected arguments are passed silently without raising errors #43

Closed
1 task done
cagonza6 opened this issue Nov 10, 2022 · 4 comments
Closed
1 task done
Labels
bug Something isn't working

Comments

@cagonza6
Copy link

What happened?

While using the 'nominal.py::GroupRareLevelsTransformer' we had the problem that when the 'columns' argument is not passed, the transformer takes automatically all columns of type 'object 'or 'category', which is fine. However, if for some reason there was a typo in the 'columns' attribute (for instance 'coulumns'), the transformer does not raise any error regarding unknown/unexpected argument, but rather goes with the automatic behaviour explained above. This generate error going silent if anything happens and can generate quite nasty unexpected behaviours.

While looking a the parent classes, one can see that its parents do not expect any other column.

  • BaseNominalTransformer: does not defines an init, so parameters go to parents
  • BaseTransformer: it does expect argument 'columns' among others, further arguments go to the parents
  • TransformerMixin: does not defines an init, no parent
  • BaseEstimator: does not defines an init, no parent

That means that one could stop the bug from happening, if the argument '**kwargs' is removed from the class 'BaseTransformer' which is the last in the inheritance that accepts some arguments, also it does not cal the init of the super at any moment. This would cause that all unexpected arguments would raise errors. https://github.com/lvgig/tubular/blob/f54424948608a1d8cc489aeefaef75ce7a602c53/tubular/base.py#L15

That would solve the problem with the 'GroupRareLevelsTransformer'. However, it would make several other tests fail, since there are many calls to some classes where non expected arguments are passed. For instance, in https://github.com/lvgig/tubular/blob/f54424948608a1d8cc489aeefaef75ce7a602c53/tests/test_transformers.py#L49 there is the call to 'NearestMeanResponseImputer' with an argument that it dies not expect: 'response_column'.
After fixing that, there are other 19 test that fail, since several tests pass unexpected arguments at some moment or another.

Environment

Python 3.9.5

pytest==7.1.2
pandas>=1.0.0
scikit-learn>=1.1.1
tubular == 0.3.2

Minimum reproducible code

# This code should raise an error, note the argument 'im_not_expected', does not break the code.

from tubular.nominal import GroupRareLevelsTransformer

def test_tubular_args():
    with pytest.raises(TypeError) as excinfo:
        GroupRareLevelsTransformer(im_not_expected=['a', 'b', 'c'])

Relevant error output

FAILED test_tubular_args.py::test_tubular_args - Failed: DID NOT RAISE <class 'TypeError'>

Code of Conduct

  • I agree to follow this project's Code of Conduct
@cagonza6 cagonza6 added the bug Something isn't working label Nov 10, 2022
@cagonza6
Copy link
Author

BTW: I could open a PR with the changes and fixes to unit tests, but I would prefer to have some discussion first, since the change are potentially long.

@davidhopkinson26
Copy link
Contributor

Hi Cristian, thank you very much for raising and for your proposed solution!

I agree that getting an error for unexpected keywords is desirable behaviour. and that removing **kwargs from base.BaseTransformer.__ init__ would be the best fix here. There are no args to pass on to TransformerMixin or BaseEstimator and sklearn.BaseEstimator in fact has a note saying children should specify all the parameters that can be set in their init as explicit keyword arguments. There is no good reason I can see to keep **kwargs in here.

Having had a look myself it appears that the failing tests resulting from this change would be due to typos or out of date arguments being used in a few tests, which need fixing regardless.

Please do go ahead and open a PR with the changes proposed.

@cagonza6
Copy link
Author

PR made, I have two questions in the PR, though. @davidhopkinson26, if you could check it would be great.

PR: #44

davidhopkinson26 added a commit to davidhopkinson26/tubular that referenced this issue Dec 14, 2022
davidhopkinson26 added a commit that referenced this issue Jan 9, 2023
* removed **kwargs arg from BaseTransformer.__init__

* added test for unexpected kwarg handling to test_BaseTransformer.py
added similar test to example child class test module
test_DataFrameMethodTransformer.py

* fix broken tests with incorrect kwargs
@davidhopkinson26
Copy link
Contributor

this issue has been fixed in #56 so I will close the issue for now. Thanks for identifying the issue and how to fix @cagonza6!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants