-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate scikit-learn's set_output
method into TransactionEncoder
#1087
Conversation
- Added two new tests, `test_get_feature_names_out` and `test_set_output`. Passing these tests is a step towards the output of `TransactionEncoder` being formatted as a pandas.DataFramed by default.
- Added `get_feature_names_out` method to `TransactionEncoder` to expose the `set_output` method.
- Updated test to include more checks. It is now back in a failing state.
- Updated test_set_output docstring to be more explicit. - Added numpy assertion to check that the transformed output columns match the original columns_ attribute for test_set_output. - Added numpy assertion to check that the get_feature_names_out output match the original columns_ attribute for test_get_feature_names_out.
- Added logic similar to that in `sklearn.base.ClassNamePrefixFeaturesOutMixin` and `sklearn.base.OneToOneFeatureMixin` for the get_feature_names_out method.
- Updated the user guide to show both the get_feature_names_out method and the set_output method.
- Updated changelog to reflect new features.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
- Updated issue number.
- Updated issue number (again) to reflect the PR link instead of the issue link.
- Ran isort over imports to fix failing check in PR.
Thanks a lot for the PR, really appreciate it and hope to review it in the upcoming days! |
- Increased scikit-learn version to minimum required for set_output to work.
Sure thing! I wasn't sure what to put regarding newer version release dates so I bumped the patch number by one and set the release date to "TBD". Lmk if I should change anything. |
Hey @rasbt it looks like the pipeline checks keep failing for files unrelated to this PR. Should I log a separate issue and open a new PR to fix those too? |
No worries, it should be fixed now via #1089 (except for the transaction encoder test, but that's something we can address in this PR) |
Hm, I think the new failures could be due to the sklearn version bump |
I noticed that scikit-learn 1.1.3 is being installed in the github workflow. Can we bump it to 1.2.2 as that is required for |
Yes, please feel free to bump it up. We probably need to fix some other places that have not been adjusted for the most recent version though |
- Bumped scikit-learn version up to 1.2.2 to match requirements.txt.
- Bumped scikit-learn version up to 1.2.2 to match environment.yml and requirements.txt.
Running
Both result in the test passing. I'm in favor of the first option as it doesn't require any changes to the |
- Updated `test_inverse_transform` to passing state by removing conversion to numpy array.
Turns out |
- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: environment.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: requirements.txt - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044
Of course bumping the version results in more failed tests 🤦 I'm going to make a separate PR to handle the |
Arg sorry. Yeah, a sklearn bump was overdue but I recently didn't have the time to look into it since I wasn't using the affected features. If this is too much work, don't worry about it, I can understand if you want to drop this. I could revisit the version bump in the upcoming weeks then, address this, and then merge your PR once it's addressed. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1087 +/- ##
==========================================
+ Coverage 78.29% 78.32% +0.03%
==========================================
Files 196 196
Lines 11140 11157 +17
Branches 1404 1404
==========================================
+ Hits 8722 8739 +17
Misses 2200 2200
Partials 218 218 ☔ View full report in Codecov by Sentry. |
@rasbt looks like all checks passed. Ready to merge 😎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks so much for the efforts! And sorry again about the unit test hassles!
Code of Conduct
Description
This defines the :method:
get_feature_names_out
in :class:TransactionEncoder
to expose the :method:set_output.Related issues or pull requests
Fixes #1085
Pull Request Checklist
./docs/sources/CHANGELOG.md
file (if applicable)./mlxtend/*/tests
directories (if applicable)mlxtend/docs/sources/
(if applicable)PYTHONPATH='.' pytest ./mlxtend -sv
and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv
)flake8 ./mlxtend