[ENH] Add E2SC to imbalanced-learn #1090

Open
waashk opened this issue Aug 9, 2024 · 1 comment

waashk commented Aug 9, 2024

I'm opening this issue to propose adding the E2SC [1] method to the imbalanced-learn library. As the main author of this strategy, I believe that integrating E2SC would offer significant value to users working with imbalanced datasets.

Background:

The E2SC [1] method is currently considered state of the art (SOTA) in Instance Selection (IS). IS and undersampling have different primary objectives, but they are related: both select a representative subset of a larger dataset. Extensive experiments showed that E2SC is particularly effective at addressing class imbalance, because it selects the most informative instances and thereby improves model performance.

Regarding implementation: I have already implemented E2SC in a separate repository under the MIT license. The implementation is compatible with both the scikit-learn and imbalanced-learn libraries, which should ease open-source collaboration and integration.

Describe the solution you'd like

Include E2SC in the imbalanced-learn library. I would be happy to do this myself and will open a PR referencing this issue soon. Please let me know if there is anything else I should consider before proceeding.

Thank you for considering this enhancement. I look forward to the possibility of collaborating and contributing to the imbalanced-learn community.

[1] Cunha, Washington, et al. "An effective, efficient, and scalable confidence-based instance selection framework for transformer-based text classification." Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2023.

lucasgsfelix commented

It would be really interesting to see your solution in the library. I use imblearn constantly, and from reading the paper, it looks like it could be more efficient than some of the methods currently available.
