I'm opening this issue to propose adding the E2SC [1] method to the imbalanced-learn library. As the main author of this strategy, I believe E2SC would be valuable to users working with imbalanced datasets.
Background:
The E2SC [1] method is currently considered state of the art (SOTA) in Instance Selection (IS). Although IS and undersampling have different primary objectives, the two techniques are related: both select a representative subset of a larger dataset. Through extensive experimentation, E2SC was shown to be particularly effective at addressing class imbalance, improving model performance by selecting the most informative instances.
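To make the general idea of confidence-based selection concrete, here is a toy sketch in Python. It is not E2SC, whose actual procedure is described in [1]. It assumes that each instance comes with a calibrated classifier's probability for its true label (the hypothetical `confidence` argument) and drops the most confidently classified instances of each larger class until every class matches the minority class size. The function name `confidence_undersample` is hypothetical and is not part of imbalanced-learn.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def confidence_undersample(
    X: Sequence, y: Sequence, confidence: Sequence[float]
) -> Tuple[List, List]:
    """Toy confidence-based undersampling (hypothetical helper, NOT E2SC).

    Assumption: confidence[i] is a calibrated classifier's predicted
    probability for the true label y[i]. Instances classified with high
    confidence are treated as redundant; for every class larger than the
    minority class, only the least confident instances are kept.
    """
    counts = Counter(y)
    target = min(counts.values())  # size of the minority class
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sort by ascending confidence so the "hardest" instances come first.
        idx.sort(key=lambda i: confidence[i])
        keep.extend(idx[:target])
    keep.sort()  # restore the original instance order
    return [X[i] for i in keep], [y[i] for i in keep]
```

For example, with `y = [0, 0, 0, 0, 1, 1]` and `confidence = [0.99, 0.6, 0.95, 0.55, 0.8, 0.7]`, the two least confident class-0 instances (indices 1 and 3) are kept along with both class-1 instances, so the resampled labels are `[0, 0, 1, 1]`. Keeping low-confidence instances favors points near the decision boundary; a practical method would also need to avoid retaining mislabeled outliers.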
Regarding implementation, I have already implemented E2SC in a separate repository under the MIT license. It is compatible with both the scikit-learn and imbalanced-learn libraries, which should make integration straightforward and encourage open-source collaboration.
Describe the solution you'd like
Include E2SC in the imbalanced-learn library. I would be happy to do this and will open a PR referencing this issue soon. Please let me know if there is any additional information I should consider before proceeding.
Thank you for considering this enhancement. I look forward to the possibility of collaborating and contributing to the imbalanced-learn community.
[1] Cunha, Washington, et al. "An effective, efficient, and scalable confidence-based instance selection framework for transformer-based text classification." Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2023.
It would be really interesting to see your solution in the library. I use imblearn constantly, and from reading the paper, it looks like it could be more efficient than some of the currently available methods.