[AIR] Add infrequent_categories_ attribute to OneHotEncoder and MultiHotEncoder
#27357
Labels
enhancement
Request for new feature and/or capability
stale
The issue is stale. It will be closed within 7 days unless there is further conversation.
Description
Title.
See attributes section of https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html.
Use case
If you specify max_categories, it's not clear how you figure out which categories are dropped. For example, suppose you have a column language with strings, and you one-hot encode the dataset.
It's not obvious which categories get dropped. One approach is to look at the column names of the encoded dataset and work out which categories are missing, but it'd be nice if you could directly get which categories are dropped.
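To make the request concrete, here is a rough sketch of the information such an attribute could expose. It is plain Python rather than the Ray API: the helper name split_categories and the sample data are made up for illustration, and it assumes that max_categories keeps the most frequent categories and drops the rest.

```python
from collections import Counter


def split_categories(values, max_categories):
    """Hypothetical helper: split categories into kept and dropped.

    Assumes the encoder keeps the `max_categories` most frequent values
    and drops everything else. This is an illustration, not library code.
    """
    counts = Counter(values)
    # most_common() sorts by count, descending; ties keep first-seen order.
    ranked = [category for category, _ in counts.most_common()]
    kept = ranked[:max_categories]
    infrequent = sorted(ranked[max_categories:])
    return kept, infrequent


# Made-up sample values for a "language" column.
languages = ["python", "python", "python", "rust", "rust", "go", "haskell"]

kept, infrequent = split_categories(languages, max_categories=2)
print(kept)        # ['python', 'rust']
print(infrequent)  # ['go', 'haskell']
```

An infrequent_categories_ attribute on the fitted encoder would return something like the second list per column, so you wouldn't have to recompute it or infer it from the output column names.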