Parameter of MulticlassStatScores and MultilabelStatScores to control which classes/labels to include in the averages #1723
Labels
enhancement
🚀 Feature
Add a parameter to MulticlassStatScores and MultilabelStatScores to control which classes/labels to include in averaging.
Motivation
Sklearn's precision_recall_fscore_support allows users to define the labels used in averaging the computed metrics (as well as the order if the metrics are not averaged). This allows calculating "a multiclass average ignoring a majority negative class". E.g. in my use-case (sequence tagging), I do want to consider datapoints which have an out-tag ("O", meaning they are not tagged), as they might contribute to the false positives of other classes. Hence, ignore_index is not sufficient, as the datapoints would be completely excluded.
Pitch
Add a parameter classes to MulticlassStatScores and labels to MultilabelStatScores to limit the calculation of true positives, false positives, false negatives, and true negatives to these classes/labels. The resulting averages (e.g. f1-score, accuracy) would then be an average only of the selected classes/labels.
If the classes/labels parameter is specified, num_classes/num_labels would not need to be set (or, if they are set and do not agree with the number of classes/labels passed, an exception would need to be raised).
Alternatives
Currently, I am using a very hacky solution:
I am not happy with this solution for two reasons:
average is set to "micro".
Ideally, the stats would be reduced to the selected classes already in _multiclass_stat_scores_update/_multilabel_stat_scores_update.
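To make the requested semantics concrete, here is a minimal pure-Python sketch. It is not torchmetrics code: the helpers resolve_classes, stat_scores and micro_f1 and the toy tag sequences are hypothetical and only illustrate the idea. Every datapoint is inspected, but only the selected classes contribute true positives, false positives and false negatives. The last part compares this with dropping all datapoints whose target is "O", which is the effect attributed to ignore_index in the motivation.

```python
from collections import Counter


def resolve_classes(classes, num_classes=None):
    """Hypothetical check from the pitch: if both are given, they must agree."""
    classes = list(classes)
    if num_classes is not None and num_classes != len(classes):
        raise ValueError(
            f"num_classes={num_classes} does not match the {len(classes)} classes passed"
        )
    return classes


def stat_scores(target, preds, classes):
    """Sum tp/fp/fn over the selected classes while looking at every datapoint."""
    selected = set(classes)
    counts = Counter()
    for t, p in zip(target, preds):
        if t == p:
            if t in selected:
                counts["tp"] += 1
        else:
            if p in selected:
                counts["fp"] += 1  # e.g. an "O" token wrongly tagged as PER
            if t in selected:
                counts["fn"] += 1
    return counts


def micro_f1(target, preds, classes):
    """Micro-averaged F1 restricted to the selected classes."""
    c = stat_scores(target, preds, classes)
    denom = 2 * c["tp"] + c["fp"] + c["fn"]
    return 2 * c["tp"] / denom if denom else 0.0


# Toy sequence-tagging data (made up for illustration).
target = ["O", "O", "PER", "LOC", "O", "PER"]
preds = ["O", "PER", "PER", "LOC", "LOC", "O"]
classes = resolve_classes(["PER", "LOC"])

# Requested behaviour: "O" is left out of the average, but its datapoints
# still produce false positives for PER and LOC.
selected_f1 = micro_f1(target, preds, classes)

# Dropping every datapoint whose target is "O" loses those false positives.
kept = [(t, p) for t, p in zip(target, preds) if t != "O"]
dropped_f1 = micro_f1([t for t, _ in kept], [p for _, p in kept], classes)

print(selected_f1, dropped_f1)
```

On this toy data the restricted average comes out lower than the score obtained by dropping the "O" datapoints (4/7 versus 0.8), because the two "O" tokens predicted as PER and LOC are counted as false positives only in the first case.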
Additional context
I had already opened a discussion. However, I believe this cannot be solved without a new feature.