[ML] expand allowed NER labels to be any I-O-B tagged labels #87091
Pinging @elastic/ml-core (Team:ML)
Hi @benwtrent, I've created a changelog YAML for you.
Looks good! Just a question on whether we can remove the `Entity` enum entirely now.
Also, could we add a test for labels that are custom and do not match the default labels?
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/nlp/NerProcessor.java
…wtrent/elasticsearch into feature/ml-expand-ner-tokens-allowed
…-ner-tokens-allowed
@elasticmachine update branch
LGTM
…#87091)

Named entity recognition (NER) is a special form of token classification. The specific kind of labelling we support is Inside-Outside-Beginning (IOB) tagging. These labels indicate whether a token is inside an entity (`I-` or `I_`), at the beginning of one (`B-` or `B_`), or outside any entity (`O`). Each valid token classification label either starts with one of the required prefixes or is `O`.

Before this commit, we restricted the labels to a specific set:

```
O(Entity.NONE), // Outside a named entity
B_MISC(Entity.MISC), // Beginning of a miscellaneous entity right after another miscellaneous entity
I_MISC(Entity.MISC), // Miscellaneous entity
B_PER(Entity.PER), // Beginning of a person's name right after another person's name
I_PER(Entity.PER), // Person's name
B_ORG(Entity.ORG), // Beginning of an organization right after another organization
I_ORG(Entity.ORG), // Organisation
B_LOC(Entity.LOC), // Beginning of a location right after another location
I_LOC(Entity.LOC); // Location
```

But now, any entity is allowed, as long as the naming of the labels adheres to IOB tagging rules.
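To make the naming rule concrete, here is a minimal sketch of a check along these lines. It is not the code in `NerProcessor.java`: the class `IobLabelCheck` and its methods are hypothetical names chosen for illustration, and the requirements that labels are matched case-sensitively and that a prefix is followed by a non-empty entity name are assumptions rather than something the description above states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the PR's NerProcessor code.
final class IobLabelCheck {
    // The four prefixes named in the description above.
    private static final String[] PREFIXES = { "B-", "B_", "I-", "I_" };

    private IobLabelCheck() {}

    /** True if the label is "O", or a B/I prefix followed by an entity name. */
    static boolean isValidIobLabel(String label) {
        if (label == null) {
            return false;
        }
        if (label.equals("O")) {
            return true;
        }
        for (String prefix : PREFIXES) {
            // Assumption: a bare prefix such as "B-" with no entity name is rejected.
            if (label.startsWith(prefix) && label.length() > prefix.length()) {
                return true;
            }
        }
        return false;
    }

    /** Strips the two-character IOB prefix; "O" is returned unchanged. */
    static String entityName(String label) {
        if (isValidIobLabel(label) == false) {
            throw new IllegalArgumentException("not an IOB label: " + label);
        }
        return label.equals("O") ? label : label.substring(2);
    }

    /** Collects the labels that do not follow the IOB naming rule. */
    static List<String> invalidLabels(List<String> labels) {
        List<String> invalid = new ArrayList<>();
        for (String label : labels) {
            if (isValidIobLabel(label) == false) {
                invalid.add(label);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Mixes default-style labels, custom entity labels, and one label with no prefix.
        List<String> labels = List.of("O", "B_PER", "I_PER", "B-DISEASE", "I-DISEASE", "PERSON");
        System.out.println("invalid labels: " + invalidLabels(labels));
    }
}
```

With a check of this shape, custom labels such as `B-DISEASE` pass while a label without an IOB prefix, such as `PERSON`, is reported as invalid.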
Here is an inference response containing other token labels: