
[ML] expand allowed NER labels to be any I-O-B tagged labels #87091

Merged


benwtrent
Member

Named entity recognition (NER) is a special form of token classification. The specific kind of labelling we support is Inside-Outside-Beginning (IOB) tagging. Each label indicates whether a token is inside an entity (prefixed with `I-` or `I_`), at the beginning of an entity (`B-` or `B_`), or outside any entity (`O`).

Each valid token classification label either starts with one of the required prefixes or is `O`.
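The naming rule above can be expressed as a small validator. This is a minimal, hypothetical sketch (the class `IobLabels` and method `entityClass` are illustrative names, not the Elasticsearch implementation). It assumes prefixes are uppercase `B`/`I`, followed by `-` or `_` and a non-empty entity class name, and it returns the entity class with the prefix stripped.

```java
import java.util.Optional;

/** Hypothetical helper illustrating IOB label validation; not the Elasticsearch implementation. */
final class IobLabels {

    private IobLabels() {}

    /**
     * Returns the entity class for a tagged label, an empty Optional for the outside label "O",
     * or throws if the label does not follow the IOB naming rules assumed here.
     */
    static Optional<String> entityClass(String label) {
        if (label.equals("O")) {
            return Optional.empty(); // outside any entity
        }
        // A tagged label needs a prefix letter, a separator and at least one entity character, e.g. "B-X"
        if (label.length() < 3) {
            throw new IllegalArgumentException("invalid IOB label [" + label + "]");
        }
        char prefix = label.charAt(0);
        char separator = label.charAt(1);
        boolean validPrefix = prefix == 'B' || prefix == 'I';
        boolean validSeparator = separator == '-' || separator == '_';
        if (!validPrefix || !validSeparator) {
            throw new IllegalArgumentException("invalid IOB label [" + label + "]");
        }
        return Optional.of(label.substring(2)); // strip "B-", "B_", "I-" or "I_"
    }

    public static void main(String[] args) {
        System.out.println(entityClass("B-DRUG")); // Optional[DRUG]
        System.out.println(entityClass("I_ADR"));  // Optional[ADR]
        System.out.println(entityClass("O"));      // Optional.empty
    }
}
```

Under this rule a custom label such as `B-DRUG` is accepted just like `B-PER`, while something like `X-FOO` is rejected with an `IllegalArgumentException`.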

Before this commit, we restricted the labels to a specific set:

```
O(Entity.NONE),      // Outside a named entity
B_MISC(Entity.MISC), // Beginning of a miscellaneous entity right after another miscellaneous entity
I_MISC(Entity.MISC), // Miscellaneous entity
B_PER(Entity.PER),   // Beginning of a person's name right after another person's name
I_PER(Entity.PER),   // Person's name
B_ORG(Entity.ORG),   // Beginning of an organization right after another organization
I_ORG(Entity.ORG),   // Organisation
B_LOC(Entity.LOC),   // Beginning of a location right after another location
I_LOC(Entity.LOC);   // Location
```

Now any entity class is allowed, as long as the label names adhere to IOB tagging rules.

Here is an inference response from a model whose labels use other entity classes (`ADR` and `DRUG`):

```json
{
    "predicted_value": "[Birth defects](ADR&Birth+defects) associated with [thalidomide](DRUG&thalidomide).",
    "entities": [
        {
            "entity": "birth defects",
            "class_name": "ADR",
            "class_probability": 0.9664951378636988,
            "start_pos": 0,
            "end_pos": 13
        },
        {
            "entity": "thalidomide",
            "class_name": "DRUG",
            "class_probability": 0.7323781805751934,
            "start_pos": 30,
            "end_pos": 41
        }
    ]
}
```
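To illustrate how per-token IOB tags can be grouped into entities like the two in the response above, here is a minimal, hypothetical sketch. The names `IobDecoder`, `TaggedToken` and `EntitySpan` are illustrative only, and this is not the Elasticsearch implementation: it joins token text with spaces and omits the class probabilities and character offsets that the real response reports. It assumes the common IOB2 convention that a `B` tag, or an `I` tag whose entity class differs from the open entity, starts a new entity, and that `O` closes any open entity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of grouping IOB-tagged tokens into entities; not the Elasticsearch implementation. */
final class IobDecoder {

    /** A token and the IOB label a model assigned to it, e.g. ("thalidomide", "B-DRUG"). */
    record TaggedToken(String text, String label) {}

    /** A contiguous run of tokens sharing one entity class. */
    record EntitySpan(String className, String text) {}

    static List<EntitySpan> decode(List<TaggedToken> tokens) {
        List<EntitySpan> spans = new ArrayList<>();
        String currentClass = null; // class of the entity being built; null when outside
        StringBuilder currentText = new StringBuilder();

        for (TaggedToken token : tokens) {
            String label = token.label();
            if (label.equals("O")) {
                // Outside: close any open entity
                if (currentClass != null) {
                    spans.add(new EntitySpan(currentClass, currentText.toString()));
                    currentClass = null;
                }
                continue;
            }
            char prefix = label.charAt(0);            // 'B' or 'I'
            String entityClass = label.substring(2);  // strip "B-", "B_", "I-" or "I_"
            if (prefix == 'I' && entityClass.equals(currentClass)) {
                // Inside the same entity: extend it
                currentText.append(' ').append(token.text());
            } else {
                // A B tag, or an I tag of a different class, starts a new entity
                if (currentClass != null) {
                    spans.add(new EntitySpan(currentClass, currentText.toString()));
                }
                currentClass = entityClass;
                currentText.setLength(0);
                currentText.append(token.text());
            }
        }
        if (currentClass != null) {
            spans.add(new EntitySpan(currentClass, currentText.toString()));
        }
        return spans;
    }

    public static void main(String[] args) {
        List<TaggedToken> tokens = List.of(
            new TaggedToken("Birth", "B-ADR"),
            new TaggedToken("defects", "I-ADR"),
            new TaggedToken("associated", "O"),
            new TaggedToken("with", "O"),
            new TaggedToken("thalidomide", "B-DRUG"),
            new TaggedToken(".", "O"));
        // prints [EntitySpan[className=ADR, text=Birth defects], EntitySpan[className=DRUG, text=thalidomide]]
        System.out.println(decode(tokens));
    }
}
```

Nothing in the decoding depends on a fixed set of classes such as `PER` or `LOC`; the entity class is simply whatever follows the prefix, which is what allows labels like `ADR` and `DRUG` to work.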

@benwtrent benwtrent requested a review from davidkyle May 24, 2022 20:35
@elasticmachine elasticmachine added the Team:ML (Meta label for the ML team) label May 24, 2022
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment


Looks good! Just a question on whether we can remove the Entity enum entirely now.

Also, could we add a test for labels that are custom and do not match the default labels?

@benwtrent
Member Author

@elasticmachine update branch

Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment


LGTM

@benwtrent benwtrent merged commit 90d93a9 into elastic:master May 25, 2022
@benwtrent benwtrent deleted the feature/ml-expand-ner-tokens-allowed branch May 25, 2022 18:18
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request May 26, 2022
…#87091)

Labels
>enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.4.0

4 participants