7_Name_Entity_Recognition.md
Basic Concepts

  • sequence tagging problem

Common Methods

  • bi-LSTM + CRF
    • Bidirectional LSTM-CRF Models for Sequence Tagging. 2015. Zhiheng Huang, Wei Xu, Kai Yu.
  • bert-base
  • RoBERTa-wwm-large-ext
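
In a bi-LSTM + CRF tagger, the network produces a per-token score for each tag and the CRF layer adds tag-to-tag transition scores; the best tag sequence is then usually found with Viterbi decoding. The following is a minimal pure-Python sketch of that decoding step only. The function name `viterbi_decode`, the three-tag set, and all scores are made up for illustration and do not come from any particular library or from the paper.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (e.g. from a bi-LSTM).
    transitions[i][j] : score of moving from tag i to tag j (CRF layer).
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score of a path ending in each tag
    backpointers = []                   # backpointers[t-1][j] = best previous tag
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen so that greedy per-token argmax
# would pick the invalid sequence O, I-PER.
tags = ["O", "B-PER", "I-PER"]
emissions = [[2.0, 1.0, 0.0],
             [0.0, 0.5, 1.0]]
transitions = [[0.0, 0.0, -1e4],   # O -> I-PER is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
path = viterbi_decode(emissions, transitions)
print([tags[i] for i in path])  # ['O', 'B-PER']
```

The transition scores are what let the CRF layer rule out sequences such as `O` followed by `I-PER`, which independent per-token classification cannot do.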

Concepts

  • BIO encoding.

  • BIOES format.
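
To make the two tagging schemes concrete, here is a small sketch that turns entity spans into BIO tags and then converts BIO to BIOES (B = begin, I = inside, O = outside, E = end, S = single-token entity). The helper names `spans_to_bio` and `bio_to_bioes`, the example sentence, and its labels are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert well-formed BIO tags to BIOES tags."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"   # does the entity go on?
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


# Hypothetical example sentence and annotations.
tokens = ["Zig", "Ziglar", "spoke", "in", "Texas"]
bio = spans_to_bio(len(tokens), [(0, 2, "PER"), (4, 5, "GPE")])
bioes = bio_to_bioes(bio)
print(bio)    # ['B-PER', 'I-PER', 'O', 'O', 'B-GPE']
print(bioes)  # ['B-PER', 'E-PER', 'O', 'O', 'S-GPE']
```

BIOES marks entity ends and single-token entities explicitly, which gives a tagger more boundary information at the cost of a larger tag set.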

Systems

  • for English

    • spaCy
  • for Chinese

    • Stanza: fully neural, Bi-LSTM + CRF. A very good tool. Supports the 18 entity types of OntoNotes Release 5.0.
    • CoreNLP
  • Industry

Applications

  • Tag understanding and recommendation
  • Semantic association
  • Deep semantic representation

Dataset

Important Papers

Difficulties

  • Hard to work out the boundaries of an entity
    • First National Bank Donates 2 Vans To Future School Of Fort Smith
    • Is the first entity “First National Bank” or “National Bank”?
  • Hard to know if something is an entity
    • Is there a school called “Future School” or is it a future school?
  • Hard to know the class of an unknown/novel entity
    • What class is “Zig Ziglar”? (A person.)
  • Entity class is ambiguous and depends on context
    • “Charles Schwab” is PER, not ORG, here!
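
The boundary problem can be seen by decoding two different BIO taggings of the headline above into entity spans. This is only a sketch: the helper `bio_to_spans` and both taggings (including the ORG and GPE labels) are hypothetical readings written for illustration, not gold annotations.

```python
from typing import List, Sequence, Tuple


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (entity text, label) pairs."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((" ".join(tokens[start:i]), label))
                start, label = None, None
            if tag != "O":                          # B-X, or a stray I-X treated as a start
                start, label = i, tag[2:]
    return spans


tokens = "First National Bank Donates 2 Vans To Future School Of Fort Smith".split()

# Reading A (hypothetical): "First" is part of the bank's name, and the
# school's full name is "Future School Of Fort Smith".
tags_a = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "O",
          "B-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG"]

# Reading B (hypothetical): "First" is an ordinary adjective, and
# "Future School" is just a school planned for the future.
tags_b = ["O", "B-ORG", "I-ORG", "O", "O", "O", "O",
          "O", "O", "O", "B-GPE", "I-GPE"]

spans_a = bio_to_spans(tokens, tags_a)
spans_b = bio_to_spans(tokens, tags_b)
print(spans_a)  # [('First National Bank', 'ORG'), ('Future School Of Fort Smith', 'ORG')]
print(spans_b)  # [('National Bank', 'ORG'), ('Fort Smith', 'GPE')]
```

Both taggings are well-formed BIO sequences, so the tagging scheme alone cannot decide between them; the model has to resolve the boundaries from context and world knowledge.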

Types

  • OntoNotes