This repository contains the PyTorch implementation of our paper "Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer". Our approach achieves improvements on two low-resource languages (Dutch and Spanish) and on the Chinese OntoNotes 4.0 dataset.
Python : 2.7
PyTorch : >=0.3.0
This code is based on PyTorch; see the official PyTorch website for installation instructions.
You can install the dependencies like this:
pip install -r requirements.txt
The default configurations are in the files demo.train.config and demo.decode.config. You can modify the parameters as needed.
Training: CUDA_VISIBLE_DEVICES=0 python main.py --config demo.train.config
Decoding: python main.py --config demo.decode.config
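If you want to launch both stages from a script instead of typing the commands by hand, a thin wrapper along the following lines may help. This is only a sketch: the helper names `build_command`, `build_env`, and `run` are hypothetical and not part of this repository, and the only things taken from the commands above are `main.py`, the `--config` flag, the two config file names, and the `CUDA_VISIBLE_DEVICES` variable.

```python
import os
import subprocess


def build_command(config_path):
    """Return the argv list for one run of main.py (hypothetical helper)."""
    return ["python", "main.py", "--config", config_path]


def build_env(gpu_id=None, base_env=None):
    """Copy the environment, optionally restricting the run to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    if gpu_id is not None:
        # Same effect as prefixing the shell command with CUDA_VISIBLE_DEVICES=<id>.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def run(config_path, gpu_id=None):
    """Run main.py with the given config and return its exit code."""
    return subprocess.call(build_command(config_path), env=build_env(gpu_id))


# Example usage (commented out so that importing this file starts nothing):
# if run("demo.train.config", gpu_id=0) == 0:
#     run("demo.decode.config")
```

Building the environment separately keeps the GPU selection out of the command string, so the same wrapper works whether or not a GPU is pinned.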
Language | Dataset | Link |
---|---|---|
Dutch | CoNLL-2002 | https://github.com/synalp/NER/tree/master/corpus/ |
Spanish | CoNLL-2002 | https://github.com/synalp/NER/tree/master/corpus/ |
Chinese | OntoNotes 4.0 | https://catalog.ldc.upenn.edu/ldc2011t03 |

Translation | Link |
---|---|
MUSE | https://github.com/facebookresearch/MUSE |

Word embedding | Link |
---|---|
GloVe (English) | https://nlp.stanford.edu/projects/glove/ |
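The downloaded GloVe files are plain text, with one word per line followed by its space-separated vector components. To inspect them outside the training code, a small reader like the one below can be used. It is a minimal sketch, not the loader used by this repository: the function name `load_glove` and its optional `vocab` filter are assumptions made for illustration.

```python
import io


def load_glove(path, vocab=None):
    """Read GloVe text vectors into a dict mapping word -> list of floats.

    If vocab is given, only words contained in it are kept.
    """
    embeddings = {}
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip empty or malformed lines
            word, values = parts[0], parts[1:]
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = [float(v) for v in values]
    return embeddings
```

Passing the set of words that occur in your training data as `vocab` keeps memory use low, since the full GloVe files contain far more words than a typical NER corpus.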