In this work we present a Word Sense Disambiguation (WSD) engine that integrates a Transformer-based neural architecture with knowledge from WordNet, the resource from which the sense inventory is taken.
The architecture is composed of contextualized embeddings plus a Transformer on top with a final dense layer.
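The stack described above can be sketched in PyTorch as follows. This is only an illustrative sketch, not the repository's implementation: the class name `WSDSketch`, the layer sizes, and the use of random tensors in place of real RoBERTa embeddings are all assumptions made for the example.

```python
import torch
from torch import nn


class WSDSketch(nn.Module):
    """Hypothetical sketch: contextual embeddings -> Transformer encoder -> dense layer."""

    def __init__(self, embed_dim=1024, num_senses=1000, num_layers=2, num_heads=8):
        super().__init__()
        # Transformer encoder stacked on top of the (precomputed) contextual embeddings.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Final dense layer that scores every sense in the inventory for each token.
        self.output = nn.Linear(embed_dim, num_senses)

    def forward(self, embeddings, padding_mask=None):
        # embeddings: (batch, seq_len, embed_dim)
        h = self.encoder(embeddings, src_key_padding_mask=padding_mask)
        return self.output(h)  # (batch, seq_len, num_senses)


# Small sizes and random inputs, used only to check that the shapes line up.
model = WSDSketch(embed_dim=64, num_senses=10, num_layers=1, num_heads=4)
model.eval()
with torch.no_grad():
    dummy_embeddings = torch.randn(2, 5, 64)  # stand-in for RoBERTa features
    logits = model(dummy_embeddings)
print(logits.shape)
```

In an actual run the random tensor would be replaced by the embeddings produced by the pretrained model, and the per-token scores would be restricted to the senses that are valid for each target word.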
The available models all use RoBERTa embeddings as their base:
- rdense: an encoder made of only two dense layers.
- rtransform: a Transformer encoder.
- wsddense: a two-dense-layer encoder plus an advanced lemma prediction net.
- wsdnetx: same as wsddense, but with a Transformer encoder.
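The four variants differ along two axes: the encoder type and whether the advanced lemma prediction net is used. A small lookup table makes this explicit. The `ModelVariant` class, its field names, and the `describe` helper are hypothetical and serve only to illustrate the list above; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    encoder: str     # "dense" (two dense layers) or "transformer"
    lemma_net: bool  # True if the advanced lemma prediction net is used


# Hypothetical registry mirroring the model list in this README.
MODEL_VARIANTS = {
    "rdense": ModelVariant(encoder="dense", lemma_net=False),
    "rtransform": ModelVariant(encoder="transformer", lemma_net=False),
    "wsddense": ModelVariant(encoder="dense", lemma_net=True),
    "wsdnetx": ModelVariant(encoder="transformer", lemma_net=True),
}


def describe(name: str) -> str:
    """Return a one-line description of a model variant."""
    variant = MODEL_VARIANTS[name]
    suffix = " + lemma prediction net" if variant.lemma_net else ""
    return f"{name}: {variant.encoder} encoder{suffix}"


for model_name in MODEL_VARIANTS:
    print(describe(model_name))
```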
The advanced net can be represented as:
where h is the final hidden state of the encoder. The |S|x|V| matrix is built as follows:
As training data we use both SemCor and the WordNet Gloss Corpus.
git clone http://github.com/spallas/wsd.git
cd wsd/ || return 1
tmux new -s train
python -c "import torch; print(torch.__version__)"
source setup.sh
Unzip the pre-processed training and test data, which you can download here, into the res/ folder. Also unzip the dictionaries data, which you can download here, into res/.
Please refer to the wiki page in this repository for further details about the implementation.
# Download roberta.large model
cd res/
wget https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz
tar -xzvf roberta.large.tar.gz
# Load the model in fairseq
from fairseq.models.roberta import RobertaModel
roberta = RobertaModel.from_pretrained('res/roberta.large', checkpoint_file='model.pt')
roberta.eval() # disable dropout (or leave in train mode to finetune)