Is there a need to integrate Byte-pair encodings (BPE)? #13

Closed
GeorgeS2019 opened this issue Sep 11, 2020 · 1 comment

Comments

@GeorgeS2019 commented Sep 11, 2020

Byte-pair encodings (BPE) are now very commonly used in NLP.

Is there a plan to integrate BPE into Seq2SeqSharp in the future?

If so, would that be a C# wrapper (analogous to e.g. the Swift wrapper) around e.g. FastBPE?

Would you consider a pure C# version of e.g. FastBPE [link to pure Python FastBPE]?

This issue is more of a feature proposal. Looking forward to getting some feedback.

@zhongkaifu (Owner) commented
Thanks @GeorgeS2019 for your suggestion.

I know sub-word tokenization is really useful for text generation tasks; for example, MT tasks typically gain 2~3 BLEU points on average. Some NN frameworks have also integrated sub-word tokenization, e.g. Marian uses built-in SentencePiece for data processing.

However, since it is part of data processing and includes several key steps, such as model training, encoding, and decoding, I would prefer to create a separate project for it rather than integrating it into the Seq2SeqSharp project.
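To make those steps concrete, here is a minimal, word-level C# sketch of BPE training (learning merge rules from pair frequencies), encoding (replaying the merges), and decoding. It is only an illustration of the algorithm under simplified assumptions, not an existing SubwordSharp or Seq2SeqSharp API; all names in it are made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BpeSketch
{
    // Learn merge rules: repeatedly merge the most frequent adjacent symbol pair.
    public static List<(string, string)> Train(IEnumerable<string> words, int numMerges)
    {
        // Start with each word as a sequence of single-character symbols.
        var corpus = words.Select(w => w.Select(c => c.ToString()).ToList()).ToList();
        var merges = new List<(string, string)>();

        for (int i = 0; i < numMerges; i++)
        {
            // Count adjacent symbol pairs across the corpus.
            var pairCounts = new Dictionary<(string, string), int>();
            foreach (var symbols in corpus)
                for (int j = 0; j < symbols.Count - 1; j++)
                {
                    var pair = (symbols[j], symbols[j + 1]);
                    pairCounts[pair] = pairCounts.TryGetValue(pair, out var c) ? c + 1 : 1;
                }
            if (pairCounts.Count == 0) break;

            // Merge the most frequent pair everywhere it occurs.
            var best = pairCounts.OrderByDescending(kv => kv.Value).First().Key;
            merges.Add(best);
            foreach (var symbols in corpus)
                for (int j = 0; j < symbols.Count - 1; j++)
                    if (symbols[j] == best.Item1 && symbols[j + 1] == best.Item2)
                    {
                        symbols[j] = best.Item1 + best.Item2;
                        symbols.RemoveAt(j + 1);
                    }
        }
        return merges;
    }

    // Encode a word by replaying the learned merges in order.
    public static List<string> Encode(string word, List<(string, string)> merges)
    {
        var symbols = word.Select(c => c.ToString()).ToList();
        foreach (var (a, b) in merges)
            for (int j = 0; j < symbols.Count - 1; j++)
                if (symbols[j] == a && symbols[j + 1] == b)
                {
                    symbols[j] = a + b;
                    symbols.RemoveAt(j + 1);
                    j--; // re-check the merged symbol against its new neighbor
                }
        return symbols;
    }

    // Decoding is just concatenation in this word-level sketch.
    public static string Decode(IEnumerable<string> symbols) => string.Concat(symbols);
}
```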

So, in my opinion, my plan would be:

1) Create a project for BPE training/encoding/decoding, called SubwordSharp. :)
2) Create a training pipeline that ties together SubwordSharp BPE model training, BPE encoding, Seq2SeqSharp training, BPE decoding, and evaluation.
3) Create a runtime pipeline that ties together BPE encoding, Seq2SeqSharp inference, and BPE decoding.
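As a rough outline of how the runtime pipeline in 3) could be wired, here is a hedged C# sketch; ISubwordCodec, ISeq2SeqTranslator, and RuntimePipeline are hypothetical placeholder types for this example, not part of any existing Seq2SeqSharp or SubwordSharp code.

```csharp
// Hypothetical contracts for the two stages of the runtime pipeline.
public interface ISubwordCodec
{
    string[] Encode(string text);    // raw text -> subword tokens
    string Decode(string[] tokens);  // subword tokens -> raw text
}

public interface ISeq2SeqTranslator
{
    string[] Translate(string[] srcTokens); // subword-level inference
}

public sealed class RuntimePipeline
{
    private readonly ISubwordCodec _bpe;
    private readonly ISeq2SeqTranslator _model;

    public RuntimePipeline(ISubwordCodec bpe, ISeq2SeqTranslator model)
    {
        _bpe = bpe;
        _model = model;
    }

    // BPE encoding -> Seq2SeqSharp inference -> BPE decoding, as in step 3).
    public string Translate(string source) =>
        _bpe.Decode(_model.Translate(_bpe.Encode(source)));
}
```

The training pipeline in 2) would follow the same shape, with the subword model trained first and its encoder applied to the parallel corpus before Seq2SeqSharp training and evaluation.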
