zhiqu22/mitre

MITRE

Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation

Introduction

MITRE (Multilingual Translation with Registers) is a multilingual, decoder-only model designed for many-to-many translation tasks.
The underlying technique, registering, is introduced in our paper.

This is the repository for reproducing the data mining and pre-training described in our paper.

Note:
Because part of this work was done during Zhi Qu's internship at ASTREC, NICT, Japan, the code in this repository is going through NICT's open-source release procedure.

Once the procedure is finished, we will release all of the code immediately.

In the meantime, you can visit our Hugging Face pages, MITRE_466M and MITRE_913M, where we have already released another version of our code together with the pre-trained models, which achieve exactly the same performance.

Languages covered

Germanic: English (en), German (de), Dutch; Flemish (nl), Swedish (sv), Danish (da), Afrikaans (af)
Romance: French (fr), Spanish (es), Italian (it), Portuguese (pt), Romanian; Moldavian; Moldovan (ro)
Slavic: Russian (ru), Czech (cs), Polish (pl), Bulgarian (bg), Ukrainian (uk)
Malayo-Polynesian: Indonesian (id), Malay (ms), Javanese (jv), Tagalog; Filipino (tl)
Asian*: Chinese (zh), Japanese (ja), Korean (ko), Vietnamese (vi)
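Since MITRE targets many-to-many translation, a rough way to see the size of the task is to treat every ordered pair of distinct languages above as one translation direction (an assumption for illustration). The minimal Python sketch below encodes the list and counts those directions. The names `LANGUAGES_BY_GROUP`, `translation_directions`, and `is_supported` are hypothetical and are not part of the MITRE code or its Hugging Face release.

```python
from itertools import permutations

# Hypothetical helper data: the language codes from the list above, grouped
# as in this README. Not taken from the MITRE codebase.
LANGUAGES_BY_GROUP = {
    "Germanic": ["en", "de", "nl", "sv", "da", "af"],
    "Romance": ["fr", "es", "it", "pt", "ro"],
    "Slavic": ["ru", "cs", "pl", "bg", "uk"],
    "Malayo-Polynesian": ["id", "ms", "jv", "tl"],
    "Asian": ["zh", "ja", "ko", "vi"],
}

# Flatten the groups into a single list of language codes.
ALL_LANGUAGES = [code for codes in LANGUAGES_BY_GROUP.values() for code in codes]


def translation_directions(languages):
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))


def is_supported(src, tgt):
    """Hypothetical check: both codes are listed and they differ."""
    return src != tgt and src in ALL_LANGUAGES and tgt in ALL_LANGUAGES


if __name__ == "__main__":
    directions = translation_directions(ALL_LANGUAGES)
    print(len(ALL_LANGUAGES))         # 24 languages
    print(len(directions))            # 24 * 23 = 552 directions
    print(is_supported("ja", "de"))   # True
    print(is_supported("en", "ar"))   # False: Arabic is not listed
```

With 24 languages, the count of ordered pairs is 24 × 23 = 552, so a single model has to cover all of these directions rather than one direction per model.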

BibTeX entry and citation info

@misc{qu2025registeringsourcetokenstarget,
      title={Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation}, 
      author={Zhi Qu and Yiran Wang and Jiannan Mao and Chenchen Ding and Hideki Tanaka and Masao Utiyama and Taro Watanabe},
      year={2025},
      eprint={2501.02979},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.02979}, 
}