Welcome to `CreoleVal`

Overview

This repository contains data (or, where the data is not included, download scripts), training and evaluation scripts, and models for natural language understanding and generation tasks in Creole languages.

Statistics on CreoleVal's coverage, along with additional analysis of performance and behaviour across the included tasks, can be found here.

This repo is under construction!

This repository is under active development and changes on a weekly or even daily basis. Our outstanding TODO items include:

Write a "Getting Started" guide that walks you through the data and experiments in this repo.
Add more scripts so that others can easily run CreoleVal experiments.
[nlg/] Add links and experiments for KriolMorisiyen MT.
[Appendix/] Add more documentation, with analysis of experiments.
Add scripts that make clear which data remains available for training CreoleLMs without cross-contamination.
Make sure there are no hard-coded paths.
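
The cross-contamination item above could be approached roughly as follows. This is only a minimal sketch, not a script from this repo: the function names `normalize` and `drop_eval_overlap` are hypothetical, the sentences are made-up placeholders, and exact matching after light normalization is an assumed criterion (near-duplicate detection would need something stronger).

```python
import re


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", sentence).strip().lower()


def drop_eval_overlap(candidates, eval_sentences):
    """Keep only candidate training sentences that do not occur in the evaluation data.

    Hypothetical helper: overlap is judged by exact match after normalize().
    """
    held_out = {normalize(s) for s in eval_sentences}
    return [s for s in candidates if normalize(s) not in held_out]


# Placeholder data, for illustration only.
eval_set = ["Example sentence one."]
pool = ["example  sentence one.", "Example sentence two."]
clean_pool = drop_eval_overlap(pool, eval_set)
```

In this toy run, the first pool sentence differs from the evaluation sentence only in case and spacing, so it is filtered out and only the second sentence remains in `clean_pool`.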

Natural Language Understanding (`/nlu`)

Machine comprehension, relation classification, UDPoS, NER, NLI, sentiment analysis, and the Tatoeba challenge.

Natural Language Generation (`/nlg`)

Machine translation with Bibles, the MIT-Haiti Corpus, and KriolMorisiyenMT.

License Overview

Because CreoleVal is a composite of new benchmarks and pre-existing ones, several different software licenses are at play. For the datasets packaged within CreoleVal (i.e., the data is actually in the repo, rather than fetched with a download script), we summarize the licenses here for your convenience. Note: an * indicates a dataset that we have newly introduced in CreoleVal:

| Dataset | Task | Languages | Source | Domain | License |
| --- | --- | --- | --- | --- | --- |
| MCTest | machine comprehension | eng, hat, mfe | original | short stories for kids | MSR-LA: Microsoft Research License |
| CreoleRC | relation classification | bi, cbk-zam, jam, phi, tpi* | Wikipedia | Wikipedia | CC-BY-SA 4.0 |
| MIT-Haiti Corpus | machine translation | hat, eng, es, fr | Platform MIT-Haiti | education | CC-BY-SA 4.0 |
| WikiAnn | named entity recognition | bi, cbk-zam, ht, pih, sg, tpi, pap* | WikiAnn | Wikipedia | CC-BY-SA 4.0 |
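
If you need to check license obligations programmatically, the table above can be mirrored as a small lookup. The sketch below is illustrative only: the dictionary is transcribed by hand from the License column, and `PACKAGED_LICENSES` and `datasets_by_license` are hypothetical names, not part of this repo.

```python
from collections import defaultdict

# Hand-transcribed from the license table above (dataset name -> license).
PACKAGED_LICENSES = {
    "MCTest": "MSR-LA: Microsoft Research License",
    "CreoleRC": "CC-BY-SA 4.0",
    "MIT-Haiti Corpus": "CC-BY-SA 4.0",
    "WikiAnn": "CC-BY-SA 4.0",
}


def datasets_by_license(licenses):
    """Group dataset names under the license they are distributed with."""
    grouped = defaultdict(list)
    for dataset, license_name in licenses.items():
        grouped[license_name].append(dataset)
    # Sort names so the output is stable regardless of insertion order.
    return {name: sorted(names) for name, names in grouped.items()}


by_license = datasets_by_license(PACKAGED_LICENSES)
```

Datasets fetched through download scripts are not covered by this mapping; their licenses need to be checked at the source.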

Citation

The paper can be found here (arXiv:2310.19567).

Please cite us:

@misc{lent2023creoleval,
      title={CreoleVal: Multilingual Multitask Benchmarks for Creoles}, 
      author={Heather Lent and Kushal Tatariya and Raj Dabre and Yiyi Chen and Marcell Fekete and Esther Ploeger and Li Zhou and Hans Erik Heje and Diptesh Kanojia and Paul Belony and Marcel Bollmann and Loïc Grobol and Miryam de Lhoneux and Daniel Hershcovich and Michel DeGraff and Anders Søgaard and Johannes Bjerva},
      year={2023},
      eprint={2310.19567},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
