- Large-scale Chinese NLP corpora (several million to ten million samples), including wiki2019zh, news2016zh, and baike2019qa
- thunlp/THUOCL
- manually filtered
- [enwik8]
- [text8]
- [WikiText-103]
- [One Billion Word]
- [Penn TreeBank]
- Chinese Word Embedding
- fastText. Trained on Common Crawl and Wikipedia. 300 dimensions. The Stanford word segmenter was used for Chinese.
- Useful; about 2 million word vectors.
- Tencent AI Lab Embedding Corpus for Chinese Words and Phrases
- Embedding/Chinese-Word-Vectors
- 100+ Chinese Word Vectors
- fastText. Trained on Common Crawl and Wikipedia. 300 dimensions. The Stanford word segmenter was used for Chinese. (A loading sketch appears after the dataset list below.)
- English Word Embedding
- [NQ] Natural Questions: a Benchmark for Question Answering Research. Tom Kwiatkowski and Michael Collins, Google AI Language. January 23, 2019. paper; blog
- [SQuAD 2.0: Stanford Question Answering Dataset] Know What You Don’t Know: Unanswerable Questions for SQuAD. Pranav Rajpurkar, Robin Jia, Percy Liang. 2018 ACL.
- [SQuAD 1.0: Stanford Question Answering Dataset] SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. 2016 EMNLP.
- [HotpotQA]
- [NarrativeQA]
- [TriviaQA]
- [QuAC]
- [CoQA]
- [WikiQA]
- [MS Marco]
- [NewsQA]
- [CNN/DailyMail news] Teaching Machines to Read and Comprehend. Hermann et al. NIPS 2015.
- [CBTest: Children’s Book Test] The goldilocks principle: Reading children’s books with explicit memory representations. Hill et al. 2015.
- [DuReader] DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. Haifeng Wang et al. 2018. Leaderboard
- [CMRC 2018] Chinese Machine Reading Comprehension. HIT & iFLYTEK 2018. link
- [bAbI] Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. Jason Weston 2015.
- [Who-Did-What] Who did What: A Large-Scale Person-Centered Cloze Dataset. 2016 EMNLP. Takeshi Onishi et al.
- [RACE] A reading comprehension dataset collected from English exams for middle and high school students in China. Lai et al. 2017.
- [SNLI: Stanford Natural Language Inference] A large annotated corpus for learning natural language inference. Bowman et al. 2015. link
- [SciTail] A textual entailment dataset from science question answering. Khot et al. AAAI. 2018. link
- [QQP: Quora Question Pairs] Quora question pairs. Z. Chen, H. Zhang, X. Zhang, and L. Zhao. 2018.
- [MRPC: Microsoft Research Paraphrase Corpus] Automatically constructing a corpus of sentential paraphrases. William B Dolan and Chris Brockett. 2005.
- [MNLI: Multi-Genre Natural Language Inference] The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations. N. Nangia, A. Williams, A. Lazaridou, and S. R. Bowman. 2017.
- [RTE: Recognizing Textual Entailment] GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang et al. 2018.
- [WNLI: Winograd NLI] derived from The Winograd Schema Challenge. Hector Levesque et al. 2012.
- [LCQMC] LCQMC: A Large-scale Chinese Question Matching Corpus. Xin Liu et al. COLING 2018.
- [STS-B: Semantic Textual Similarity Benchmark] Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. Daniel Cer et al. 2017.
- [QNLI] derived from the Stanford Question Answering Dataset (SQuAD 1.0). 2016.
- [CoLA] Neural Network Acceptability Judgments. Alex Warstadt, Amanpreet Singh, and Samuel R Bowman. 2018
- [SST-2: Stanford Sentiment Treebank] Recursive deep models for semantic compositionality over a sentiment treebank. Richard Socher et al. 2013 EMNLP.
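Most of the pretrained embedding releases above (fastText, the Tencent corpus, Chinese-Word-Vectors) are distributed as plain-text word2vec-style `.vec` files, so they can be loaded with gensim. A minimal loading sketch, assuming a locally downloaded file (the file name below is only an example placeholder):

```python
# Minimal sketch: load a word2vec/fastText-format .vec file with gensim and
# query nearest neighbours. The file name is an example placeholder.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("cc.zh.300.vec", binary=False)

print(vectors["中国"][:5])                   # first 5 dimensions of one word vector
print(vectors.most_similar("中国", topn=5))  # nearest neighbours by cosine similarity
```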
Single Sentence. [a sentence can be an arbitrary span of contiguous text or word sequence, rather than a linguistically plausible sentence.]
- single-sentence classification
- CoLA (predict whether an English sentence is grammatically plausible.)
- SST-2 (determine whether the sentiment of a sentence extracted from movie reviews is positive or negative)
- sequence labeling
- NER: named entity recognition
- POS: part-of-speech tagging
Sentence Pairs.
- pairwise text classification
- RTE (predict whether the hypothesis is entailed by the premise or not; an input-encoding sketch follows this list)
- MNLI (predict whether the hypothesis is an entailment, contradiction, or neutral with respect to the premise.)
- WNLI (select the referent of a pronoun from a list of choices in a given sentence which contains the pronoun.)
- QQP (predict whether two questions are semantically equivalent)
- MRPC (predict whether the two sentences in a pair are semantically equivalent)
- SNLI (widely used entailment dataset for NLI)
- SciTail (assessing whether a given premise entails a given hypothesis)
- text similarity scoring
- STS-B (given a pair of sentences, the model predicts a real-valued score indicating their semantic similarity)
- relevance ranking
- QNLI (The task involves assessing whether a sentence contains the correct answer to a given query)
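For the sentence-pair tasks above (RTE, MNLI, QQP, MRPC, STS-B, QNLI), both sentences are typically packed into a single input for a BERT-style encoder. A minimal sketch using the Hugging Face `transformers` tokenizer; the checkpoint name and example sentences are placeholders, not part of any dataset:

```python
# Minimal sketch: encode a premise/hypothesis pair the way BERT-style models
# expect -- [CLS] premise [SEP] hypothesis [SEP] with segment (token type) ids.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

encoded = tokenizer(premise, hypothesis, truncation=True)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# -> ['[CLS]', premise tokens, '[SEP]', hypothesis tokens, '[SEP]']
print(encoded["token_type_ids"])  # 0 for the first sentence, 1 for the second
```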
QA.
- extractive QA
- SQuAD 1.0 (extract the answer span from a context passage given a question)
- SQuAD 2.0 (predict whether a question is answerable and, if so, extract the answer from the context; see the parsing sketch after this list)
- CNN/DailyMail News (cloze-style reading comprehension over news articles)
- generative QA
- Conversational QA
- DREAM (a multiple-choice Dialogue-based REAding comprehension exaMination dataset). example
- Single-turn QA
- DuReader (summarize an answer from multiple documents for a given question; questions are typed as Entity/Description/YesNo and Fact/Opinion)
- Machine Reading Comprehension
- Conversational QA
- extractive QA
- jackalhan/qa_datasets_converter
- A dataset converter for NLP tasks such as question answering: converts QA datasets from one format to another.
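As a concrete example of the extractive-QA format above, SQuAD 1.1/2.0 are released as a single JSON file (`data` → `paragraphs` → `qas` → `answers`; v2.0 additionally flags unanswerable questions with `is_impossible`). A minimal parsing sketch, assuming a locally downloaded dev file (the file name is a placeholder):

```python
# Minimal sketch: walk a SQuAD-format JSON file and split questions into
# answerable vs. unanswerable; the file name is a placeholder.
import json

with open("dev-v2.0.json", encoding="utf-8") as f:
    squad = json.load(f)

answerable, unanswerable = 0, 0
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            # SQuAD 2.0 marks unanswerable questions; 1.1 has no such field.
            if qa.get("is_impossible", False):
                unanswerable += 1
            else:
                answerable += 1
                start = qa["answers"][0]["answer_start"]
                answer_span = context[start:start + len(qa["answers"][0]["text"])]

print(f"answerable: {answerable}, unanswerable: {unanswerable}")
```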
Corpus | Task | #Train | #Dev | #Test | #Label | Metrics | Category | Source |
---|---|---|---|---|---|---|---|---|
CoLA | Acceptability | 8.5k | 1k | 1k | 2 | Matthews corr | Single-Sentence Classification (GLUE) | |
SST-2 | Sentiment | 67k | 872 | 1.8k | 2 | Accuracy | Single-Sentence Classification (GLUE) | movie reviews |
STS-B | Similarity | 7k | 1.5k | 1.4k | 1 | Pearson/Spearman corr | Text Similarity (GLUE) | multiple data sources |
QNLI | QA/NLI | 108k | 5.7k | 5.7k | 2 | Accuracy | Relevance Ranking (GLUE) | SQuAD 1.0 |
QQP | Paraphrase | 364k | 40k | 391k | 2 | Accuracy/F1 | Pairwise Text Classification (GLUE) | Quora |
MRPC | Paraphrase | 3.7k | 408 | 1.7k | 2 | Accuracy/F1 | Pairwise Text Classification (GLUE) | online news |
MNLI | NLI | 393k | 20k | 20k | 3 | Accuracy | Pairwise Text Classification (GLUE) | |
RTE | NLI | 2.5k | 276 | 3k | 2 | Accuracy | Pairwise Text Classification (GLUE) | |
WNLI | NLI | 634 | 71 | 146 | 2 | Accuracy | Pairwise Text Classification (GLUE) | |
SNLI | NLI | 549k | 9.8k | 9.8k | 3 | Accuracy | Pairwise Text Classification | captions from the Flickr30k corpus |
SciTail | NLI | 23.5k | 1.3k | 2.1k | 2 | Accuracy | Pairwise Text Classification | science questions & relevant web sentences |
SQuAD 1.0 | QA | 87.5k | 10.5k | 9.5k | span | EM/F1 | Extractive QA | 546 wiki pages |
SQuAD 2.0 | QA | 130.3k | 11.8k | 8.8k | span | EM/F1 | Extractive QA | 348 wiki pages |
NQ | QA | 307.3k | 7.8k | 7.8k | | | Extractive QA | Google Search Engine |
CMRC 2018 | Span-Extraction Reading Comprehension | 11.1k | 3.2k | 2.5k | span | EM/F1 | Extractive QA | Chinese wiki pages |
DuReader | Open-Domain Question Answering | 271.5k | 10k | 20k | | ROUGE-L & BLEU-4 | Generative QA | Baidu Search & Baidu Zhidao |
CNN/DailyMail | Cloze-style QA | | | | | | Extractive QA | news |
- Multi-Task Deep Neural Networks for Natural Language Understanding. Xiaodong Liu et al. 2019.
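A minimal sketch of the evaluation metrics named in the table above (Matthews correlation for CoLA, accuracy/F1 for the classification tasks, Pearson/Spearman correlation for STS-B), computed with scikit-learn and SciPy on made-up toy predictions:

```python
# Minimal sketch: compute the GLUE-style metrics from the table on toy data.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from scipy.stats import pearsonr, spearmanr

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("Matthews corr:", matthews_corrcoef(y_true, y_pred))  # CoLA
print("Accuracy:", accuracy_score(y_true, y_pred))          # SST-2, MNLI, RTE, ...
print("F1:", f1_score(y_true, y_pred))                      # QQP, MRPC

gold_scores = [4.5, 2.0, 3.2, 0.5]
pred_scores = [4.0, 2.5, 3.0, 1.0]
print("Pearson:", pearsonr(gold_scores, pred_scores)[0])    # STS-B
print("Spearman:", spearmanr(gold_scores, pred_scores)[0])  # STS-B
```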