A collection of datasets and corresponding benchmarks for Rasa NLU
Rasa NLU is a powerful and open-source natural language processing tool for intent classification and entity extraction in chatbots.
However, we could not find any publicly available datasets with corresponding benchmarks for it, which makes it difficult to evaluate the performance of an NLU system built with Rasa.
Therefore, this project collects and organizes datasets and baselines for task-oriented dialogue. Every dataset is provided in the data format required by Rasa NLU, so you can use it directly in your own Rasa NLU system.
All the datasets are organized and archived in the `data` directory.
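As an illustration of what "the data format required by Rasa NLU" can look like, here is a minimal sketch that builds one training example in the Rasa NLU JSON training data layout (`rasa_nlu_data` / `common_examples`). It assumes the JSON format; the files in this repository may use another supported format. The helper `make_example`, the sample sentence, and its labels are hypothetical and are not copied from the repository.

```python
import json


def make_example(text, intent, entities=()):
    """Build one training example; entities are (value, entity_type) pairs."""
    spans = []
    for value, entity_type in entities:
        # Use the first occurrence of the value; raises ValueError if absent.
        start = text.index(value)
        spans.append({
            "start": start,
            "end": start + len(value),  # end offset is exclusive
            "value": value,
            "entity": entity_type,
        })
    return {"text": text, "intent": intent, "entities": spans}


# Illustrative ATIS-style utterance (hypothetical, not taken from the data).
examples = [
    make_example(
        "show me flights from boston to denver",
        "flight",
        [("boston", "fromloc.city_name"), ("denver", "toloc.city_name")],
    ),
]

# Wrap the examples in the top-level structure of the JSON format.
training_data = {"rasa_nlu_data": {"common_examples": examples}}
print(json.dumps(training_data, indent=2, ensure_ascii=False))
```

Each entity records character offsets into the text, so `text[start:end]` must equal `value`; that invariant is worth checking whenever a dataset is converted.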
The following information is included for each dataset:
- Name
- Language
- Task
- Size (train/test)
- Number of intents/entities
- Link (website or paper)
Name | Language | Task | Size (Train/Test) | Intents/Entities | Link |
---|---|---|---|---|---|
ATIS | en | Airline Travel Information | 4978/893 | 26/129 | more detail |
Snips | en | 7 intents, including: AddToPlaylist, BookRestaurant... | 13802/699 | 7/72 | more detail |
AskUbuntuCorpus | en | 5 intents, questions about Ubuntu | 127/35 | 5/3 | more detail |
Facebook Multilingual Task Oriented Dataset | en | 3 domains, including: alarm, weather, reminder | 30521/8621 | 12/25 | more detail |
SMP2019 | zh | 29 domains, including: app, email... | 2063/480 | 24/62 | more detail |
Check flow dataset | zh | 13 intents, including some request and inform intents | 809/210 | 13/6 | more detail |
MSRA_NER | zh | 1 intent, covering various kinds of news, and 3 kinds of entities | 20864/4636 | 1/3 | more detail |
ToutiaoNews | zh | 7 intents, covering 7 kinds of news | 325279/57409 | 7/0 | more detail |
Note:
- The SMP2019 and CheckFlow datasets have no official train/test split, so we split them ourselves at a ratio of 8:2.
- For the English datasets, we use the official `pretrained_embeddings_spacy` and `supervised_embeddings` pipelines as the baseline NLU pipelines.
- For the Chinese datasets, we use the officially recommended Chinese pipeline, `rasa_nlu_chi`, as the baseline NLU pipeline.
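The 8:2 split mentioned in the notes could be produced along the following lines. This is only a sketch: the notes do not say whether the split was random or stratified by intent, so a seeded random shuffle is assumed here, and `split_train_test` with its `seed` parameter is a hypothetical helper, not code from this repository.

```python
import random


def split_train_test(examples, train_ratio=0.8, seed=42):
    """Shuffle a copy of the examples and cut it at train_ratio."""
    shuffled = list(examples)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder examples stand in for a real dataset.
examples = [{"text": f"utterance {i}", "intent": "demo"} for i in range(10)]
train, test = split_train_test(examples)
print(len(train), len(test))  # 8 2
```

For small datasets with rare intents, a split stratified by intent may be preferable, so that every intent appears in both the training set and the test set.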
In the table below, the "Intent" columns report intent classification and the "Entity" columns report entity extraction; p, r, and f1 denote precision, recall, and F1 score.

Dataset | NLU Pipeline | Intent auc | Intent p | Intent r | Intent f1 | Entity auc | Entity p | Entity r | Entity f1 |
---|---|---|---|---|---|---|---|---|---|
ATIS(en) | pretrained_embeddings_spacy | 0.91 | 0.91 | 0.91 | 0.91 | 0.98 | 0.98 | 0.98 | 0.98 |
ATIS(en) | supervised_embeddings | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 | 0.98 | 0.98 |
Snips(en) | pretrained_embeddings_spacy | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |
Snips(en) | supervised_embeddings | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
AskUbuntuCorpus(en) | pretrained_embeddings_spacy | 0.89 | 0.89 | 0.89 | 0.89 | 0.95 | 0.95 | 0.95 | 0.95 |
AskUbuntuCorpus(en) | supervised_embeddings | 0.86 | 0.86 | 0.86 | 0.86 | 0.95 | 0.95 | 0.95 | 0.95 |
Facebook Multilingual Task Oriented Dataset(en) | pretrained_embeddings_spacy | 0.96 | 0.96 | 0.96 | 0.96 | 0.98 | 0.98 | 0.98 | 0.98 |
Facebook Multilingual Task Oriented Dataset(en) | supervised_embeddings | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.98 | 0.98 | 0.98 |
SMP2019(zh) | rasa_nlu_chi | 0.76 | 0.83 | 0.76 | 0.78 | 0.79 | 0.80 | 0.79 | 0.77 |
CheckFlow(zh) | rasa_nlu_chi | 0.95 | 0.95 | 0.95 | 0.94 | 1.00 | 1.00 | 1.00 | 1.00 |
MSRA_NER(zh) | rasa_nlu_chi | N/A | N/A | N/A | N/A | 0.98 | 0.98 | 0.98 | 0.98 |
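To make the p, r, and f1 columns concrete, the sketch below computes precision, recall, and F1 from lists of true and predicted intent labels. The benchmark does not state how per-label scores are averaged, so support-weighted averaging is an assumption here; `weighted_scores` and the toy labels are hypothetical and do not reproduce the numbers above.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    total = len(y_true)
    support = Counter(y_true)    # how often each label truly occurs
    predicted = Counter(y_pred)  # how often each label is predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total  # weight each label by its share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / total
    return accuracy, precision, recall, f1


# Toy labels for illustration only.
y_true = ["greet", "greet", "book", "book", "book", "bye"]
y_pred = ["greet", "book", "book", "book", "bye", "bye"]
acc, p, r, f1 = weighted_scores(y_true, y_pred)
print(f"acc={acc:.2f} p={p:.2f} r={r:.2f} f1={f1:.2f}")
```

Under support weighting, recall coincides with accuracy by construction, while precision and F1 can differ from it.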