Arabic Treebank (ATB), a collection of Arabic news stories built as part of the DARPA TIDES project:
- Part 1 v 4.1 (LDC2010T13)
- Part 2 v 3.1 (LDC2011T09)
- Part 3 v 3.2 (LDC2010T08)
- Broadcast News v 1.0 (LDC2012T07)
To study misinformation and disinformation in Arabic news, we develop AraNews, a large-scale, multi-topic, and multi-country Arabic news dataset. To create the dataset, we first manually compile a list of 50 newspapers from 15 Arab countries, the United States of America (USA), and the United Kingdom (UK). We then scrape the news articles from these newspapers, collecting a total of 5,187,957 articles.
You can download Khouja's dataset from GitHub.
Coming soon
@inproceedings{nagoudi-2020-fake,
title = {{Machine Generation and Detection of Arabic Manipulated and Fake News}},
author = {Nagoudi, El Moatez Billah and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Alhindi, Tariq and Cavusoglu, Hasan},
booktitle = {{P}roceedings of the {F}ourth {A}rabic {N}atural {L}anguage {P}rocessing {W}orkshop},
year = {2020},
address = {Barcelona, Spain}
}