Machine Generation and Detection of Arabic Manipulated and Fake News

UBC-NLP/wanlp2020_arabic_fake_news_detection

Fake news and deceptive machine-generated text are a serious problem threatening modern societies, including in the Arab world. This motivates work on detecting false and manipulated stories online. However, a bottleneck for this research is the lack of sufficient data to train detection models. In this work, we present a simple method for automatically generating Arabic manipulated and fake news stories. Our method depends only on the availability of legitimate stories, which are abundant online, and a part-of-speech (POS) tagger. To facilitate future work, we dispense with both of these requirements altogether by providing AraNews, a novel and large POS-tagged news dataset that can be used off-the-shelf. Using stories generated based on AraNews, we carry out a human annotation study that sheds light on the effects of machine manipulation on text veracity. The study also measures human ability to detect Arabic machine-manipulated text generated by our method. Finally, we develop the first Arabic news manipulation detection models and a new state-of-the-art (SOTA) model for detecting Arabic fake news.
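As a rough illustration of POS-guided manipulation, the sketch below swaps tokens that carry a chosen POS tag for other words with the same tag. It is not the authors' implementation: the function name `manipulate`, the tag names (`VERB`, `NOUN`, `NOUN_PROP`), the small substitution lexicon, and the random choice of replacement are all assumptions made for this example, and the input is assumed to be already POS-tagged.

```python
import random


def manipulate(tagged_tokens, lexicon, target_tags=("NOUN_PROP",), seed=0):
    """Return a copy of a POS-tagged sentence with target-tag tokens swapped.

    tagged_tokens: list of (word, pos_tag) pairs, assumed to come from a POS tagger.
    lexicon: dict mapping a POS tag to candidate replacement words (hypothetical).
    target_tags: tags whose tokens may be replaced (hypothetical tag names).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    output = []
    for word, tag in tagged_tokens:
        if tag in target_tags:
            # Only consider replacements that differ from the original word.
            candidates = [c for c in lexicon.get(tag, []) if c != word]
            if candidates:
                output.append((rng.choice(candidates), tag))
                continue
        # Tokens with other tags, or with no candidates, are kept unchanged.
        output.append((word, tag))
    return output


# Toy example: a three-token sentence with one proper noun.
sentence = [("زار", "VERB"), ("الوزير", "NOUN"), ("القاهرة", "NOUN_PROP")]
lexicon = {"NOUN_PROP": ["القاهرة", "بيروت", "تونس"]}
manipulated = manipulate(sentence, lexicon)
print(" ".join(word for word, _ in manipulated))
```

Because replacements keep the original POS tag, the manipulated sentence stays grammatical while its factual content changes. A realistic system would need a more careful choice of replacement words than the uniform random pick assumed here.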

Datasets

ATB: Arabic TreeBank

The Arabic Treebank (ATB) is a collection of Arabic news stories built as part of the DARPA TIDES project.

AraNews: A New Large-Scale Arabic News Dataset

To study misinformation and disinformation in Arabic news, we develop AraNews, a large-scale, multi-topic, and multi-country Arabic news dataset. To create the dataset, we first manually compile a list of 50 newspapers from 15 Arab countries, the United States of America (USA), and the United Kingdom (UK). We then scrape the news articles from these newspapers. In total, we collect 5,187,957 news articles.

Download AraNews:

ANS: Arabic News Stance Corpus

You can download Khouja's dataset from GitHub.

Models

Coming soon

Cite Us

@inproceedings{nagoudi-2020-fake,
    title ={{Machine Generation and Detection of Arabic Manipulated and Fake News}},
    author = {Nagoudi, El Moatez Billah and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Alhindi, Tariq and Cavusoglu, Hasan},
    booktitle = {{P}roceedings of the {F}ourth {A}rabic {N}atural {L}anguage {P}rocessing {W}orkshop},
    year = {2020},
    address = {Barcelona, Spain}
}
