Commit

update README.md
ZamanidisAlexios committed May 14, 2020
1 parent 8473091 commit dd17026
Showing 2 changed files with 24 additions and 3 deletions.
README.md: 17 changes (16 additions, 1 deletion)
@@ -1 +1,16 @@
-# articles
+# News articles classification and clustering
+
+In this repository we perform text classification experiments using Support Vector
+Machines (SVM), Random Forest, Naive Bayes and K-Nearest Neighbor classifiers. We also
+perform text clustering using a K-means clusterer.
+
+First, we create a data set from our documents. The input consists of 2225 documents,
+each labeled as Business, Entertainment, Politics, Sport or Tech.
+
+To sum up, the whole procedure consists of:
+
+1) Create a data set of all documents
+2) Text pre-processing
+3) Generate Word Clouds
+4) Vectorization
+5) Classification and Clustering
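Steps 2, 4 and 5 of the procedure above can be sketched with only the Python standard library. This is an illustrative sketch, not the repository's code: the stop-word list, the tokenizer regex, the smoothed TF-IDF weighting, `k=3`, every function name (`preprocess`, `fit_idf`, `tfidf`, `cosine`, `knn_predict`) and the training sentences are assumptions. K-Nearest Neighbor with cosine similarity stands in for the four classifiers only because it is the shortest to write by hand; word clouds (step 3) and K-means clustering are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not the repository's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "for", "with"}


def preprocess(text):
    """Step 2: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def fit_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the training set."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Step 4: sparse TF-IDF vector (dict term -> weight), L2-normalised.
    Terms that never appeared in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Step 5: majority vote among the k most similar training documents."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy corpus; the real data set has 2225 documents in five classes.
train_texts = [
    "The team won the match after a late goal",
    "Shares rose as the company reported strong profits",
    "The striker scored twice in the football final",
    "The bank raised interest rates to curb inflation",
]
train_labels = ["sport", "business", "sport", "business"]

train_tokens = [preprocess(t) for t in train_texts]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]

query = tfidf(preprocess("A late goal won the final for the team"), idf)
print(knn_predict(query, train_vecs, train_labels, k=3))  # prints: sport
```

Ties in the vote go to the label of the most similar neighbour, because `Counter.most_common` keeps first-encountered order among equal counts and the neighbours are counted in similarity order.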
news_articles_documents/README.md: 10 changes (8 additions, 2 deletions)
@@ -1,5 +1,11 @@
-Consists of 2225 documents from a news website corresponding to stories in five topical areas from 2004-2005.
+Consists of 2225 documents from a news website corresponding to stories in five topical areas
+from 2004-2005.
 
-Natural Classes: 5 (business, entertainment, politics, sport, tech)
+Natural Classes: 5
+* Business
+* Entertainment
+* Politics
+* Sport
+* Tech
 
 The first line of each document is the title and the rest is the content of the article.
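The document format described above (title on the first line, article body after it) can be turned into the labeled data set of step 1 with a short loader. The README does not say how the files are arranged on disk, so this sketch assumes a hypothetical layout of one sub-directory per class (`business/`, `sport/`, ...) holding one `.txt` file per article; the names `parse_article` and `load_dataset`, the `.txt` extension and the UTF-8 encoding are likewise assumptions.

```python
import tempfile
from pathlib import Path


def parse_article(raw):
    """Split one document: the first line is the title, the rest is the article body."""
    title, _, body = raw.strip().partition("\n")
    return title.strip(), body.strip()


def load_dataset(root):
    """Build (title, content, label) rows, ASSUMING a root/<label>/<file>.txt layout."""
    rows = []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(label_dir.glob("*.txt")):
            # UTF-8 is an assumption; errors="replace" keeps stray bytes from aborting the load.
            raw = path.read_text(encoding="utf-8", errors="replace")
            title, content = parse_article(raw)
            rows.append((title, content, label_dir.name))
    return rows


# Tiny self-contained demo with two made-up articles in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for label, text in [("business", "Profits rise\nThe firm reported gains."),
                        ("sport", "Cup final\nThe match ended 2-1.\nFans cheered.")]:
        (Path(tmp) / label).mkdir()
        (Path(tmp) / label / "001.txt").write_text(text, encoding="utf-8")
    rows = load_dataset(tmp)

print(rows)
```

Using the directory name as the label keeps the loader independent of the file contents, which only carry the title and body.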
