K-means-Clustering-on-Text-Documents

K-means clustering of text documents using scikit-learn, a machine learning library for the Python programming language.

Note:

Each row in the Excel sheet corresponds to one document.
The code expects the data in Excel format. If you have a CSV file, use pd.read_csv('file name') instead of pd.read_excel(''). If you don't have any data, use the dummy corpus given in the code.
Clustering is an unsupervised learning technique: the code needs no labels and groups the documents purely on the basis of how similar they are to one another.
Note that in 'data.xlsx' the label/Topic corresponding to the Idea column is NA. By applying K-means you can first group similar documents, then label the groups later by applying topic modelling to them.
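
The notes above can be illustrated with a minimal sketch. It is not the code in doc_clustering.py: the dummy corpus, the choice of TF-IDF features, and the value k=2 are all assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical dummy corpus: one string per document.
# With real data, the list could instead be built from a column of the
# DataFrame returned by pd.read_excel(...) or pd.read_csv(...).
corpus = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "stock prices fell on the market today",
    "the market rallied as stock prices rose",
]

# Turn each document into a TF-IDF vector, dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(corpus)

# Group the documents into k clusters; k=2 is an assumption for this toy corpus.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Print the cluster id assigned to each document.
for doc, label in zip(corpus, labels):
    print(label, doc)
```

The cluster ids are arbitrary integers, so only which documents share an id is meaningful. For a real corpus the number of clusters usually has to be chosen by experiment.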

I will add the topic modelling step later.
