This project aims to explore various techniques for keyword extraction using Natural Language Processing (NLP) and text mining methods. Keyword extraction is a crucial task in NLP, helping to identify the most relevant words or phrases in a document, which can enhance information retrieval and text summarization.
This Jupyter Notebook (`scake_notebook.ipynb`) contains all the code from the sCAKE pipeline for keyword extraction. The pipeline consists of a series of Python scripts that process documents stored in a specified folder. You can also run the complete pipeline using the Jupyter Notebook, which integrates all the above steps for convenience.
- Ensure you have all required dependencies installed.
- Place your documents in the `data` folder.
- Open the `scake_notebook.ipynb` file in Jupyter Notebook.
- Execute each cell sequentially to run the pipeline.
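
Before opening the notebook, it can help to confirm that the input folder actually contains documents. The snippet below is a minimal sketch of such a check and is not part of the sCAKE pipeline itself: the helper name `list_documents` is hypothetical, and it assumes the input folder is a `data` directory next to the notebook and that every non-hidden file in it is an input document.

```python
from pathlib import Path


def list_documents(folder="data"):
    """Return the regular, non-hidden files in `folder`, sorted by name.

    Hypothetical helper for illustration only; the notebook may read its
    inputs differently.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Input folder not found: {root}")
    # Skip subdirectories and hidden files such as .gitkeep or .DS_Store.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


if __name__ == "__main__":
    try:
        docs = list_documents("data")
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"Found {len(docs)} document(s) in data/")
        for doc in docs:
            print(" -", doc.name)
```

If the script reports zero documents or a missing folder, add your files to `data` before executing the notebook cells.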
For insights into the current methodologies and findings in keyword extraction, you can refer to the following research article: sCAKE: Semantic Connectivity Aware Keyword Extraction.
The original work is the intellectual property of the authors, Swagata Duari and Vasudha Bhatnagar. We are exploring this work solely for educational purposes and to enhance our understanding of keyword extraction techniques.