This project extracts and analyzes named entities from text, including Word (.docx) documents, using the NLP libraries NLTK, spaCy, and TextBlob. It suggests titles based on the extracted entities, performs aspect-based sentiment analysis, and visualizes knowledge graphs and sentiment distributions.
Blog - https://medium.com/@sanskrutikhedkar09/mastering-information-extraction-from-unstructured-text-a-deep-dive-into-named-entity-recognition-4aa2f664a453
Entity Extraction and Title Suggestion: Extracts named entities (such as persons, organizations, and places) from the text and suggests a title based on them.
Aspect-Based Sentiment Analysis (ABSA): Runs sentiment analysis on the extracted entities to capture the sentiment associated with each aspect mentioned in the text.
Knowledge Graph Visualization: Builds and visualizes a knowledge graph from the relationships identified between entities in the text.
Sentiment Distribution Visualization: Plots the distribution of aspects and their associated sentiments.
Support for Word Documents: Accepts either raw text or an uploaded Word document (.docx) for analysis.
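To make the sentiment distribution feature concrete, here is a minimal sketch of how per-aspect polarity scores could be bucketed and counted before plotting. It is not the notebook's code: the function names, the 0.05 threshold, and the sample scores are assumptions made for illustration. The only fact relied on is that TextBlob reports polarity as a float between -1.0 and 1.0.

```python
from collections import Counter


def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    The threshold is an assumed cut-off, not a value taken from the project.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_distribution(aspect_scores):
    """Count how many aspects fall under each sentiment label."""
    return Counter(label_polarity(score) for _, score in aspect_scores)


# Hypothetical (aspect, polarity) pairs; in practice the polarity could come
# from something like TextBlob(sentence).sentiment.polarity.
aspect_scores = [("Google", 0.4), ("Mountain View", 0.0), ("latency", -0.3)]
distribution = sentiment_distribution(aspect_scores)
print(dict(distribution))
```

The resulting counts can be handed to a bar chart (for example with matplotlib) to produce the distribution plot.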
Before running the project, ensure you have Python installed on your system. Then, install the required libraries using pip:
pip install nltk spacy docx2txt pandas matplotlib networkx textblob owlready2
You'll also need to download some NLTK and spaCy resources. Run the following in your Python environment:
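import nltk
import spacy

# NLTK resources: tokenizer, POS tagger, named-entity chunker, and word list
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

# spaCy's small English pipeline
spacy.cli.download("en_core_web_sm")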
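If the notebook fails on an import, a quick check like the one below can show which of the libraries from the pip command are not installed. This is an optional sketch rather than part of the project; the helper name `missing_packages` is made up here, and it assumes each package's import name matches its pip name, which holds for the libraries listed above.

```python
from importlib.util import find_spec

# Import names of the libraries installed by the pip command above
REQUIRED = [
    "nltk", "spacy", "docx2txt", "pandas",
    "matplotlib", "networkx", "textblob", "owlready2",
]


def missing_packages(names):
    """Return the names that cannot be located on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```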
To use the project, run the ML_CP_final.ipynb notebook in a Jupyter environment like JupyterLab or Google Colab. Follow the prompts to input your text or upload a Word file for analysis.
Here's a quick example of how to suggest a title based on extracted entities from a given text:
text = "Google, based in Mountain View, announced a new breakthrough in artificial intelligence."
title = suggest_title(text)
print(f"Suggested Title: {title}")
Output:
Suggested Title: Analysis of Google, Mountain View
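For readers curious how such a title could be assembled, the sketch below builds the same string from a list of already-extracted entities. It is an illustration only: `suggest_title_from_entities`, its parameters, and the fallback title are hypothetical, and the notebook's own `suggest_title` may work differently. The labels `ORG`, `GPE`, and `PERSON` are entity types used by spaCy's English pipelines; the entity list is hand-written here so the example runs without any model.

```python
def suggest_title_from_entities(entities, max_entities=2,
                                wanted_labels=("ORG", "GPE", "PERSON")):
    """Build a title from (text, label) entity pairs.

    Keeps the first few distinct entities whose label is in wanted_labels.
    """
    names = []
    for text, label in entities:
        if label in wanted_labels and text not in names:
            names.append(text)
    if not names:
        return "Untitled Document"  # assumed fallback when nothing matches
    return "Analysis of " + ", ".join(names[:max_entities])


# Hand-written entities standing in for the output of an NER model
entities = [("Google", "ORG"), ("Mountain View", "GPE")]
title = suggest_title_from_entities(entities)
print(f"Suggested Title: {title}")
# prints: Suggested Title: Analysis of Google, Mountain View
```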
Contributions to the project are welcome! Please fork the repository and submit a pull request with your changes or improvements.
License: This project is licensed under the MIT License - see the LICENSE.md file for details.