## Download data

Download the files from these links and copy them to the `data` directory.

**Energy Hub**

- Energy Hub Training set
- Energy Hub Validation set
- Energy Hub Test set

**Reuters**

- Reuters Training set
- Reuters Validation set
- Reuters Test set
## Downloading Necessary Packages

- Download the NLTK stopwords:

  ```python
  import nltk
  nltk.download('stopwords')
  ```

- Download Mallet from here. Unzip and copy it to the working directory.

  If you use Google Colab:

  ```shell
  !wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
  !unzip mallet-2.0.8.zip
  ```

- Download GloVe embeddings from here. Unzip and copy them to the working directory.

  If you use Google Colab:

  ```shell
  !wget https://nlp.stanford.edu/data/wordvecs/glove.6B.zip
  !unzip glove*.zip
  ```
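Once unzipped, the GloVe embeddings are plain text files with one word followed by its vector components per line. The loader below is a minimal sketch of how such a file can be read into a dictionary; the `load_glove` helper and the two-line sample file are illustrative, not part of this repository:

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dict.

    Each line is a word followed by whitespace-separated floats.
    """
    embeddings = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            embeddings[parts[0]] = np.asarray(parts[1:], dtype='float32')
    return embeddings

# Demonstrate on a tiny sample in the same format as glove.6B.*.txt
with open('sample_glove.txt', 'w', encoding='utf-8') as f:
    f.write('the 0.1 0.2 0.3\n')
    f.write('energy 0.4 0.5 0.6\n')

vectors = load_glove('sample_glove.txt')
print(sorted(vectors))          # ['energy', 'the']
print(vectors['energy'].shape)  # (3,)
```

The real `glove.6B.*.txt` files follow exactly this format, just with 400k rows and 50- to 300-dimensional vectors.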
## Build Topic-Entity Triples

This step involves:

- Training a topic modeler over the corpus
- Extracting named entities using spaCy
- Building triples using a dependency parser and POS tagger
- Applying a topic-entity filter over these triples

Run the following Python file:

```shell
python data_preprocess.py <dataset>
```

Change `<dataset>` to "energy hub" or "reuters" to select the corpus.
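The final filtering step can be illustrated with a toy sketch. Note that the function name, the exact filter criterion, and the sample data below are all hypothetical; the repository's `data_preprocess.py` defines the actual logic:

```python
def topic_entity_filter(triples, entities, topic_words):
    """Keep (subject, relation, object) triples whose subject or object
    is a named entity and overlaps the topic vocabulary.

    Hypothetical criterion for illustration only.
    """
    kept = []
    for subj, rel, obj in triples:
        if (subj in entities or obj in entities) and \
           (subj in topic_words or obj in topic_words):
            kept.append((subj, rel, obj))
    return kept

triples = [('solar', 'powers', 'grid'), ('dog', 'chases', 'cat')]
entities = {'solar', 'grid'}
topic_words = {'solar', 'energy', 'grid'}
print(topic_entity_filter(triples, entities, topic_words))
# [('solar', 'powers', 'grid')]
```

The idea is that only triples grounded in both a recognized entity and the learned topic vocabulary survive, which prunes noise from the dependency-parse extraction.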
## Training Models

Run the following Python file:

```shell
python train.py <dataset> <model>
```

Change `<dataset>` to "energy hub" or "reuters" to select the corpus. Change `<model>` to one of the following options:

- `text` - GloVe-based text model
- `topics` - topic distributions
- `entities` - GloVe-enriched named entities
- `triples` - GloVe-enriched triples
- `text_topics` - text (GloVe) and topic distributions
- `text_triples` - text (GloVe) and triples (GloVe)
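One way to picture the `<model>` argument is as a lookup from option name to feature sets. The mapping below is a hypothetical sketch (the `FEATURES` table and `select_features` helper are not from the repository; the real dispatch lives in `train.py`):

```python
# Hypothetical mapping from the <model> CLI option to feature sets.
FEATURES = {
    'text': ['glove_text'],
    'topics': ['topic_dist'],
    'entities': ['glove_entities'],
    'triples': ['glove_triples'],
    'text_topics': ['glove_text', 'topic_dist'],
    'text_triples': ['glove_text', 'glove_triples'],
}

def select_features(model_name):
    """Return the feature list for a model option, or fail loudly."""
    if model_name not in FEATURES:
        raise ValueError(f'unknown model: {model_name}')
    return FEATURES[model_name]

print(select_features('text_triples'))  # ['glove_text', 'glove_triples']
```

The combined options (`text_topics`, `text_triples`) simply union the feature sets of their components.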
Repository: dineshnagumothu/quokka