Code for NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization (EMNLP 2024 Findings)
- Install the following libraries:
  - numpy 1.26.4
  - torch_kmeans 0.2.0
  - pytorch 2.2.0
  - sentence_transformers 2.2.2
  - scipy 1.10
  - bertopic 0.16.0
  - gensim 4.2.0
- Install Java.
- Download this Java jar to ./evaluations/pametto.jar; it is used for evaluation.
- Download and extract this processed Wikipedia corpus to ./datasets/wikipedia/ as an external reference corpus.
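
The setup steps above can be checked before training with a small preflight script. This is a minimal sketch, not part of the repository: the names `REQUIRED`, `REQUIRED_PATHS`, `version_matches`, and `check_setup` are hypothetical, and it assumes that "pytorch" is installed under the PyPI distribution name `torch`. It only reports mismatches and does not install anything.

```python
"""Preflight check for the setup steps above (illustrative sketch)."""
import shutil
from importlib import metadata
from pathlib import Path

# Versions copied from the library list above.
# Assumption: "pytorch" is installed as the PyPI distribution "torch".
REQUIRED = {
    "numpy": "1.26.4",
    "torch_kmeans": "0.2.0",
    "torch": "2.2.0",
    "sentence_transformers": "2.2.2",
    "scipy": "1.10",
    "bertopic": "0.16.0",
    "gensim": "4.2.0",
}

# Files and folders the download steps above are expected to create.
REQUIRED_PATHS = ["./evaluations/pametto.jar", "./datasets/wikipedia"]


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_matches(found, wanted):
    """True if `found` equals `wanted` or extends it (1.10.1 matches 1.10)."""
    if found is None:
        return False
    found = found.split("+")[0]  # drop local build tags such as "+cu121"
    return found == wanted or found.startswith(wanted + ".")


def check_setup(root=".", get_version=installed_version,
                find_executable=shutil.which):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    for name, wanted in REQUIRED.items():
        found = get_version(name)
        if not version_matches(found, wanted):
            problems.append(f"{name}: need {wanted}, found {found}")
    if find_executable("java") is None:
        problems.append("java: not found on PATH")
    for rel in REQUIRED_PATHS:
        if not (Path(root) / rel).exists():
            problems.append(f"{rel}: missing")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print(problem)
```

Run it from the repository root; no output means every listed version, the Java executable, and both downloaded resources were found.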
To run and evaluate our model on the YahooAnswers dataset, use this example command:
python main.py --use_pretrainWE
Parts of this implementation are based on TopMost. We also use Palmetto to evaluate topic coherence.
If you want to reuse our code, please cite us as:
@misc{pham2024neuromax,
title={NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization},
author={Duy-Tung Pham and Thien Trang Nguyen Vu and Tung Nguyen and Linh Ngo Van and Duc Anh Nguyen and Thien Huu Nguyen},
year={2024},
eprint={2409.19749},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.19749},
}