[Findings of EMNLP 2024] NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization

Fsoft-AIC/NeuroMax

Code for NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization (EMNLP 2024 Findings)

Paper link

Preparing libraries

  1. Install the following libraries
    numpy 1.26.4
    torch_kmeans 0.2.0
    pytorch 2.2.0
    sentence_transformers 2.2.2
    scipy 1.10
    bertopic 0.16.0
    gensim 4.2.0
    
  2. Install Java.
  3. Download this Java JAR to ./evaluations/pametto.jar; it is used for evaluation.
  4. Download and extract this processed Wikipedia corpus to ./datasets/wikipedia/; it serves as an external reference corpus.
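
Before running anything, it can help to confirm that the environment matches the steps above. The following is a minimal sketch, not part of this repository: the script, the helper names `version_matches` and `check_environment`, and the PyPI distribution names (for example `torch` for pytorch, `torch-kmeans`, `sentence-transformers`) are all assumptions made for illustration. It compares installed versions against the pinned ones, looks for `java` on the PATH, and checks that the two paths from steps 3 and 4 exist.

```python
from importlib import metadata
from pathlib import Path
import shutil

# Pinned versions from the library list. The distribution names are
# assumptions (e.g. "torch" for pytorch); adjust them if yours differ.
PINNED = {
    "numpy": "1.26.4",
    "torch-kmeans": "0.2.0",
    "torch": "2.2.0",
    "sentence-transformers": "2.2.2",
    "scipy": "1.10",
    "bertopic": "0.16.0",
    "gensim": "4.2.0",
}

# Paths named in steps 3 and 4, relative to the repository root.
REQUIRED_PATHS = [Path("evaluations/pametto.jar"), Path("datasets/wikipedia")]


def version_matches(installed: str, pinned: str) -> bool:
    """Return True if `installed` equals `pinned` or refines it (1.10.1 vs 1.10)."""
    installed = installed.split("+")[0]  # drop local build tags such as "+cu121"
    return installed == pinned or installed.startswith(pinned + ".")


def check_environment(root: Path = Path(".")) -> list[str]:
    """Collect human-readable descriptions of everything that looks off."""
    problems = []
    for name, pinned in PINNED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {pinned})")
            continue
        if not version_matches(installed, pinned):
            problems.append(f"{name}: found {installed}, expected {pinned}")
    if shutil.which("java") is None:
        problems.append("java: not found on PATH")
    for path in REQUIRED_PATHS:
        if not (root / path).exists():
            problems.append(f"{path}: missing")
    return problems


if __name__ == "__main__":
    # Print each problem, or a single confirmation line if none were found.
    for line in check_environment() or ["environment matches the setup steps"]:
        print(line)
```

Run it from the repository root so the relative paths resolve. An empty result only means the pins and paths are in place; it does not validate the contents of the JAR or the corpus.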

Usage

To train and evaluate our model on the YahooAnswers dataset, run:

python main.py --use_pretrainWE

Acknowledgement

Parts of this implementation are based on TopMost. We also use Palmetto to evaluate topic coherence.

Citation

If you want to reuse our code, please cite us as:

@misc{pham2024neuromax,
      title={NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization}, 
      author={Duy-Tung Pham and Thien Trang Nguyen Vu and Tung Nguyen and Linh Ngo Van and Duc Anh Nguyen and Thien Huu Nguyen},
      year={2024},
      eprint={2409.19749},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.19749}, 
}
