Skip to content

hkim-etri/ExplainMeetSum

Repository files navigation

ExplainMeetSum

This is code for the ACL 2023 paper ExplainMeetSum: A Dataset for Explainable Meeting Summarization Aligned with Human Intent.

Table of Contents

Prerequsite

git clone https://github.com/angdong/ExplainMeetSum.git
conda create -n explainmeetsum
conda activate explainmeetsum

Dataset (ExplainMeetSum)

ExplainMeetSum dataset is augmented version of QMSUM, by newly annotating evidence sentences that faithfully "explain" a summary.

Note: covid_1.json in QMSum is excluded.
$\to$ There are some problems found to be used as dataset.

Folder Structure

.
├── QMSum@
├── acl2018_abssumm@
├── SummaryEvidence
│   ├── test
│   ├── train
│   └── val
├── ExplainMeetSum
│   ├── test
│   ├── train
│   └── val
├── convert.py
├── sentence_split.py
└── utils.py

Build dataset

You can build dataset by executing python file.
Created dataset will be same as ExplainMeetSum/data/ExplainMeetSum.

git submodule update --remote
cd ExplainMeetSum
pip install nltk==3.6.2
python data/convert.py

# or you can specify your own path
python data/convert.py \
    --qmsum QMSUM_ROOT_DIRECTORY \
    --dialogue_act ACL2018_ABSSUMM_ROOT_DIRECTORY \
    --save_dir EXPLAIMEETSUM_DIRECTORY

Dataset Format

You can find format of each dataset in here.

TLDR;

ExplainMeetSum

ExplainMeetSum data is extended-version of QMSum dataset. To annotate evidence sentence by sentence, we had to split sentences as correctly as we can. Below are our methods how to split sentences.

  1. meeting transcripts
    • Committee: use nltk.sent_tokenize()
    • Academic(ICSI), Product(Ami): use dialogue act files in ACL2018_AbsSumm
  2. answers in query_list (i.e. summary)
    • use nltk.sent_tokenize() and merge sentences that splited wrongly (if you want to know, refer to sentence_split.py)
    • splited answers are already stored in data/SummaryEvidence
QMSum

ExplainMeetSum data should also contain meeting_transcripts which doesn't exist in data/SummaryEvidence.
So, you need original QMSum/ dataset.

ACL2018_AbsSumm

We splited meeting_transcripts of ICSI and Ami dataset in QMSum by using dialogue act files.
So, you need acl2018_abssumm/ for dialogue act files.

Model (Multi-DYLE)

Multi-DYLE model extensively generalizes DYLE to enable using a supervised extractor based on human-aligned extractive oracles.

The figure shows the overall architecture of the Multi-DYLE for the case of $M$ = 2.

Dependency

Install dependencies via:

conda create -n explainmeetsum python=3.9.6
conda activate explainmeetsum

# below are depends on your envs
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

pip install nltk==3.6.2 pyrouge==0.1.3 transformers==4.8.1 rouge==1.0.0 datasets==1.11.0

Training

First download the checkpoint of DYLE's best-generator.ckpt from Google Drive.
Place the ckpt file under MultiDyle/dyle/.

sh train_multi_dyle.sh

With one NVIDIA RTX A6000, spent about 2h per epoch.
You can see other results by editing config.py.

Evaluation

First download the checkpoints of Multi-DYLE from Google Drive.
Place the folder under MultiDyle/outputs/multidyle-best-model/

sh test_multi_dyle.sh

Results will be same as table below.
You can see other results by editing config.py.

Results

Model R-1 R-2 R-L
$\text{Multi-DYLE}(\mathsf{X^{ROG}_o}, \mathsf{X^{CES}_o})$ 37.55 12.43 32.76
  • ($\mathsf{X^{ROG}_o}, \mathsf{X^{CES}_o}$) : Train with sentence-level ROUGE-based and CES-based extractive oracles.

Acknowledgements

Dataset named ExplainMeetSum is extended-version of QMSum(https://github.com/Yale-LILY/QMSum) for "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization," which distributed under MIT License Copyright (c) 2021 Yale-LILY.

Model named Multi-DYLE is extended-version of DYLE(https://github.com/Yale-LILY/DYLE) for "DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization," which distributed under MIT License Copyright (c) 2021 Yale-LILY.

Citation

If you extend or use our work, please cite the paper.

@inproceedings{kim-etal-2023-explainmeetsum,
    title = "{E}xplain{M}eet{S}um: A Dataset for Explainable Meeting Summarization Aligned with Human Intent",
    author = "Kim, Hyun  and
      Cho, Minsoo  and
      Na, Seung-Hoon",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.731",
    doi = "10.18653/v1/2023.acl-long.731",
    pages = "13079--13098",
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published