GitHub - vickyzayats/switchboard_corrected_reannotated: Automatic Mapping of Disfluency Annotations for corrected version of Switchboard

Switchboard reannotated dataset

We provide a new version of the Switchboard corpus with disfluency annotations for careful speech transcripts.

The columns in the data are:
sentence - list of words for each sentence in the Penn Treebank transcript
ms_sentence - list of words for each sentence in the Ms-State transcript
comb_sentence - combination of the two versions of the sentence
names - word ids for sentence
ms_names - word ids for ms_sentence
comb_ann - tags for comb_sentence that indicate which words must be inserted, deleted, or substituted to get from the Ms-State version to the Treebank version
tags - BIO tags for sentence (Penn Treebank annotation)
ms_disfl - BIO tags for ms_sentence (silver annotation)
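
The column descriptions suggest that some columns are token-aligned with each other: word ids and tags should have one entry per word of the sentence they describe. The sketch below checks that for a single row. It assumes each row has already been parsed into a Python dict of lists; the on-disk format of the data is not described here, and the function name, the toy words, the word ids, and the `"?"` placeholders for comb_ann are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Column groups expected to have equal lengths, judging only from the
# column descriptions (an assumption, not a documented guarantee).
ALIGNED_GROUPS: List[Tuple[str, ...]] = [
    ("sentence", "names", "tags"),
    ("ms_sentence", "ms_names", "ms_disfl"),
    ("comb_sentence", "comb_ann"),
]


def misaligned_groups(row: Dict[str, Sequence[str]]) -> List[Tuple[str, ...]]:
    """Return the column groups whose lists differ in length for one row."""
    bad = []
    for group in ALIGNED_GROUPS:
        lengths = {len(row[col]) for col in group}
        if len(lengths) > 1:
            bad.append(group)
    return bad


# Toy row: words, ids, and comb_ann values are invented for illustration.
row = {
    "sentence": ["the", "the", "cat"],
    "names": ["w1", "w2", "w3"],
    "tags": ["BE_IP", "C", "O"],
    "ms_sentence": ["the", "cat"],
    "ms_names": ["m1", "m2"],
    "ms_disfl": ["O", "O"],
    "comb_sentence": ["the", "the", "cat"],
    "comb_ann": ["?", "?", "?"],  # placeholder values, not real tags
}

print(misaligned_groups(row))  # [] when every group is aligned
```

A check like this is cheap to run over every row after loading and can catch parsing mistakes before the tags are used for training or evaluation.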

The BIO tags are as follows:
BE - beginning of the reparandum
IE - inside the reparandum
IP - the last word before the interruption point
BE_IP - single token reparandum
C - repair (correction)
O - non-disfluency
C_IE - the word is in both the reparandum and the repair, but is not the last word before the interruption point (in nested disfluencies)
C_IP - the word is in both the reparandum and the repair, and is the last word before the interruption point (in nested disfluencies)
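
As an illustration of how the tags above can be used, here is a minimal sketch that removes reparandum words from a tagged sentence and locates interruption points. It assumes that every tag marking a word as part of a reparandum (BE, IE, IP, BE_IP, C_IE, C_IP) should be dropped to obtain the fluent reading; the function names and the example sentences are hypothetical and not taken from the dataset.

```python
from typing import List, Sequence

# Tags that place a word inside a reparandum, per the tag list above.
REPARANDUM_TAGS = {"BE", "IE", "IP", "BE_IP", "C_IE", "C_IP"}
# Tags that mark the last word before an interruption point.
INTERRUPTION_TAGS = {"IP", "BE_IP", "C_IP"}


def strip_reparanda(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Drop every word tagged as part of a reparandum."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return [w for w, t in zip(words, tags) if t not in REPARANDUM_TAGS]


def interruption_points(tags: Sequence[str]) -> List[int]:
    """Return indices of the words that directly precede an interruption point."""
    return [i for i, t in enumerate(tags) if t in INTERRUPTION_TAGS]


# Invented example: "to boston" is the reparandum, "to denver" the repair.
words = ["i", "want", "to", "boston", "to", "denver"]
tags = ["O", "O", "BE", "IP", "C", "C"]

print(strip_reparanda(words, tags))   # ['i', 'want', 'to', 'denver']
print(interruption_points(tags))      # [3]

# Invented single-token reparandum: the first "the" is repeated.
print(strip_reparanda(["the", "the", "cat"], ["BE_IP", "C", "O"]))  # ['the', 'cat']
```

Repair words (C) and non-disfluent words (O) are kept, so the output corresponds to the sentence a speaker presumably intended.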

Paper

You can find more details in our paper: https://arxiv.org/pdf/1904.04398.pdf.

@article{zayats2019disfluencies,
  title={Disfluencies and Human Speech Transcription Errors},
  author={Zayats, Vicky and Tran, Trang and Wright, Richard and Mansfield, Courtney and Ostendorf, Mari},
  journal={Interspeech},
  year={2019}
}

License

This dataset is an extension of the Switchboard corpus and is distributed under the LDC license.

