This is the official repository for Spoken Stereoset, a dataset that measures stereotypical bias in speech large language models (SLLMs). The construction details will be available in our paper soon.
id: The unique ID of the instance.
speaker: The Azure TTS speaker used for the speech segment.
age/gender: The demographic attribute of the speaker that might be linked to stereotypical associations.
context: The transcription of the spoken context sentence.
irrelevant: An irrelevant continuation to the context.
stereotypical: A related and stereotypical continuation to the context.
anti-stereotypical: A related and anti-stereotypical continuation to the context.
labels: The labels assigned by the annotators to the 3 possible continuations.
annotators: The IDs of the annotators who provided the annotations.
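
As a rough illustration of how the fields above fit together, the following Python sketch pairs the context of one instance with its three candidate continuations. It assumes each instance is available as a dictionary keyed by the field names listed above. The storage format, the value types, the helper name build_candidates, and the placeholder record are all assumptions made for illustration and are not part of the released dataset.

```python
# Field names come from the field list above; treating each instance as a
# plain dict with string values is an assumption made for this sketch.
CONTINUATION_FIELDS = ("irrelevant", "stereotypical", "anti-stereotypical")


def build_candidates(record):
    """Return (context, [(kind, text), ...]) for one instance.

    `build_candidates` is a hypothetical helper, not part of the repository.
    """
    required = ("context",) + CONTINUATION_FIELDS
    missing = [key for key in required if key not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    # Keep a fixed order so downstream scoring can rely on it.
    candidates = [(kind, record[kind]) for kind in CONTINUATION_FIELDS]
    return record["context"], candidates


# Placeholder record for illustration only; it is not real dataset content.
example = {
    "id": "example-0",
    "context": "<context sentence>",
    "irrelevant": "<irrelevant continuation>",
    "stereotypical": "<stereotypical continuation>",
    "anti-stereotypical": "<anti-stereotypical continuation>",
}

context, candidates = build_candidates(example)
for kind, text in candidates:
    print(f"{kind}: {context} {text}")
```

Keeping the continuation kind next to its text makes it straightforward to check afterwards which of the three continuations a model preferred.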
If you have any concerns, please contact: even.dlion8@gmail.com