Data related to studying German discourse particle "ja" in translation equivalents

cogstates/kojak

KoJaK

Data related to studying German discourse particle "ja" in translation equivalents.

Citation

@inproceedings{soubki-etal-2022-kojak,
    title = "{KOJAK}: A New Corpus for Studying {G}erman Discourse Particle ja",
    author = "Soubki, Adil  and
      Rambow, Owen  and
      Kang, Chong",
    booktitle = "Proceedings of the 3rd Workshop on Computational Approaches to Discourse",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea and Online",
    publisher = "International Conference on Computational Linguistics",
    url = "https://aclanthology.org/2022.codi-1.1",
    pages = "1--6",
    abstract = "In German, ja can be used as a discourse particle to indicate that a proposition, according to the speaker, is believed by both the speaker and audience. We use this observation to create KoJaK, a distantly-labeled English dataset derived from Europarl for studying when a speaker believes a statement to be common ground. This corpus is then analyzed to identify lexical choices in English that correspond with German ja. Finally, we perform experiments on the dataset to predict if an English clause corresponds to a German clause containing ja and achieve an F-measure of 75.3{\%} on a balanced test corpus.",
}
