My submission to the Multilingual Euphemism Detection Shared Task for the 4th Workshop on Figurative Language Processing (FigLang 2024), co-located with NAACL 2024.
"In this shared task, participants are invited to develop approaches and models to disambiguate texts (in multiple languages) as either euphemistic or not. Participants are encouraged to develop multi-lingual and cross-lingual approaches."
Link: https://www.codabench.org/competitions/1959/#/pages-tab
- data = training and test data
- notebooks = code for all experiments and inference
*The Spanish notebooks were chosen at random as an example. Explanations of the reasoning should be in the code comments or in the paper submission itself. The multilingual experiment code is not included because its workflow is identical to euph_es_2.ipynb (the second experiment, which uses the DistilBERT Multilingual Cased base model), except that the dataset combines all the languages into one.
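Since the multilingual notebook is not in the repo, here is a minimal sketch of the one step that differs from euph_es_2.ipynb: merging the per-language examples into a single dataset before fine-tuning. This is not the code used for the submission. The function name `combine_languages`, the `text`/`label`/`lang` field names, the language codes, and the toy rows are all assumptions made up for illustration; the real column names are in the files under data.

```python
import random


def combine_languages(per_language, seed=42):
    """Merge per-language example lists into one shuffled list.

    per_language maps a language code to a list of dict rows.
    Each output row is a copy of the input row plus a "lang" field.
    """
    combined = []
    # Sort the language codes so the result does not depend on dict order.
    for lang, rows in sorted(per_language.items()):
        for row in rows:
            combined.append({**row, "lang": lang})
    # Seeded shuffle so languages are interleaved reproducibly.
    random.Random(seed).shuffle(combined)
    return combined


# Toy placeholder rows (hypothetical, not taken from the shared task data).
toy = {
    "es": [
        {"text": "placeholder sentence 1", "label": 1},
        {"text": "placeholder sentence 2", "label": 0},
    ],
    "en": [
        {"text": "placeholder sentence 3", "label": 1},
        {"text": "placeholder sentence 4", "label": 0},
    ],
}

merged = combine_languages(toy)
print(len(merged), sorted({row["lang"] for row in merged}))
```

The merged list can then go through the same tokenization and training cells as the single-language notebook. Keeping the `lang` field is optional, but it makes it easy to report per-language scores afterwards.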
You can find the fine-tuning outputs and models for inference/reproducibility here:
https://huggingface.co/nhankins
Coming soon. I will also post a link to the paper on Google Scholar or a similar website.
Feel free to send me an email if you have questions or want to collaborate on a research project.