Extracting metaphors and analogies from free text requires high-level reasoning abilities such as abstraction and language understanding. Our study focuses on the extraction of the concepts that form metaphoric analogies in literary texts. To this end, we construct a novel dataset in this domain with the help of domain experts. We compare the out-of-the-box ability of recent large language models (LLMs) to structure metaphoric mappings from fragments of texts containing proportional analogies. The models are further evaluated on the generation of implicit elements of the analogy, which are indirectly suggested in the texts and inferred by human readers. The competitive results obtained by LLMs in our experiments are encouraging and open up new avenues such as automatically extracting analogies and metaphors from text instead of investing resources in domain experts to manually label data.
Joanne Boisson, Zara Siddique, Hsuvas Borkakoty, Dimosthenis Antypas, Luis Espinosa-Anke, Jose Camacho-Collados. 2025. Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation. Proceedings of the 31st International Conference on Computational Linguistics (COLING), Abu Dhabi, UAE. Association for Computational Linguistics.
We release a dataset of 204 short texts containing a 4-term analogy, where explicit terms are tagged and implicit terms are suggested by annotators.
The data and inter-annotator-agreement test can be found here.
We test the ability of large language models to extract the explicit terms of a metaphoric analogy and to generate relevant implicit terms.
Input:
- a short text
- one term of the analogy: T1, T2, S1 or S2
Instructions:
- Extract the other explicit terms forming the 4-term metaphor
- Generate any missing implicit terms
Output: the structured metaphoric analogy, i.e. the values of the four frames T1, T2, S1 and S2.
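The input and output described above can be pictured as a small record type. The following is only an illustrative sketch: the class name `AnalogyRecord`, its fields and its methods are hypothetical and are not the format of the released data or scripts. The example sentence is the classical "old age is the evening of life" analogy, not an item from the dataset, and it assumes that T frames hold target-domain terms and S frames hold source-domain terms, which is not stated above.

```python
from dataclasses import dataclass
from typing import Dict, List

FRAMES = ("T1", "T2", "S1", "S2")


@dataclass
class AnalogyRecord:
    """Hypothetical container for one annotated text (not the released format)."""
    text: str
    frames: Dict[str, str]   # frame name -> term, for all four frames
    implicit: List[str]      # names of the frames that are not stated in the text

    def task_input(self, given: str) -> Dict[str, str]:
        """Build the model input: the text plus one known term."""
        if given not in FRAMES:
            raise ValueError(f"unknown frame: {given}")
        return {
            "text": self.text,
            "given_frame": given,
            "given_term": self.frames[given],
        }

    def frames_to_extract(self, given: str) -> List[str]:
        """Explicit frames the model still has to find in the text."""
        return [f for f in FRAMES if f != given and f not in self.implicit]

    def frames_to_generate(self) -> List[str]:
        """Implicit frames the model has to generate."""
        return [f for f in FRAMES if f in self.implicit]


# Illustrative example only: "day" is never mentioned, so S2 is implicit.
record = AnalogyRecord(
    text="Old age is the evening of life.",
    frames={"T1": "old age", "T2": "life", "S1": "evening", "S2": "day"},
    implicit=["S2"],
)
query = record.task_input("T1")
```

Here the model would receive the text and `T1`, would be expected to extract `T2` and `S1` from the text, and would have to generate a value for `S2`.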
- Prompt used in the experiments:
The scripts and outputs of our experiments with OpenAI and open-source models can be found here.
- Explicit terms: the correctness of the extracted explicit terms is evaluated with an automatic metric that checks whether the lemmatized head noun of the model output matches that of the gold standard.
- Implicit terms: the relevance of the generated implicit terms is manually rated.
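The explicit-term metric above can be sketched as follows. This is a minimal illustration and not the evaluation script used in the experiments: `head_noun` and `lemmatize` are hypothetical, deliberately crude stand-ins (last word of the phrase, simple plural stripping) for a proper parser and lemmatizer, and they will fail on phrases such as "king of France" or on irregular plurals.

```python
import re
from typing import Dict


def head_noun(term: str) -> str:
    """Crude assumption: the head noun is the last word of the phrase."""
    tokens = re.findall(r"[a-z]+", term.lower())
    return tokens[-1] if tokens else ""


def lemmatize(word: str) -> str:
    """Crude assumption: only regular English plurals are normalized."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def explicit_match(gold: str, predicted: str) -> bool:
    """True if both terms share the same lemmatized head noun."""
    gold_lemma = lemmatize(head_noun(gold))
    return bool(gold_lemma) and gold_lemma == lemmatize(head_noun(predicted))


def accuracy(gold: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Share of gold explicit frames whose predicted term matches."""
    if not gold:
        return 0.0
    hits = sum(explicit_match(term, predicted.get(frame, ""))
               for frame, term in gold.items())
    return hits / len(gold)


# Illustrative values only, not taken from the dataset.
score = accuracy(
    {"T2": "a long life", "S1": "evening"},
    {"T2": "life", "S1": "night"},
)
```

Matching on the head noun rather than on the full string means that a model answering "life" is not penalized when the gold annotation is "a long life". A production version would replace the two helper functions with a dependency parser and a real lemmatizer.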