Releases: VUB-HYDR/Wikimpacts
v.1.0.1
What's Changed
v1.0.0 (raw)
A raw pre-release of the database. It contains the data obtained after parsing the LLM output but before applying (a) data gap filling (see #173 and #101) and (b) currency conversion and inflation adjustment to USD (2024) (see #111).
v1.0.1 (raw)
A raw pre-release version fixing the following bugs in v1.0:
- filter out LLM output from articles that do not describe a climate extreme; for such articles the LLM usually generates rows of NULLs or recites the prompt example (by generating lists like `[country1, country2]`) because it is unable to retrieve any location data from irrelevant articles. No cases of hallucinated country names have been found. See #184.
- handle a corner case where `NULL` values that were not evaluated as `None` resulted in incorrect locations such as "Null Dam". See #183.
- handle a datatype bug in the GID column. See #179.
- validate currencies (see #186) and the Hazard-Main Event relation (see #181).
More details can be found in #173.
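The first two fixes above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function names, the column names (`Country`, `Deaths`, `Location`), the set of NULL spellings, and the placeholder pattern are all assumptions made for illustration. It shows one way to map string spellings of NULL to `None`, treat recited prompt placeholders as missing, and drop rows that carry no information.

```python
import re
from typing import Any, Dict, List, Optional

# Assumed spellings of "no value" that an LLM might emit as text.
NULL_STRINGS = {"null", "none", "nan", "n/a", ""}

# Assumed shape of placeholders recited from a prompt example, e.g. "country1".
PLACEHOLDER = re.compile(r"^(country|location|city)\d+$", re.IGNORECASE)


def normalize_null(value: Any) -> Optional[Any]:
    """Map string spellings of NULL to None; leave other values unchanged."""
    if isinstance(value, str) and value.strip().lower() in NULL_STRINGS:
        return None
    return value


def is_placeholder(value: Any) -> bool:
    """Return True for a placeholder string or a non-empty list made only of placeholders."""
    if isinstance(value, list):
        return len(value) > 0 and all(is_placeholder(v) for v in value)
    return isinstance(value, str) and bool(PLACEHOLDER.match(value.strip()))


def clean_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Normalize NULLs, blank out placeholders, and drop rows with no data left."""
    cleaned = []
    for row in rows:
        normalized = {}
        for key, value in row.items():
            value = normalize_null(value)
            normalized[key] = None if is_placeholder(value) else value
        # Keep the row only if at least one field still holds a value.
        if any(v is not None for v in normalized.values()):
            cleaned.append(normalized)
    return cleaned
```

For example, given a row of NULLs, a row that recites `[country1, country2]`, and a row with real data, only the last one survives, and its `"Null"` location string becomes `None` rather than being treated as a place name:

```python
rows = [
    {"Country": "NULL", "Deaths": None},
    {"Country": ["country1", "country2"], "Deaths": "null"},
    {"Country": ["Belgium"], "Deaths": "12", "Location": "Null"},
]
print(clean_rows(rows))
```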
Wikimpacts 1.0 database
Using LLMs to Build a Database of Climate Extreme Impacts
BibTeX Citation
If you use code in this release in a scientific publication, you can cite the publication as follows:
@inproceedings{li-etal-2024-using-llms,
title = "Using {LLM}s to Build a Database of Climate Extreme Impacts",
author = {Li, Ni and
Zahra, Shorouq and
Brito, Mariana and
Flynn, Clare and
G{\"o}rnerup, Olof and
Worou, Koffi and
Kurfali, Murathan and
Meng, Chanjuan and
Thiery, Wim and
Zscheischler, Jakob and
Messori, Gabriele and
Nivre, Joakim},
editor = "Stammbach, Dominik and
Ni, Jingwei and
Schimanski, Tobias and
Dutia, Kalyan and
Singh, Alok and
Bingler, Julia and
Christiaen, Christophe and
Kushwaha, Neetu and
Muccione, Veruska and
A. Vaghefi, Saeid and
Leippold, Markus",
booktitle = "Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.climatenlp-1.7",
doi = "10.18653/v1/2024.climatenlp-1.7",
pages = "93--110",
abstract = "To better understand how extreme climate events impact society, we need to increase the availability of accurate and comprehensive information about these impacts. We propose a method for building large-scale databases of climate extreme impacts from online textual sources, using LLMs for information extraction in combination with more traditional NLP techniques to improve accuracy and consistency. We evaluate the method against a small benchmark database created by human experts and find that extraction accuracy varies for different types of information. We compare three different LLMs and find that, while the commercial GPT-4 model gives the best performance overall, the open-source models Mistral and Mixtral are competitive for some types of information.",
}