This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

Data and code for my solution to the Evalita 2020 shared task DaDoEval – Dating Document Evaluation.

matteobrv/DaDoEval

DaDoEval at Evalita 2020 - An SVM-based approach for Automatic Document Dating

Temporal information, such as the publication date of a document, is of major relevance in a number of domains. Building on this idea, the DaDoEval – Dating Document Evaluation – shared task, hosted at Evalita 2020, invites participants to tackle a series of automatic document dating sub-tasks, working with documents related to Italian statesman Alcide De Gasperi.

This repository stores the data and the implementation of my solution to the first two sub-tasks: Coarse-grained classification on same-genre data and Coarse-grained classification on cross-genre data. Both sub-tasks require assigning each document sample to one of five historical periods in De Gasperi’s political life, which together span more than fifty years, from 1901 to 1954.

| Habsburg years | Beginning of political activity | Internal exile | From fascism to the Italian Republic | Building the Italian Republic |
| --- | --- | --- | --- | --- |
| 1901-1918 | 1919-1926 | 1927-1942 | 1943-1947 | 1948-1954 |
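
To make the class labels concrete, the period boundaries listed above can be written as a small lookup. This is only an illustrative sketch: the names `PERIODS` and `period_for_year` are hypothetical and do not come from the repository's notebook.

```python
from typing import Optional

# Inclusive year ranges for the five periods, as listed in this README.
PERIODS = [
    (1901, 1918, "Habsburg years"),
    (1919, 1926, "Beginning of political activity"),
    (1927, 1942, "Internal exile"),
    (1943, 1947, "From fascism to the Italian Republic"),
    (1948, 1954, "Building the Italian Republic"),
]


def period_for_year(year: int) -> Optional[str]:
    """Return the period label for a year, or None if it falls outside 1901-1954."""
    for start, end, label in PERIODS:
        if start <= year <= end:
            return label
    return None
```

For example, `period_for_year(1925)` falls in the 1919-1926 range and returns "Beginning of political activity".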

The solution is based on a linear multi-class Support Vector Machine classifier trained on a combination of character and word n-grams, as well as the number of word tokens per document.
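
As a rough illustration of this kind of feature combination, the sketch below wires character n-grams, word n-grams, and a token-count feature into a linear SVM with scikit-learn. It is not the configuration used in the notebook: the TF-IDF weighting, the n-gram ranges, the whitespace tokenization, the scaling step, and the default SVM hyperparameters are all assumptions made for the example, and `token_counts` and `build_classifier` are hypothetical names.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler
from sklearn.svm import LinearSVC


def token_counts(docs):
    """Return an (n_docs, 1) array with the whitespace token count of each document."""
    return np.array([[len(doc.split())] for doc in docs], dtype=float)


def build_classifier():
    """Build a text classifier from char n-grams, word n-grams and document length."""
    features = FeatureUnion([
        # Assumed n-gram ranges; the real values are in the notebook and paper.
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        # Token count per document, rescaled so it does not dominate the TF-IDF features.
        ("length", Pipeline([
            ("count", FunctionTransformer(token_counts, validate=False)),
            ("scale", MaxAbsScaler()),
        ])),
    ])
    # LinearSVC handles the multi-class case with a one-vs-rest scheme.
    return Pipeline([("features", features), ("svm", LinearSVC())])
```

The returned object follows the usual scikit-learn interface: call `fit(train_docs, train_labels)` with a list of raw strings and their period labels, then `predict(test_docs)`.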

A detailed description of the approach is outlined in my system description paper, while the code is available in the DaDoEval_2020 notebook.

Despite its simplicity, the system ranked first in both sub-tasks, achieving macro-averaged F1 scores of 0.934 on same-genre data and 0.413 on cross-genre data.
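
For readers unfamiliar with the metric, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so each of the five periods counts equally regardless of how many documents it contains. The following is a minimal pure-Python sketch of that definition; `macro_f1` is a hypothetical helper, not the official task scorer.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In scikit-learn, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.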

Requirements

python 3.6
numpy 1.19.2
matplotlib 3.3.2
scikit-learn 0.23.2
scikit-optimize 0.8.1
