Andrea Gazzarini edited this page Jun 29, 2018 · 5 revisions

The following picture illustrates the RRE Domain Model, which is organized into a composite, tree-like structure where every parent-child relationship is one-to-many.

[Figure: domain_model]

Apart from the top-level entity, which represents an evaluation instance and acts only as a container, the other entities are:

Corpus

An evaluation process can involve more than one dataset, targeting a given search platform. Within the RRE context the following terms are considered synonyms: corpus, dataset, test collection.
Each corpus must be located under the corpora configuration folder and is then referenced in one or more ratings files.
The internal format depends on the target search platform (see What We Need To Provide for details about the format).

Topic

Within a corpus, we can have one or more topics, each of which maps a user information need we want to satisfy with the search system. A topic is a logical, business-level entity which usually doesn't correspond to what we know as a "query".

Query Group

Instead of modelling the query level as a direct child of topics, RRE provides a further abstraction layer called "Query Group": a group of queries that are supposed to produce the same results. Here we can group a source query with several variants, for testing things like lowercasing, diacritics normalisation and stemming.
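To make the idea of a query group concrete, here is a minimal Python sketch. It is not RRE code: the `QueryGroup` class, the `toy_search` function and its tiny in-memory index are all hypothetical, and the "search platform" simply lowercases the query and strips diacritics before looking it up. The sketch checks that every variant in the group returns the same result list.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass
class QueryGroup:
    """Hypothetical container: queries expected to yield the same results."""
    name: str
    queries: List[str]


def toy_search(query: str) -> List[str]:
    """Stand-in for a search platform (an assumption, not an RRE API).

    Lowercases the query and removes diacritics, then looks it up
    in a tiny hard-coded index.
    """
    decomposed = unicodedata.normalize("NFKD", query.lower())
    normalized = "".join(c for c in decomposed if not unicodedata.combining(c))
    index = {"cafe": ["doc1", "doc2"]}
    return index.get(normalized, [])


# A source query plus variants that exercise lowercasing and diacritics.
group = QueryGroup("cafe variants", ["cafe", "Cafe", "CAFÉ", "café"])

result_lists = [toy_search(q) for q in group.queries]
all_same = all(r == result_lists[0] for r in result_lists)
print(all_same)
```

If one variant returned a different list, that would point at a gap in the analysis chain (for example, missing diacritics folding), which is the kind of problem a query group is meant to expose.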

Query

At the query level we declare the query shape that will be executed on the target search platform.

Version

A query will be executed n times, where n is the number of versions of our system. For example, if we have three versions (v1.1, v1.2 and v1.3), the same query will be executed three times, once for each configuration version. Metrics are primarily computed at this level and, as a consequence, each version will have one or more metrics associated with it.
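The "one execution per version" rule can be sketched in Python as follows. This is an illustration under assumptions, not how RRE is implemented: `run_query_per_version`, `fake_search`, `precision`, the canned result lists and the set of relevant documents are all invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical relevance judgments for the example query.
RELEVANT = {"doc1", "doc2"}


def fake_search(version: str, query: str) -> List[str]:
    """Stand-in for the search platform at a given configuration version."""
    canned = {
        "v1.1": ["doc1", "doc9"],
        "v1.2": ["doc1", "doc2"],
        "v1.3": ["doc8", "doc9"],
    }
    return canned[version]


def precision(hits: List[str]) -> float:
    """Fraction of returned documents that are relevant."""
    if not hits:
        return 0.0
    return sum(1 for h in hits if h in RELEVANT) / len(hits)


def run_query_per_version(
    query: str,
    versions: List[str],
    search: Callable[[str, str], List[str]],
    metric: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Execute the same query once per version; keep one metric value each."""
    return {version: metric(search(version, query)) for version in versions}


per_version = run_query_per_version(
    "some query", ["v1.1", "v1.2", "v1.3"], fake_search, precision
)
print(per_version)
```

The result is a mapping from version to metric value, which is what makes it possible to compare versions side by side for the same query.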

Metric

The leaf entity is a metric, computed after executing a given query against a given system version.

Metrics (see the vertical dashed lines) are primarily bound at the query/version level, but RRE also aggregates their values at the upper levels (query group, topic and corpus) using an aggregation function (at the moment, the arithmetic mean). In the end, each entity has a multivalued metric, with one value for each version. The benefit of the composite structure is clear: we can see a metric value at different levels (e.g. for a single query, for all queries belonging to a query group, for all queries belonging to a topic, or at corpus level).

[Figure: domain_model_with_metrics]
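The aggregation over the composite structure can be sketched in Python as below. This is a simplified model, not RRE's implementation: the `Node` class and the sample values are hypothetical, and a single metric is tracked. One assumption deserves a loud label: here every level takes the arithmetic mean over all queries beneath it. Averaging the direct children's means instead would give different numbers when groups have different sizes, and the text above does not say which variant RRE uses.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical tree node: corpus, topic, query group or query."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Only filled on query nodes: one metric value per version.
    values: Dict[str, float] = field(default_factory=dict)

    def queries(self) -> List["Node"]:
        """Return all query (leaf) nodes under this node."""
        if not self.children:
            return [self]
        return [q for child in self.children for q in child.queries()]

    def metric_by_version(self) -> Dict[str, float]:
        """Arithmetic mean over all descendant queries, per version (assumed)."""
        leaves = self.queries()
        versions = sorted({v for q in leaves for v in q.values})
        return {v: mean(q.values[v] for q in leaves) for v in versions}


q1 = Node("q1", values={"v1.1": 0.5, "v1.2": 1.0})
q2 = Node("q2", values={"v1.1": 1.0, "v1.2": 1.0})
q3 = Node("q3", values={"v1.1": 0.0, "v1.2": 0.25})

group1 = Node("group 1", children=[q1, q2])
group2 = Node("group 2", children=[q3])
topic = Node("topic A", children=[group1, group2])
corpus = Node("corpus", children=[topic])

for node in (q1, group1, group2, topic, corpus):
    print(node.name, node.metric_by_version())
```

Every node answers the same `metric_by_version()` call, so the same metric can be read for a single query, a query group, a topic or the whole corpus, with one value per version at each level.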