
Main Concepts

Hervé Bitteur edited this page Jan 24, 2018 · 19 revisions

This document presents the main concepts used throughout the Audiveris application.

It is a work in progress, which may still be modified or extended as new concrete needs arise.

The same data model is used both in memory and on disk (in an Audiveris project file):

  • On disk, by definition, only the so-called "persistent" entities are stored (they are flagged as such in code),
  • In memory, on top of the loaded persistent entities, there are additional "transient" entities, such as cached duplications or other information not worth persisting beyond the current processing step.

As much as possible, the topics of this documentation are presented in a progressive and rather logical order, and are linked to the most relevant Java classes in Audiveris code.

To jump directly to any specific topic, the reader can use the following index, sorted in alphabetical order:

Containers

Physical containment: Book and Sheet

An image file fed into OMR software contains one or several images. Typically, PDF and TIFF formats support multi-image files, while JPEG or PNG formats, for example, can hold only a single image.

For Audiveris, this physical containment is modeled as one Book class instance (corresponding to the input file) which contains a sequence of one or several Sheet class instances (one sheet corresponding to one image).

A (super-) Book instance could recursively contain (sub-) Book instances.
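The physical containment above can be sketched in Java as follows. This is only an illustration, not the actual Audiveris code: the constructor signature, the `name` field and the `allSheets()` helper are assumptions, and letting a pure super-book have zero sheets of its own is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One image of the input file (sketch only, not the real Sheet class). */
final class Sheet {
    private final int number; // 1-based rank of the image within its input file

    Sheet(int number) {
        this.number = number;
    }

    int number() {
        return number;
    }
}

/** One input file: a sequence of sheets, plus optional (sub-) books. */
final class Book {
    private final String name;
    private final List<Sheet> sheets = new ArrayList<>();
    private final List<Book> subBooks = new ArrayList<>();

    /** Creates one sheet per image; imageCount may be 0 for a pure super-book (assumption). */
    Book(String name, int imageCount) {
        if (imageCount < 0) {
            throw new IllegalArgumentException("negative image count");
        }
        this.name = name;
        for (int i = 1; i <= imageCount; i++) {
            sheets.add(new Sheet(i));
        }
    }

    String name() {
        return name;
    }

    void addSubBook(Book sub) {
        subBooks.add(sub);
    }

    List<Sheet> sheets() {
        return Collections.unmodifiableList(sheets);
    }

    /** Own sheets first, then the sheets of each sub-book, recursively. */
    List<Sheet> allSheets() {
        List<Sheet> all = new ArrayList<>(sheets);
        for (Book sub : subBooks) {
            all.addAll(sub.allSheets());
        }
        return all;
    }
}
```

For example, a 3-image PDF gives a book with 3 sheets, and a super-book gathering it with a single-image PNG book reaches 4 sheets through `allSheets()`.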

Logical containment: Score and Page

At the sheet level, staves are gathered into systems, and a given sheet generally contains several systems.

A system may be left-indented with respect to the other systems in the sheet, to indicate the beginning of a movement. A non-indented system is assumed to belong to the same movement as the previous system (located just above in the current sheet, or at the end of the previous sheet).

In Audiveris, this logical containment is modeled as one instance of Score class per movement (since "Score" is the word used by MusicXML), the score containing a sequence of one or several Page class instances. Generally, there is exactly one page per sheet, except when an indented system appears in the middle of the sheet, thus beginning a new page within the same sheet.

Several Score instances could be gathered into one instance of Opus.

A sheet image may contain no music; this happens, for example, for a title page, an illustration or simply a blank sheet. In that case, the sheet is marked as "invalid" (from the OMR point of view) and is considered as a score break: it ends the current score, and the next "valid" sheet (containing music), if any, will begin another score.
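The rules above (an indented system starts a new movement and thus a new page, an invalid sheet ends the current score) can be sketched as a single pass over the sheets. This is a hypothetical illustration, not the Audiveris implementation: the record names are invented, and an invalid sheet is simply represented as a sheet with no system.

```java
import java.util.ArrayList;
import java.util.List;

/** A system of a sheet, flagged when it is left-indented (start of a movement). */
record SystemInfo(boolean indented) {}

/** A sheet seen as its sequence of systems; no system stands for an "invalid" sheet. */
record SheetInfo(int number, List<SystemInfo> systems) {}

/** A page: a run of consecutive systems within one sheet (firstSystem is 1-based). */
record PageRef(int sheetNumber, int firstSystem, int systemCount) {}

final class MovementSplitter {
    /** Splits the book sheets into scores (movements), each score being a list of pages. */
    static List<List<PageRef>> split(List<SheetInfo> sheets) {
        List<List<PageRef>> scores = new ArrayList<>();
        List<PageRef> current = null; // null: no open score (book start, break or new movement)

        for (SheetInfo sheet : sheets) {
            List<SystemInfo> systems = sheet.systems();
            if (systems.isEmpty()) {
                current = null; // invalid sheet: score break
                continue;
            }
            int pageStart = 0;
            for (int i = 0; i <= systems.size(); i++) {
                boolean end = i == systems.size();
                boolean newMovement = !end && systems.get(i).indented();
                if ((end || newMovement) && i > pageStart) {
                    // Close the page made of systems [pageStart, i) of this sheet
                    if (current == null) {
                        current = new ArrayList<>();
                        scores.add(current);
                    }
                    current.add(new PageRef(sheet.number(), pageStart + 1, i - pageStart));
                    pageStart = i;
                }
                if (newMovement) {
                    current = null; // the indented system opens a new score
                }
            }
        }
        return scores;
    }
}
```

With a second sheet whose middle system is indented, that sheet yields two pages: the first one continues the previous score, the second one begins a new score.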

Sheet processing steps

Audiveris OMR processes each sheet in a pipeline manner. As of this writing, the pipeline is made of 20 steps as follows:

LOAD       : Load the sheet (gray) picture
BINARY     : Binarize the sheet picture
SCALE      : Compute sheet line thickness, interline, beam thickness
GRID       : Retrieve staff lines, barlines, systems & parts
HEADERS    : Retrieve Clef-Key-Time system headers
STEM_SEEDS : Retrieve stem thickness & seeds for stems
BEAMS      : Retrieve beams
LEDGERS    : Retrieve ledgers
HEADS      : Retrieve note heads & whole notes
STEMS      : Build stems connected to heads & beams
REDUCTION  : Reduce structures of heads, stems & beams
CUE_BEAMS  : Retrieve cue beams
TEXTS      : Call OCR on textual items
MEASURES   : Retrieve raw measures from groups of bar lines
CHORDS     : Gather note heads into chords
CURVES     : Retrieve slurs, wedges & endings
SYMBOLS    : Retrieve fixed-shape symbols
LINKS      : Link and reduce symbols
RHYTHMS    : Handle rhythms within measures
PAGE       : Connect systems within page
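Since each step builds upon the results of the previous ones, the list above maps naturally onto a Java enum whose declaration order is the processing order. The runner below is a minimal sketch under stated assumptions: the `performer` callback (returning false when a step fails) and the `SheetPipeline` class are hypothetical, not Audiveris's actual step machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** The 20 sheet steps, declared in pipeline order. */
enum Step {
    LOAD, BINARY, SCALE, GRID, HEADERS, STEM_SEEDS, BEAMS, LEDGERS, HEADS, STEMS,
    REDUCTION, CUE_BEAMS, TEXTS, MEASURES, CHORDS, CURVES, SYMBOLS, LINKS, RHYTHMS, PAGE
}

final class SheetPipeline {
    /**
     * Runs the steps in declaration order on one sheet.
     * 'performer' is a hypothetical callback returning false when a step fails.
     * Returns the steps that completed successfully.
     */
    static List<Step> run(Predicate<Step> performer) {
        List<Step> done = new ArrayList<>();
        for (Step step : Step.values()) {
            if (!performer.test(step)) {
                break; // later steps cannot run without this step's results
            }
            done.add(step);
        }
        return done;
    }
}
```

If, say, GRID fails on a sheet, only LOAD, BINARY and SCALE are reported as done.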

Connection of pages within a score takes place at the Book/Score level, once the relevant valid sheets have reached their PAGE step.

The diagram above depicts the typical life cycle of an Audiveris project:

  1. A project is created from an input image file, with as many sheet stubs as there are images in the input file.
  2. The LOAD step attempts to load image #N from the input file.
  3. The BINARY step binarizes the image to black & white and can save the result into BINARY.xml. From that point on, the binary table is used as the original reference.
  4. Every other sheet processing step of the pipeline builds upon the results of the previous step. If so desired, the current sheet data can be saved on disk at the successful end of a step, for later reload.

Remarks:

  • As of this writing, there is no direct way to move OMR one step backward, i.e. from source step S back to S-1, but we can always move to any target step T:

    • If the source (current) step S is lower than T, the processing runs the missing steps, from S (excluded) to T (included).
    • If the source step S is already at T or at a later step, the processing restarts from the binary table (read from the project file) and runs until step T.
  • We can always save data at end of any step, and reload later from the saved project file.

  • From the OMR engine point of view, the smallest processing increment is to run a given step on a given sheet. But there are also convenient ways to process, for example:

    • all steps for a given sheet,
    • a given step for all book sheets (in batch mode only),
    • all steps for all book sheets.
  • Depending on which resource is most critical among elapsed time, CPU and memory, several options are available for the engine internal scheduling:

    • Sheets can be processed sequentially or in parallel,
    • Within certain steps, systems of a sheet can be processed in parallel if so desired.
  • Any external program can read the project data it needs, provided that OMR has reached a stable status (read: no step is in progress).

    • An example is to make GRID data available (staves, barlines, parts, systems) with precise locations in the original image, for easy online tracking.
    • Another example is to export a given movement (score) of a book into a MusicXML file or on-the-fly as a data stream to a sequencer. Passing the MusicXML file to various external programs is the purpose of plugins.
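The rule for moving a sheet to a target step T can be sketched as a small planning function. This is an illustration under stated assumptions, not Audiveris code: `StepPlanner` is hypothetical, a null current step stands for "no step performed yet", and restarting "from the binary table" is read as rerunning every step after BINARY up to T.

```java
import java.util.ArrayList;
import java.util.List;

/** The 20 sheet steps, declared in pipeline order. */
enum Step {
    LOAD, BINARY, SCALE, GRID, HEADERS, STEM_SEEDS, BEAMS, LEDGERS, HEADS, STEMS,
    REDUCTION, CUE_BEAMS, TEXTS, MEASURES, CHORDS, CURVES, SYMBOLS, LINKS, RHYTHMS, PAGE
}

final class StepPlanner {
    /**
     * Returns the steps to run for moving a sheet from 'current' to 'target'.
     * 'current' is null when no step has been performed yet (assumption).
     */
    static List<Step> plan(Step current, Step target) {
        int from;
        if (current == null || current.compareTo(target) < 0) {
            // Forward move: missing steps, from current (excluded) to target (included)
            from = (current == null) ? 0 : current.ordinal() + 1;
        } else {
            // No backward move: restart from the binary table, i.e. after BINARY
            from = Step.BINARY.ordinal() + 1;
        }
        Step[] all = Step.values();
        List<Step> steps = new ArrayList<>();
        for (int i = from; i <= target.ordinal(); i++) {
            steps.add(all[i]);
        }
        return steps;
    }
}
```

For instance, going from GRID to BEAMS runs HEADERS, STEM_SEEDS and BEAMS, while going from RHYTHMS "back" to SCALE reruns SCALE alone on top of the reloaded binary table.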

Glyph

A glyph (Glyph class) is nothing more than an immutable set of foreground (black) pixels, precisely located in a sheet binary image.

It carries no shape.

It is not related to a staff. It does not even belong to a system. The reason is that there is no reliable way to assign a glyph located in the "gutter" between two systems or two staves: does it belong to the upper or the lower system/staff?

These restrictions on glyphs don't apply to glyph interpretations (see Inter).
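To illustrate what "immutable" means here, the sketch below stores a defensive, unmodifiable copy of the pixel set and exposes no shape, staff or system. It is a hypothetical simplification: the `Pixel` record and the accessor names are assumptions, and the real Glyph class may well use a more compact pixel representation.

```java
import java.util.Set;

/** Location of one foreground pixel in the sheet binary image. */
record Pixel(int x, int y) {}

/** Immutable sketch of a glyph: a fixed set of black pixels, with no shape, staff or system. */
final class Glyph {
    private final Set<Pixel> pixels;
    private final int left;
    private final int top;
    private final int right;  // inclusive
    private final int bottom; // inclusive

    Glyph(Set<Pixel> pixels) {
        if (pixels.isEmpty()) {
            throw new IllegalArgumentException("a glyph needs at least one pixel");
        }
        // Unmodifiable copy: later changes to the argument are not seen by the glyph
        this.pixels = Set.copyOf(pixels);
        int l = Integer.MAX_VALUE, t = Integer.MAX_VALUE;
        int r = Integer.MIN_VALUE, b = Integer.MIN_VALUE;
        for (Pixel p : this.pixels) {
            l = Math.min(l, p.x());
            t = Math.min(t, p.y());
            r = Math.max(r, p.x());
            b = Math.max(b, p.y());
        }
        left = l;
        top = t;
        right = r;
        bottom = b;
    }

    Set<Pixel> pixels() { return pixels; }   // unmodifiable: add/remove throw
    int weight() { return pixels.size(); }   // number of black pixels
    int width() { return right - left + 1; }
    int height() { return bottom - top + 1; }
}
```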

Filament

As opposed to Glyph instances, which are immutable and thus fixed in shape, some entities may vary as we add pixels to them.

This is typically the case for Filament class instances, which are used to incrementally define some long symbols such as staff lines.

Such filaments are used only within a step; they are not persisted in a project file, although their final value may get converted to a (fixed) glyph which can then be stored.
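The contrast between a growing filament and the fixed glyph it may end up as can be sketched as follows. All names here (`Filament.addVerticalRun`, `toGlyph`, `FrozenGlyph`) are hypothetical and only illustrate the mutable-then-frozen life cycle; they are not the Audiveris API.

```java
import java.util.HashSet;
import java.util.Set;

record Pixel(int x, int y) {}

/** Hypothetical fixed result of a finished filament, suitable for persistence. */
record FrozenGlyph(Set<Pixel> pixels) {
    FrozenGlyph {
        pixels = Set.copyOf(pixels); // fixed content, unaffected by later filament growth
    }
}

/** Mutable, step-local filament grown incrementally, e.g. column by column along a staff line. */
final class Filament {
    private final Set<Pixel> pixels = new HashSet<>();

    /** Adds a vertical run of pixels at abscissa x, from y0 to y1 inclusive. */
    void addVerticalRun(int x, int y0, int y1) {
        for (int y = y0; y <= y1; y++) {
            pixels.add(new Pixel(x, y));
        }
    }

    int weight() {
        return pixels.size();
    }

    /** Freezes the current content into a fixed glyph. */
    FrozenGlyph toGlyph() {
        return new FrozenGlyph(pixels);
    }
}
```

A glyph frozen after two 2-pixel runs keeps its 4 pixels even if the filament later grows to 6.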

Sample

A sample (Sample class) is a Glyph augmented with shape and scale information. It is meant only to store concrete training samples, to be used later for training the glyph classifier.

Inter

An interpretation, or inter for short (implemented by the Inter class), is meant to formalize any reasonable interpretation of a glyph.

There may be several reasonable interpretations for a given glyph and, in many cases, OMR cannot immediately decide which of these mutually exclusive interpretations, if any, is the right one. This decision is then postponed until later in the OMR process, when additional information (such as other inter instances located nearby) becomes available and helps clarify the configuration.

As opposed to a glyph, an inter belongs to a system and is often related to a staff.

It carries a shape and a quality grade in the (0..1) range, which can be considered as the probability for the interpretation to be correct. This grade is an intrinsic grade, based only on the glyph at hand considered in isolation.

Later, the inter will generally be assigned a contextual grade, based on the inter grade and the supporting relations with other inter instances nearby.
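One way to turn an intrinsic grade into a contextual grade is sketched below. The combination rule is an assumption made for illustration, not necessarily the formula Audiveris uses: each supporting partner adds ratio × relation grade × partner grade to a boost factor, and the boost multiplies the odds g/(1-g) of the intrinsic grade, which keeps the result within [0..1]. With this rule a ratio of 0 leaves the grade unchanged. `SupportLink` is a hypothetical name.

```java
import java.util.List;

/** One supporting partner, seen from an Inter (hypothetical record). */
record SupportLink(double partnerGrade, double relationGrade, double ratio) {}

final class Inter {
    final String shape;
    final double grade; // intrinsic grade in [0..1], glyph considered in isolation

    Inter(String shape, double grade) {
        if (grade < 0 || grade > 1) {
            throw new IllegalArgumentException("grade out of [0..1]: " + grade);
        }
        this.shape = shape;
        this.grade = grade;
    }

    /**
     * Contextual grade under an ASSUMED rule: supports are summed into one boost
     * factor which multiplies the odds grade / (1 - grade).
     */
    double contextualGrade(List<SupportLink> supports) {
        double boost = 1;
        for (SupportLink s : supports) {
            boost += s.ratio() * s.relationGrade() * s.partnerGrade();
        }
        return (boost * grade) / (1 + (boost - 1) * grade);
    }
}
```

For example, a head with intrinsic grade 0.4, supported by a stem of grade 0.5 through a perfect relation with ratio 2, gets a boost of 2 and a contextual grade of 0.8 / 1.4, about 0.57.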

Relation

A Relation (Relation class) defines a relation between a source Inter class instance and a (different) target Inter class instance.

There are 3 kinds of relation:

  • Negative relation

    It states that the two interpretations (the two Inter instances) cannot coexist in the final configuration, so at least one of them will disappear at the next SIG reduction.

    This relation is implemented by an Exclusion class instance, which carries a Cause value:

    • OVERLAP
      Two interpretations that physically overlap (and do not explicitly support each other) are considered as mutually exclusive.
      For example we may have found several possible interpretations for a clef at the beginning of a staff: these ClefInter instances will be mutually linked by an OVERLAP exclusion.

    • INCOMPATIBLE
      For some other reason than overlap, two interpretations may not be compatible.
      Keeping the same example of staff header, a clef interpretation may not be compatible with a certain key-signature interpretation located right after the clef and thus linked by an INCOMPATIBLE exclusion.

  • Neutral relation

    The only case of neutral relation is the bar-group relation, which ties two close consecutive barlines, making them part of the same barline group.

  • Positive relation

    In a positive relation, the two linked interpretations support each other.

    A typical example is a black head interpretation and a stem interpretation nearby with a suitable connection between them.

    This is implemented by a head-stem relation (an instance of HeadStemRelation class), whose grade attribute formalizes the connection quality.
    As a supporting relation, the HeadStemRelation class predefines a support ratio for source (here a head) and a support ratio for target (here a stem). Source and target predefined ratios, combined with measured relation grade, increase the contextual grade of their respective Inter instance.

    This way, even rather low-quality interpretations, when well combined through supporting relations, may end up with an acceptable contextual quality.
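The three kinds of relation can be sketched as a small class hierarchy. The names Relation, Exclusion (with its OVERLAP and INCOMPATIBLE causes) and HeadStemRelation come from the text; the `Support` base class, `BarGroupRelation`, the string ids and the numeric ratio values are hypothetical choices made for this illustration.

```java
/** A relation between two different Inter instances (ids simplified as strings). */
abstract class Relation {
    final String source;
    final String target;

    Relation(String source, String target) {
        if (source.equals(target)) {
            throw new IllegalArgumentException("source and target must differ");
        }
        this.source = source;
        this.target = target;
    }
}

/** Negative relation: at least one end will disappear at the next SIG reduction. */
final class Exclusion extends Relation {
    enum Cause { OVERLAP, INCOMPATIBLE }

    final Cause cause;

    Exclusion(String source, String target, Cause cause) {
        super(source, target);
        this.cause = cause;
    }
}

/** Neutral relation: two close consecutive barlines in the same barline group. */
final class BarGroupRelation extends Relation {
    BarGroupRelation(String source, String target) {
        super(source, target);
    }
}

/** Positive relation: both ends support each other. */
abstract class Support extends Relation {
    final double grade; // measured connection quality

    Support(String source, String target, double grade) {
        super(source, target);
        this.grade = grade;
    }

    abstract double sourceRatio(); // predefined support ratio for the source

    abstract double targetRatio(); // predefined support ratio for the target
}

/** Head (source) to stem (target) connection. */
final class HeadStemRelation extends Support {
    HeadStemRelation(String head, String stem, double grade) {
        super(head, stem, grade);
    }

    @Override
    double sourceRatio() { return 2.0; } // hypothetical value

    @Override
    double targetRatio() { return 1.5; } // hypothetical value
}
```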

No-Exclusion relation

The no-exclusion relation (NoExclusion class) is a special supporting relation with predefined ratios set to 0. Its purpose is simply to state that the source and target interpretations do not exclude one another (although they may physically overlap).
Mirrored heads are a typical example of such no-exclusion: a physical note head with one stem on its bottom-left corner and another stem on its top-right corner gives birth to two separate (but "mirrored") Head inter instances, each linked to its own Stem.

In that case, each Stem instance will have a head-stem relation with its "own" Head, and also a no-exclusion relation with the other Head.

[Side question: to which Head inter does the augmentation dot relate? :-) ]
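The effect of a no-exclusion on overlap detection can be sketched as follows: overlapping pairs get an OVERLAP exclusion unless a supporting relation (NoExclusion included) already links them. The `Box` record, the pair keys and the `OverlapChecker` class are hypothetical simplifications, using plain rectangles instead of real shapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Simplified axis-aligned bounds of an Inter. */
record Box(String id, int x, int y, int w, int h) {
    boolean overlaps(Box o) {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
}

final class OverlapChecker {
    /** Key of an unordered pair of Inter ids. */
    static String key(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    /**
     * Returns the pairs that get an OVERLAP exclusion: physically overlapping pairs,
     * except those already linked by a supporting relation (NoExclusion included).
     */
    static List<String> overlapExclusions(List<Box> inters, Set<String> supportedPairs) {
        List<String> exclusions = new ArrayList<>();
        for (int i = 0; i < inters.size(); i++) {
            for (int j = i + 1; j < inters.size(); j++) {
                Box a = inters.get(i);
                Box b = inters.get(j);
                String k = key(a.id(), b.id());
                if (a.overlaps(b) && !supportedPairs.contains(k)) {
                    exclusions.add(k);
                }
            }
        }
        return exclusions;
    }
}
```

In the mirrored-heads case, the bottom-left stem s1 physically overlaps the other head h2: without their no-exclusion the pair h2|s1 would be flagged, with it nothing is flagged.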

SIG

A Symbol Interpretation Graph (SIG), implemented by SIGraph class, is simply a graph with Inter class instances as vertices and Relation class instances as edges.

This sig plays the central role in Audiveris V5. Its main purpose is to formalize and manage the mutual exclusions and the supporting relations within a population of candidate interpretations.

There is one sig per system, and at some points in the OMR pipeline (typically the REDUCTION step and the RHYTHMS step), the sig is reduced so that no exclusion remains in its graph.
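A reduction that ends with no remaining exclusion can be sketched with a simple greedy loop: while any exclusion remains, remove the weakest Inter involved in one. This is a deliberately simplified assumption, not Audiveris's actual algorithm, which would for instance recompute contextual grades as Inters disappear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SigReducer {
    /**
     * Simplified greedy reduction (assumption): while an exclusion remains,
     * remove the weakest Inter involved in one.
     * @param grades     contextual grade per Inter id (not modified)
     * @param exclusions pairs of mutually exclusive Inter ids
     * @return ids of the surviving Inters
     */
    static Set<String> reduce(Map<String, Double> grades, List<List<String>> exclusions) {
        List<List<String>> remaining = new ArrayList<>(exclusions);
        Set<String> survivors = new TreeSet<>(grades.keySet());
        while (!remaining.isEmpty()) {
            String weakest = null;
            for (List<String> pair : remaining) {
                for (String id : pair) {
                    if (weakest == null || grades.get(id) < grades.get(weakest)) {
                        weakest = id;
                    }
                }
            }
            survivors.remove(weakest);
            String removed = weakest; // effectively final copy for the lambda
            remaining.removeIf(pair -> pair.contains(removed)); // its exclusions vanish too
        }
        return survivors;
    }
}
```

With two competing clefs c1 (0.6) and c2 (0.3), and a key signature k1 (0.5) incompatible with c1, the loop first removes c2, then k1, leaving c1 and any unrelated Inter.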

IDs

Unless explicitly stated otherwise, an entity ID is an integer representing the rank, always starting from 1, within the containing object.

Hence in pseudo Java code, entity.id == container.indexOf(entity) + 1
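The rank-based rule `entity.id == container.indexOf(entity) + 1` can be made concrete with a minimal container. `RankedContainer` and its methods are hypothetical names used only to illustrate the rule; they do not come from Audiveris.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of rank-based IDs: id == indexOf(entity) + 1 within the container. */
final class RankedContainer<E> {
    private final List<E> items = new ArrayList<>();

    /** Appends an entity and returns its ID (1-based rank). */
    int add(E entity) {
        items.add(entity);
        return items.size();
    }

    /** Returns the ID of a contained entity. */
    int idOf(E entity) {
        int index = items.indexOf(entity);
        if (index < 0) {
            throw new IllegalArgumentException("entity not in this container");
        }
        return index + 1; // IDs start from 1
    }

    /** Retrieves an entity from its ID. */
    E byId(int id) {
        return items.get(id - 1);
    }
}
```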

Entity              Direct container   ID from        Comments
Sheet (SheetStub)   Book               Book
System              Sheet              Sheet
Part                System             System
Page                Sheet              Sheet
Score               Book               Book           and Opus if used
Slot                MeasureStack       MeasureStack
Voice               Measure            Measure        Note: may be renamed
Glyph               Sheet              Sheet          scope shared with Inter
Staff               System             Sheet          staves are detected before systems
MeasureStack        System             Page           measures restart from 1 at Page break
BeamGroup           Measure            MeasureStack
Inter               Sig                Sheet          scope shared with Glyph
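Two rows of this table depart from the plain "rank within direct container" rule. The sketch below illustrates them under stated assumptions: `SheetIdScope` is a hypothetical sheet-wide counter shared by Glyph and Inter (so their IDs never collide), and `MeasureNumbering` is a hypothetical helper numbering measure stacks continuously across systems and restarting from 1 at each page break.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sheet-wide ID source shared by Glyph and Inter. */
final class SheetIdScope {
    private int last; // 0 until the first ID is handed out

    int next() {
        return ++last; // IDs start from 1
    }
}

/** Hypothetical numbering of measure stacks, continuous across systems within a page. */
final class MeasureNumbering {
    /**
     * @param stackCounts number of measure stacks in each system, in sheet order
     * @param startsPage  true for each system that begins a new page
     * @return the ID of the first measure stack of each system
     */
    static List<Integer> firstIds(int[] stackCounts, boolean[] startsPage) {
        List<Integer> firsts = new ArrayList<>();
        int next = 1;
        for (int i = 0; i < stackCounts.length; i++) {
            if (startsPage[i]) {
                next = 1; // page break: numbering restarts from 1
            }
            firsts.add(next);
            next += stackCounts[i];
        }
        return firsts;
    }
}
```

For a sheet whose systems hold 4, 5 and 3 measure stacks, with the third system starting a new page (indented system), the first stack IDs would be 1, 5 and 1.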