Main Concepts
This document presents the main concepts used throughout the Audiveris application.
It is a work in progress which can still be modified or extended as new concrete needs appear.
The same data model is used both in memory and on disk in an Audiveris project file:
- On disk, by definition, only the so-called "persistent" entities are stored (they are flagged as such in the code),
- In memory, on top of the loaded persistent entities, there are additional "transient" entities such as cached copies or other information not worth persisting beyond the current processing step.
As much as possible the topics of this documentation are presented in a progressive and rather logical way, and linked with the most relevant Java classes in Audiveris code.
To jump directly to any specific topic, the reader can use the following index, sorted in alphabetical order:
An image file fed into OMR software contains one or several images. Typically PDF and TIFF formats support the notion of multi-image files while, for example, JPEG or PNG formats can deal only with single-image files.
For Audiveris, this physical containment is modeled as one Book class instance (corresponding to the input file) which contains a sequence of one or several Sheet class instances (one sheet corresponding to one image).
A (super-) Book instance could recursively contain (sub-) Book instances.
At the sheet level, staves are gathered into systems, and a given sheet generally contains several systems.
A system may be left-indented with respect to the other systems in the sheet, to indicate the beginning of a movement. A non-indented system is assumed to belong to the same movement as the previous system (located just above in current sheet or at the end of the previous sheet).
In Audiveris, this logical containment is modeled as one instance of Score class per movement (since "Score" is the word used by MusicXML), the score containing a sequence of one or several Page class instances. Generally, there is exactly one page per sheet, except when an indented system appears in the middle of the sheet, thus beginning a new page within the same sheet.
Several Score instances could be gathered into one instance of Opus.
A sheet image may contain no music; this happens for example with a title page, an illustration or simply a blank sheet. In that case, the sheet is marked as "invalid" (from the OMR point of view) and is considered a score break: it ends the current score, and the next "valid" sheet (containing music), if any, will begin another score.
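The score-break rule can be illustrated with a minimal sketch. The class and method names below are hypothetical (they are not Audiveris classes), and the sketch deliberately ignores the other source of score breaks, indented systems: it only shows how invalid sheets partition a book into scores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split the sheets of a book into scores at invalid sheets. */
final class ScoreSplitSketch {

    /**
     * @param validity one flag per sheet, in book order (true = the sheet contains music)
     * @return for each score, the 1-based numbers of its sheets
     */
    static List<List<Integer>> splitIntoScores(boolean[] validity) {
        List<List<Integer>> scores = new ArrayList<>();
        List<Integer> current = null; // no score is open yet

        for (int i = 0; i < validity.length; i++) {
            if (!validity[i]) {
                current = null; // an invalid sheet ends the current score, if any
            } else {
                if (current == null) {
                    current = new ArrayList<>(); // first valid sheet after a break
                    scores.add(current);
                }
                current.add(i + 1); // sheet numbers start from 1
            }
        }

        return scores;
    }
}
```

For a book of five sheets where sheets 1 and 4 are invalid, `splitIntoScores` returns two scores: sheets 2 and 3, then sheet 5.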
Audiveris OMR processes each sheet in a pipeline manner. As of this writing, the pipeline is made of 20 steps as follows:
- LOAD: Load the sheet (gray) picture
- BINARY: Binarize the sheet picture
- SCALE: Compute sheet line thickness, interline, beam thickness
- GRID: Retrieve staff lines, barlines, systems & parts
- HEADERS: Retrieve Clef-Key-Time system headers
- STEM_SEEDS: Retrieve stem thickness & seeds for stems
- BEAMS: Retrieve beams
- LEDGERS: Retrieve ledgers
- HEADS: Retrieve note heads & whole notes
- STEMS: Build stems connected to heads & beams
- REDUCTION: Reduce structures of heads, stems & beams
- CUE_BEAMS: Retrieve cue beams
- TEXTS: Call OCR on textual items
- MEASURES: Retrieve raw measures from groups of barlines
- CHORDS: Gather note heads into chords
- CURVES: Retrieve slurs, wedges & endings
- SYMBOLS: Retrieve fixed-shape symbols
- LINKS: Link and reduce symbols
- RHYTHMS: Handle rhythms within measures
- PAGE: Connect systems within page
The connection of pages within a score takes place at the Book/Score level, once the relevant valid sheets have reached their PAGE step.
The diagram above depicts the typical life cycle of an Audiveris project:
- The project is created from an input image file, with as many stubs as there are images in the input file.
- The LOAD step attempts to load image #N from the input file.
- The BINARY step binarizes the image to black & white and can save the result into BINARY.xml. From that point on, the binary table is used as the original reference.
- Any other sheet processing step of the pipeline builds upon the results of the previous step. If so desired, current sheet data can be saved on disk at the successful end of a step, for later reload.
Remarks:
- As of this writing, there is no direct way to move OMR one step backward, i.e. stepping back from source step S to S-1, but we can always move to any target step T.
  - If the source (current) step S is lower than T, the processing will run on the missing steps from S (excluded) to T (included).
  - If the source step S is already at T or a later step, the processing will restart from the binary table (read from project file) and run until step T.
- We can always save data at the end of any step, and reload it later from the saved project file.
- From the OMR engine point of view, the smallest processing increment is to run a given step on a given sheet. But there are also convenient ways to process, for example:
  - all steps for a given sheet,
  - a given step for all book sheets (in batch mode only),
  - all steps for all book sheets.
- Depending on which resources are most critical between elapsed time, CPU and memory, several options are available for engine internal scheduling:
  - Sheets can be processed sequentially or in parallel,
  - Within certain steps, systems of a sheet can be processed in parallel if so desired.
- Any external program can read the project data it needs, provided that OMR has reached a stable status (read: no step is in progress).
  - An example is to make GRID data available (staves, barlines, parts, systems) with precise location in the original image for easy online tracking.
  - Another example is to export a given movement (score) of a book into a MusicXML file or on-the-fly as a data stream to a sequencer. Passing the MusicXML file to various external programs is the purpose of plugins.
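The first remark, about reaching a target step T from a current step S, can be sketched as follows. This is a hypothetical illustration and not the actual Audiveris scheduler: the `StepPlanSketch` class and its `stepsToRun` method are invented names, and the sketch assumes that "restarting from the binary table" means resuming right after the BINARY step.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of step planning; not the actual Audiveris scheduler. */
final class StepPlanSketch {

    /** The 20 pipeline steps, in processing order. */
    enum Step {
        LOAD, BINARY, SCALE, GRID, HEADERS, STEM_SEEDS, BEAMS, LEDGERS, HEADS, STEMS,
        REDUCTION, CUE_BEAMS, TEXTS, MEASURES, CHORDS, CURVES, SYMBOLS, LINKS, RHYTHMS, PAGE
    }

    /**
     * @param current latest step performed on the sheet, or null if none
     * @param target  step to reach
     * @return the steps to run, in order
     */
    static List<Step> stepsToRun(Step current, Step target) {
        int from; // ordinal of the first step to run

        if (current == null) {
            from = 0; // nothing done yet: start with LOAD
        } else if (current.compareTo(target) < 0) {
            from = current.ordinal() + 1; // S < T: run from S (excluded)
        } else {
            // S >= T: assumption, the binary table is reloaded and we resume after BINARY
            from = Step.BINARY.ordinal() + 1;
        }

        List<Step> plan = new ArrayList<>();
        for (int i = from; i <= target.ordinal(); i++) {
            plan.add(Step.values()[i]); // up to T (included)
        }

        return plan;
    }
}
```

Under these assumptions, moving from GRID to BEAMS runs HEADERS, STEM_SEEDS and BEAMS, while moving from HEADS back to GRID reruns SCALE and GRID on top of the saved binary table.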
A glyph (Glyph class) is nothing more than an immutable set of foreground (black) pixels, precisely located in a sheet binary image.
It carries no shape.
It is not related to a staff. It does not even belong to a system. The reason is that there is no reliable way to assign a glyph located in the "gutter" between two systems or two staves: does it belong to the upper or the lower system/staff?
These restrictions on glyphs don't apply to glyph interpretations (see Inter).
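To make the notion of an immutable pixel set concrete, here is a minimal sketch. `GlyphSketch` and `Pixel` are hypothetical names chosen for illustration; the real Glyph class is richer and may well store its pixels differently. Note that the sketch carries no shape, staff or system, in line with the description above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of an immutable glyph; not the actual Glyph class. */
final class GlyphSketch {

    /** A foreground pixel, at absolute coordinates in the sheet binary image. */
    static final class Pixel {
        final int x;
        final int y;

        Pixel(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Pixel && ((Pixel) obj).x == x && ((Pixel) obj).y == y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    private final Set<Pixel> pixels;

    GlyphSketch(Set<Pixel> pixels) {
        // Defensive copy: later changes to the caller's set cannot alter the glyph
        this.pixels = Collections.unmodifiableSet(new HashSet<>(pixels));
    }

    /** Read-only view of the pixels; any attempt to modify it throws an exception. */
    Set<Pixel> pixels() {
        return pixels;
    }

    /** Number of foreground pixels in the glyph. */
    int weight() {
        return pixels.size();
    }
}
```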
As opposed to Glyph instances, which are immutable and thus fixed in shape, some entities may vary as we add pixels to them.
This is typically the case of Filament class instances, which are used to incrementally define some long symbols such as staff lines.
Such filaments are used only within a step; they are not persisted in a project file, although their final value may get converted to a (fixed) glyph which can then be stored.
A sample (Sample class) is a Glyph augmented by shape and scale information.
It is meant only to store concrete training samples to be used later for training the glyph classifier.
An interpretation, or inter for short (implemented by Inter class), is precisely meant to formalize any reasonable interpretation of a glyph.
There may be several reasonable interpretations for a given glyph and, in many cases, OMR cannot immediately decide on the right interpretation, if any, among these mutually exclusive interpretations.
This decision will then be postponed until later in the OMR process, when additional information (such as other inter instances located nearby) becomes available and helps clarify the configuration.
As opposed to a glyph, an inter belongs to a system and is often related to a staff.
It carries a shape and a quality grade in the (0..1) range, which can be considered as the probability for the interpretation to be correct.
This grade is intrinsic to the interpretation, based only on the glyph at hand in isolation.
Later, the inter will generally be assigned a contextual grade, based on the inter grade and the supporting relations with other inter instances nearby.
A relation (Relation class) links a source Inter class instance to a (different) target Inter class instance.
There are 3 kinds of relation:
- Negative relation: the two interpretations (the two Inter instances) cannot coexist in the final configuration, so at least one of them will disappear at the next SIG reduction. This relation is implemented by an Exclusion class instance, which carries a Cause value:
  - OVERLAP: Two interpretations that physically overlap (and do not explicitly support each other) are considered mutually exclusive. For example, we may have found several possible interpretations for a clef at the beginning of a staff: these ClefInter instances will be mutually linked by an OVERLAP exclusion.
  - INCOMPATIBLE: For some reason other than overlap, two interpretations may not be compatible. Keeping the same example of a staff header, a clef interpretation may not be compatible with a certain key-signature interpretation located right after the clef; the two are then linked by an INCOMPATIBLE exclusion.
- Neutral relation: the only case of such a neutral relation is the bar-group relation that ties two close consecutive barlines, making them part of the same barline group.
- Positive relation: the two linked interpretations support each other. A typical example is a black head interpretation and a stem interpretation nearby with a suitable connection between them. This is implemented by a head-stem relation (an instance of HeadStemRelation class), whose grade attribute formalizes the connection quality. As a supporting relation, the HeadStemRelation class predefines a support ratio for the source (here a head) and a support ratio for the target (here a stem). Source and target predefined ratios, combined with the measured relation grade, increase the contextual grade of their respective Inter instance. Doing so, even rather low-quality interpretations, when well combined through supporting relations, may end up with acceptable contextual quality.
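How supporting relations could raise a contextual grade is sketched below. The formula is purely an illustrative assumption and is not claimed to be the one Audiveris uses: each supporting relation contributes its predefined ratio multiplied by its measured grade, and the sum of contributions shrinks the remaining doubt (1 - intrinsic grade). The `ContextualGradeSketch` class and its `contextual` method are hypothetical names.

```java
/** Hypothetical sketch; the formula is an illustrative assumption, not the Audiveris one. */
final class ContextualGradeSketch {

    /**
     * @param intrinsic      intrinsic grade of the inter, in the 0..1 range
     * @param ratios         predefined support ratio of each supporting relation (0 = no support)
     * @param relationGrades measured grade of each supporting relation, in the 0..1 range
     * @return a contextual grade, never lower than the intrinsic grade and never above 1
     */
    static double contextual(double intrinsic, double[] ratios, double[] relationGrades) {
        if (ratios.length != relationGrades.length) {
            throw new IllegalArgumentException("One ratio is expected per relation grade");
        }

        double support = 0;
        for (int i = 0; i < ratios.length; i++) {
            support += ratios[i] * relationGrades[i]; // contribution of one relation
        }

        // The more support, the smaller the remaining doubt (1 - intrinsic)
        return 1 - ((1 - intrinsic) / (1 + support));
    }
}
```

With this assumed formula, a head of intrinsic grade 0.4 supported by one relation of ratio 2 and grade 0.5 gets a contextual grade of about 0.7, whereas a relation whose ratio is 0 leaves the grade unchanged.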
The no-exclusion relation (NoExclusion class) is a special supporting relation with predefined ratios set to 0.
Its purpose is simply to tell that the source and target interpretations do not exclude one another (although they may physically overlap).
Mirrored heads are a typical example using such a no-exclusion relation:
A physical note head with one stem at its bottom left corner and another stem at its top right corner will give birth to two separate (but "mirrored") Head inter instances, each linked to its own Stem.
In that case, each Stem instance will have a head-stem relation with its "own" Head and also a no-exclusion relation with the other Head.
[Side question: to which Head inter does the augmentation dot relate? :-) ]
A Symbol Interpretation Graph (SIG), implemented by SIGraph class, is simply a graph with Inter class instances as vertices and Relation class instances as edges.
This sig plays the central role in Audiveris V5.
Its main purpose is to formalize and manage the mutual exclusions and the supporting relations within a population of candidate interpretations.
There is one sig per system, and at some points in the OMR pipeline (typically the REDUCTION step and the RHYTHMS step), the sig is reduced so that no exclusion remains in its graph.
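The idea of reducing a graph until no exclusion remains can be sketched with a simple greedy strategy: visit the candidate interpretations by decreasing grade and keep each one unless it is excluded by an interpretation already kept. This strategy is an assumption made for illustration; the text above does not describe the actual reduction algorithm of SIGraph. In this hypothetical `SigSketch` class, inters are reduced to a name and a grade, and supporting relations are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a SIG limited to exclusions; not the actual SIGraph class. */
final class SigSketch {

    /** Vertices: inter name -> grade. */
    private final Map<String, Double> grades = new LinkedHashMap<>();

    /** Exclusion edges: inter name -> names of the inters it excludes. */
    private final Map<String, Set<String>> exclusions = new HashMap<>();

    void addInter(String name, double grade) {
        grades.put(name, grade);
        exclusions.put(name, new HashSet<>());
    }

    /** Both inters must have been added beforehand. */
    void addExclusion(String a, String b) {
        exclusions.get(a).add(b);
        exclusions.get(b).add(a);
    }

    /** Greedy reduction (an assumption): the best grades win their conflicts. */
    Set<String> reduce() {
        List<String> byGrade = new ArrayList<>(grades.keySet());
        byGrade.sort(Comparator.comparing((String n) -> grades.get(n)).reversed());

        Set<String> kept = new LinkedHashSet<>();
        for (String name : byGrade) {
            // Keep this inter only if it excludes none of the inters already kept
            if (Collections.disjoint(exclusions.get(name), kept)) {
                kept.add(name);
            }
        }

        return kept; // no exclusion remains between any two kept inters
    }
}
```

For instance, with a treble clef candidate (grade 0.8) and a bass clef candidate (grade 0.6) linked by an exclusion, plus an unrelated head (grade 0.7), the reduction keeps the treble clef and the head.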
Unless explicitly stated otherwise, an entity ID is an integer representing the rank, always starting from 1, within the containing object.
Hence in pseudo Java code, entity.id == container.indexOf(entity) + 1
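As a runnable counterpart of this pseudo code, the sketch below computes a rank-based ID from a list acting as the container. `EntityIdSketch` and `idOf` are hypothetical names. The container to pass is the one the ID is counted from, which is not always the direct container of the entity.

```java
import java.util.List;

/** Hypothetical sketch of rank-based IDs; not an actual Audiveris class. */
final class EntityIdSketch {

    /**
     * @param container the ordered entities of the object the ID is counted from
     * @param entity    the entity to identify
     * @return the 1-based rank of the entity within the container
     */
    static <T> int idOf(List<T> container, T entity) {
        int index = container.indexOf(entity); // 0-based, or -1 if absent

        if (index < 0) {
            throw new IllegalArgumentException("Entity not found in container");
        }

        return index + 1; // IDs always start from 1
    }
}
```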
Entity | Direct container | ID from | Comments |
---|---|---|---|
Sheet (SheetStub) | Book | Book | |
System | Sheet | Sheet | |
Part | System | System | |
Page | Sheet | Sheet | |
Score | Book | Book | and Opus if used |
Slot | MeasureStack | MeasureStack | |
Voice | Measure | Measure | Nota: may be renamed |
Glyph | Sheet | Sheet | scope shared with Inter |
Staff | System | Sheet | staves are detected before systems |
MeasureStack | System | Page | measures restart from 1 at Page break |
BeamGroup | Measure | MeasureStack | |
Inter | Sig | Sheet | scope shared with Glyph |
Software licensed under the GNU Affero General Public License (AGPL) Version 3
© 2000-2023 Audiveris. Logo designed by Katka.