Scope expansion beyond whole slide imaging #25

Closed
chris-allan opened this issue Apr 17, 2020 · 3 comments

@chris-allan
Member

chris-allan commented Apr 17, 2020

The initial intent of this repository and the complementary raw2ometiff tool was primarily the conversion of proprietary whole slide imaging file formats, such as MRXS, into OME-TIFF. Over the past few months there have been various external contributions (#13, #19) and internal discussions about expanding this scope. Currently the data layout after conversion consists of the following files:

  • METADATA.ome.xml -- A dump of the OME-XML that Bio-Formats generates when its reader is initialized on the source file
  • pyramid.n5 -- A 3-dimensional (X, Y, C) N5 pyramid whose levels are N5 datasets named with ascending stringified integers, as loosely described in "Extension proposal: multiscale arrays v0.1" (zarr-developers/zarr-specs#50)
  • LABELIMAGE.jpg -- A JPEG-compressed version of the source file Bio-Formats series which corresponds to the "Label image" (optional, only Bio-Formats series 1)
  • <series_no>.jpg -- A JPEG-compressed version of each additional Bio-Formats series beyond 1

When the conversion comes from isyntax2raw, the output may also include the following additional files:

  • METADATA.json -- A JSON-encoded dump of the source file metadata
  • MACROIMAGE.jpg -- A JPEG-compressed version of the source file Bio-Formats series which corresponds to the "Macro image"

This conversion data layout is obviously quite singular in its intent. We would like to expand it to cover:

  1. Multi-series input beyond whole slide imaging in a generic fashion
  2. Dimensions (Z, T) beyond the three currently being considered

This would mean:

  1. Using a generic data.n5 or similar name for the N5 data
  2. Dispensing with the secondary JPEG compressed LABELIMAGE.jpg, <series_no>.jpg, etc. files and encoding all series from the source file in ascending stringified order
  3. Expanding the number of dimensions to 5 (X, Y, C, Z, T), following the dimension order declared by Bio-Formats and recorded in the OME-XML, and setting the size of any dimension that is missing entirely to 1
  4. Adding version metadata in a defined location to aid downstream consumers
  5. Adding an option to force the dimension order
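
Items 3 and 5 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the converter's implementation: `five_d_shape` is a hypothetical helper, the dimension order string stands in for whatever Bio-Formats declares, and the sizes are made up.

```python
def five_d_shape(dimension_order, sizes):
    """Return a 5-tuple of sizes arranged in the given dimension order.

    dimension_order: a permutation of "XYCZT" (hypothetical input standing in
    for the order declared by Bio-Formats, or a forced order).
    sizes: mapping of dimension name to size; missing dimensions get size 1.
    """
    if sorted(dimension_order) != sorted("XYCZT"):
        raise ValueError(f"not a permutation of XYCZT: {dimension_order!r}")
    return tuple(sizes.get(dim, 1) for dim in dimension_order)


# A 3-dimensional (X, Y, C) whole slide image has no Z or T (made-up sizes).
wsi_sizes = {"X": 4096, "Y": 2048, "C": 3}

declared = five_d_shape("XYCZT", wsi_sizes)
print(declared)  # (4096, 2048, 3, 1, 1)

# Forcing a different order only permutes the same sizes.
forced = five_d_shape("XYZCT", wsi_sizes)
print(forced)  # (4096, 2048, 1, 3, 1)
```

If the order is forced, the shape alone no longer says which axis is which, so a consumer would need that order from somewhere other than the source metadata.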

Consequently, the layout should reflect the following:

data/
├── 0    # Series 0
|   ├── 0    # Full-sized array
|   ├── 1    # Scaled down 0, e.g. 0.5; for images, in the X&Y dimensions
|   ├── 2    # Scaled down 1, ...
|   ├── 3    # Scaled down 2, ...
|   └── 4    # Etc.
├── 1    # Series 1
|   └── 0    # Etc.
└── 2    # Etc.
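
As a rough sketch of the naming scheme above, the following builds the same hierarchy as empty directories. It assumes nothing about N5 itself: `dataset_paths` and `create_layout` are hypothetical helpers, the resolution counts are made up, and no pixel data or N5 metadata is written.

```python
import tempfile
from pathlib import Path


def dataset_paths(resolutions_per_series):
    """List "<series>/<resolution>" paths, both as ascending stringified integers."""
    return [
        f"{series}/{resolution}"
        for series, count in enumerate(resolutions_per_series)
        for resolution in range(count)
    ]


def create_layout(root, resolutions_per_series):
    """Create one empty directory per dataset under root."""
    for path in dataset_paths(resolutions_per_series):
        (Path(root) / path).mkdir(parents=True, exist_ok=True)


# Series 0 has five resolutions; series 1 and 2 have one each (made-up counts).
counts = [5, 1, 1]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    create_layout(root, counts)
    created = sorted(p.relative_to(root).as_posix() for p in root.glob("*/*"))

print(created)
# ['0/0', '0/1', '0/2', '0/3', '0/4', '1/0', '2/0']
```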

@joshmoore also requested that we provide a mechanism to map series numbers to a corresponding unique identifier and, if possible, allow the / separated components of that identifier to be reflected as nested N5 groups.
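
One possible shape for such a mapping, purely as a sketch: `group_path` and `series_mapping` are hypothetical names, and the identifiers are invented for illustration rather than taken from any real dataset.

```python
def group_path(identifier):
    """Normalise a "/" separated identifier into a nested group path."""
    parts = [part for part in identifier.split("/") if part]
    if not parts:
        raise ValueError("identifier must have at least one component")
    return "/".join(parts)


def series_mapping(identifiers):
    """Map each series number to the group path derived from its identifier."""
    paths = [group_path(identifier) for identifier in identifiers]
    if len(set(paths)) != len(paths):
        raise ValueError("identifiers must be unique")
    return dict(enumerate(paths))


# Invented identifiers with three "/" separated components each.
mapping = series_mapping(["plateA/A1/0", "plateA/A1/1", "plateA/B2/0"])
for series, path in mapping.items():
    print(series, "->", path)
```

Each component of the path would then correspond to one level of group nesting, with the series number remaining available as a stable lookup key.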

Edit (2020-04-17): Added item for forcing the dimension order

@sbesson
Member

sbesson commented Apr 21, 2020

The proposal here makes full sense to me as the internal image/resolutions hierarchy can be exercised against the various imaging modalities OME is working with.

Another point of discussion was the expectation/conventions in terms of the dimension order and how to communicate this to downstream consumers. The common ground was to use XY as the fastest varying dimensions.
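
To make "fastest varying" concrete, here is a small sketch of how a flat offset could be computed when the first listed dimension varies fastest. It is one reading of the convention, not a description of how N5 or the converter stores chunks; `linear_offset` and the shape are assumptions for illustration.

```python
def linear_offset(index, shape):
    """Offset into a flat buffer where the first dimension varies fastest."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    offset, stride = 0, 1
    for i, size in zip(index, shape):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        offset += i * stride
        stride *= size
    return offset


# Made-up shape in (X, Y, C, Z, T) order.
shape = (4, 3, 2, 1, 1)

print(linear_offset((1, 0, 0, 0, 0), shape))  # 1: stepping in X moves one element
print(linear_offset((0, 1, 0, 0, 0), shape))  # 4: stepping in Y skips a full row
print(linear_offset((0, 0, 1, 0, 0), shape))  # 12: stepping in C skips a full plane
```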

Also 👍 about storing a representation of the binary layout. Would encoding a binary layout version in the custom attributes be a good starting point or is there another native alternative?

@chris-allan
Member Author

The current expectation, at least as far as this repository is concerned, is that the dimension order is reflected in the METADATA.ome.xml file unless forced. If forced, one would then either need to record that somewhere else or always assume the same order.

As far as storing the layout version goes, in #28 we are now doing that in the bioformats2raw.layout attribute at the root.
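
For downstream consumers, reading and writing that attribute might look something like the sketch below. It assumes the filesystem flavour of N5, where a group's attributes live in an `attributes.json` file in the group's directory; the helper names, the `data.n5` directory name and the version value `1` are assumptions for illustration, not taken from #28.

```python
import json
import tempfile
from pathlib import Path

LAYOUT_KEY = "bioformats2raw.layout"


def write_layout_version(root, version):
    """Set the layout attribute on the root group, keeping other attributes."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    attributes_file = root / "attributes.json"
    attributes = {}
    if attributes_file.exists():
        attributes = json.loads(attributes_file.read_text())
    attributes[LAYOUT_KEY] = version
    attributes_file.write_text(json.dumps(attributes))


def read_layout_version(root):
    """Return the layout version, or None if the attribute is absent."""
    attributes_file = Path(root) / "attributes.json"
    if not attributes_file.exists():
        return None
    return json.loads(attributes_file.read_text()).get(LAYOUT_KEY)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data.n5"
    missing = read_layout_version(root)  # nothing written yet
    write_layout_version(root, 1)        # 1 is an assumed version value
    stored = read_layout_version(root)

print(missing, stored)
```

A consumer could check this value before reading and refuse, or fall back to the older `pyramid.n5` layout, when it is absent.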

@chris-allan
Member Author

Closing now that #28, #30, #31 and #32 have gone in.


2 participants