diff --git a/README.md b/README.md index 962aa4d..7daf578 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,7 @@ A collection of specifications and tools for 360° video and spatial audio, i - [Spatial Audio](docs/spatial-audio-rfc.md) metadata specification - [Spherical Video](docs/spherical-video-rfc.md) metadata specification -- [Spatial Media tools](spatialmedia/) for injecting spatial media metadata in video files +- [Spherical Video V2](docs/spherical-video-v2-rfc.md) metadata specification +- [Spatial Media tools](spatialmedia/) for injecting spatial media metadata in media files Try out [Jump Inspector](https://g.co/jump/inspector), an Android app for previewing VR videos with spatial audio. diff --git a/docs/spatial-audio-rfc.md b/docs/spatial-audio-rfc.md index 75c077e..c3319f5 100644 --- a/docs/spatial-audio-rfc.md +++ b/docs/spatial-audio-rfc.md @@ -1,12 +1,12 @@ # Spatial Audio RFC (draft) -*This document describes an open metadata scheme by which MP4 multimedia containers may accommodate spatial audio. Comments are welcome by filing an issue on GitHub.* +*This document describes an open metadata scheme by which MP4 multimedia containers may accommodate spatial and non-diegetic audio. Comments are welcome on the [spatial-media-discuss](https://groups.google.com/forum/#!forum/spatial-media-discuss) mailing list or by [filing an issue](https://github.com/google/spatial-media/issues) on GitHub.* ------------------------------------------------------ ## Metadata Format ### MP4 -Spatial audio metadata is stored in a new box, `SA3D`, defined in this RFC, in an MP4 container. The metadata is applicable to individual tracks in the container. +Spatial audio metadata is stored in a new box, `SA3D`, defined in this RFC. Non-diegetic audio metadata is stored in a new box, `SAND`, defined in this RFC. The metadata is applicable to individual tracks in an MP4 container. 
#### Spatial Audio Box (SA3D) ##### Definition @@ -65,7 +65,7 @@ aligned(8) class SpatialAudioBox extends Box(‘SA3D’) { ##### Example -Here is an example MP4 box hierarchy for a file containing the SA3D box: +Here is an example MP4 box hierarchy for a file containing the `SA3D` box: - moov - trak @@ -77,7 +77,7 @@ Here is an example MP4 box hierarchy for a file containing the SA3D box: - esds - SA3D -where the SA3D box has the following data: +where the `SA3D` box has the following data: | Field Name | Value | |:-----------|:-----| @@ -94,6 +94,55 @@ where the SA3D box has the following data: ------------------------------------------------------ +#### Non-Diegetic Audio Box (SAND) +##### Definition +Box Type: `SAND` +Container: Sound Sample Description box (e.g., `mp4a`, `lpcm`, `sowt`, etc.) +Mandatory: No +Quantity: Zero or one + +When present, provides additional information about the non-diegetic audio content contained in this audio track. This can be used alongside `SA3D` in a head-tracked virtual reality experience to provide audio which should remain unchanged by listener head rotation; e.g., narration or stereo music. + +##### Syntax +``` +aligned(8) class NonDiegeticAudioBox extends Box(‘SAND’) { + unsigned int(8) version; +} +``` + +##### Semantics +- `version` is an 8-bit unsigned integer that specifies the version of this box. Must be set to `0`. 
+ +##### Example + +Here is an example MP4 box hierarchy for a file containing the `SA3D` and `SAND` boxes, to mix spatial audio with non-diegetic audio: + +- moov + - trak + - mdia + - minf + - stbl + - stsd + - mp4a + - esds + - SA3D + - trak + - mdia + - minf + - stbl + - stsd + - mp4a + - esds + - SAND + +where the `SAND` box has the following data: + +| Field Name | Value | +|:-----------|:-----| +| `version` | `0` | + +------------------------------------------------------ + ## Appendix 1 - Ambisonics The traditional notion of ambisonics is used, where the sound field is represented by spherical harmonics coefficients using the *associated Legendre polynomials* (without *Condon-Shortley phase*) as the basis functions. Thus, the spherical harmonic of degree `l` and order `m` at elevation `E` and azimuth `A` is given by: @@ -103,3 +152,16 @@ where: - `N(l, m)` is the spherical harmonics normalization function used. - `P(l, m, x)` is the (unnormalized) *associated Legendre polynomial*, without *Condon-Shortley phase*, of degree `l` and order `m` evaluated at `x`. - `T(m, x)` is `sin(-m * x)` for `m < 0` and `cos(m * x)` otherwise. + +### Conventions +#### Azimuth +- `A = 0`: The source is in front of the listener. +- `A` in `(0, pi/2)`: The source is in the forward-left quadrant. +- `A` in `(pi/2, pi)`: The source is in the back-left quadrant. +- `A` in `(-pi/2, 0)`: The source is in the forward-right quadrant. +- `A` in `(-pi, -pi/2)`: The source is in the back-right quadrant. + +#### Elevation +- `E = 0`: The source is in the horizontal plane. +- `E` in `(0, pi/2]`: The source is above the listener. +- `E` in `[-pi/2, 0)`: The source is below the listener. 
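The azimuth and elevation conventions added in the hunk above can be sanity-checked with a small sketch. It assumes a right-handed frame with x pointing to the front, y to the left and z up; the RFC does not name its axes, so that choice and the helper name `direction_from_angles` are illustrative assumptions rather than part of the specification.

```python
import math


def direction_from_angles(azimuth, elevation):
    """Return a unit vector for a source at the given angles (radians).

    Assumed axes (not stated in the RFC): x = front, y = left, z = up.
    """
    cos_e = math.cos(elevation)
    return (
        cos_e * math.cos(azimuth),  # front component
        cos_e * math.sin(azimuth),  # left component: positive azimuth is to the left
        math.sin(elevation),        # up component: positive elevation is above
    )


front = direction_from_angles(0.0, 0.0)
forward_left = direction_from_angles(math.pi / 4, 0.0)
forward_right = direction_from_angles(-math.pi / 4, 0.0)
above = direction_from_angles(0.0, math.pi / 2)
```

Under these assumptions a positive azimuth yields a positive left component and a positive elevation yields a positive up component, which matches the quadrant descriptions in the added `Conventions` section.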
diff --git a/docs/spherical-video-rfc.md b/docs/spherical-video-rfc.md index 1d26fd0..ddc3009 100644 --- a/docs/spherical-video-rfc.md +++ b/docs/spherical-video-rfc.md @@ -1,7 +1,8 @@ -# Spherical Video RFC (draft) -*This document describes an open metadata scheme by which Matroska-like and MP4 multimedia containers may accommodate spherical video. Comments are welcome on the [webm-discuss](https://groups.google.com/a/webmproject.org/forum/#!forum/webm-discuss) mailing list or [file an issue](https://github.com/google/spatial-media/issues) on GitHub.* +# Spherical Video RFC +**Note: This metadata scheme is superseded by the [Spherical Video V2](spherical-video-v2-rfc.md) metadata specification.** + +*This document describes an open metadata scheme by which Matroska-like and MP4 multimedia containers may accommodate spherical video. Comments are welcome on the [spatial-media-discuss](https://groups.google.com/forum/#!forum/spatial-media-discuss) mailing list or by [filing an issue](https://github.com/google/spatial-media/issues) on GitHub.* -*Last modified: 2015-02-06* ------------------------------------------------------ diff --git a/docs/spherical-video-v2-rfc.md b/docs/spherical-video-v2-rfc.md new file mode 100644 index 0000000..32a2d26 --- /dev/null +++ b/docs/spherical-video-v2-rfc.md @@ -0,0 +1,253 @@ +# Spherical Video V2 RFC (draft) +*This document describes a revised open metadata scheme by which MP4 (ISOBMFF) +multimedia containers may accommodate spherical videos. Comments are welcome on +the [spatial-media-discuss] +(https://groups.google.com/forum/#!forum/spatial-media-discuss) +mailing list or by [filing an issue] +(https://github.com/google/spatial-media/issues) on GitHub.* + +------------------------------------------------------ + +## Metadata Format + +### MP4 (ISOBMFF) +Spherical video metadata is stored in a new box, `SV3D`, defined in this RFC, in +an MP4 (ISOBMFF) container. 
The metadata is applicable to individual video +tracks in the container. + +As the V2 specification stores its metadata in a different location, it is +possible for a file to contain both the V1 and V2 metadata. If both V1 and V2 +metadata are contained they should contain semantically equivalent information, +with V2 taking priority when they differ. + +#### Spherical Video Box (SV3D) +##### Definition +Box Type: `SV3D` +Container: Video Sample Description box (e.g. `avc1`, `mp4v`, `apcn`) +Mandatory: No +Quantity: Zero or one + +Stores additional information about spherical video content contained in this +video track. + +##### Syntax +``` +aligned(8) class SphericalVideoBox extends Box(‘SV3D’) { +} +``` + +#### Spherical Video Header (SVHD) +##### Definition +Box Type: `SVHD` +Container: `SV3D` +Mandatory: Yes +Quantity: Exactly one + +Contains spherical video information unrelated to the projection format. + +##### Syntax +``` +aligned(8) class SphericalVideoHeader extends FullBox(‘SVHD’, 0, 0) { + string metadata_source; +} +``` + +##### Semantics + +- `metadata_source` is a string identifier for the source tool of the SV3D +metadata. + +#### Projection Box (PROJ) +##### Definition +Box Type: `PROJ` +Container: `SV3D` +Mandatory: Yes +Quantity: Exactly one + +Container for projection information about the spherical video content. +This container must contain exactly one projection (e.g. an `EQUI` box) which +defines the spherical video's projection. + +##### Syntax +``` +aligned(8) class Projection extends Box(‘PROJ’) { +} +``` + +#### Projection Header Box (PRHD) +##### Definition +Box Type: `PRHD` +Container: `PROJ` +Mandatory: Yes +Quantity: Exactly one + +Contains projection information about the spherical video content that is +independent of the video projection. 
+ +##### Syntax +``` +aligned(8) class ProjectionHeader extends FullBox(‘PRHD’, 0, 0) { + unsigned int(8) stereo_mode; + + int(32) pose_yaw_degrees; + int(32) pose_pitch_degrees; + int(32) pose_roll_degrees; +} +``` + +##### Semantics + +- `stereo_mode` is an 8-bit unsigned integer that specifies the stereo frame + layout. The values 0 to 255 are reserved for current and future layouts. The + following values are defined: + +| `stereo_mode` | Stereo Mode Description | +|:-----------------|:---------------------------| +| `0` | **Monoscopic**: Indicates the video frame contains a single monoscopic view. | +| `1` | **Stereoscopic Top-Bottom**: Indicates the video frame contains a stereoscopic view storing the left eye on the top half of the frame and the right eye on the bottom half of the frame.| +| `2` | **Stereoscopic Left-Right**: Indicates the video frame contains a stereoscopic view storing the left eye on the left half of the frame and the right eye on the right half of the frame.| + +- Pose values are 16.16 fixed point values measuring rotation in degrees. These + rotations transform the projection as follows: + - `pose_yaw_degrees` clockwise rotation around the up vector + - `pose_pitch_degrees` counter-clockwise rotation around the right vector post + yaw transform + - `pose_roll_degrees` counter-clockwise rotation around the forward vector post + yaw and pitch transform + +#### Projection Data Box +##### Definition +Box Type: Projection Dependent Identifier +Container: `PROJ` +Mandatory: Yes +Quantity: Exactly one + +Base class for all projection data boxes. Any new projection must subclass this +type with a unique `proj_type`. + +##### Syntax +``` +aligned(8) class ProjectionDataBox(unsigned int(32) proj_type, unsigned int(32) version, unsigned int(32) flags) + extends FullBox(proj_type, version, flags) { +} +``` + +#### Cubemap Projection Box (CBMP) +##### Definition +Box Type: `CBMP` +Container: `PROJ` + +Contains the projection dependent information for a cubemap video. 
The +[cubemap's](https://en.wikipedia.org/wiki/Cube_mapping) face layout is defined +by a unique `layout` value. + +##### Syntax +``` +aligned(8) class CubemapProjection extends ProjectionDataBox(‘CBMP’, 0, 0) { + unsigned int(32) layout; + unsigned int(32) padding; +} +``` + +##### Semantics +- `layout` is a 32-bit unsigned integer describing the layout of cube faces. The + values 0 to 255 are reserved for current and future layouts. + - a value of `0` corresponds to a grid with 3 columns and 2 rows. Faces are + oriented upwards for the front, left, right, and back faces. The up face is + oriented so the top of the face is forwards and the down face is oriented + so the top of the face is to the back. +
+<table>
+  <tr>
+    <td>right face</td>
+    <td>left face</td>
+    <td>up face</td>
+  </tr>
+  <tr>
+    <td>down face</td>
+    <td>front face</td>
+    <td>back face</td>
+  </tr>
+</table>
+ +- `padding` is a 32-bit unsigned integer measuring the number of pixels to pad + from the edge of each cube face. + +#### Equirectangular Projection Box (EQUI) +##### Definition +Box Type: `EQUI` +Container: `PROJ` + +Contains the projection dependent information for an equirectangular video. The +[equirectangular projection]( +https://en.wikipedia.org/wiki/Equirectangular_projection) should be arranged +such that the default pose has the forward vector in the center of the frame, +the up vector at the top of the frame, and the right vector towards the right of the +frame. + +##### Syntax +``` +aligned(8) class EquirectangularProjection extends ProjectionDataBox(‘EQUI’, 0, 0) { + unsigned int(32) crop_top; + unsigned int(32) crop_bottom; + unsigned int(32) crop_left; + unsigned int(32) crop_right; +} +``` + +##### Semantics + +- The crop values are 0.32 fixed point values. These values represent the + proportion of the projection cropped from each edge not covered by the video + frame. For an uncropped frame all values are 0. 
+ - `crop_top` is the amount from the top of the frame to crop + - `crop_bottom` is the amount from the bottom of the frame to crop; must be + less than 0xFFFFFFFF - crop_top + - `crop_left` is the amount from the left of the frame to crop + - `crop_right` is the amount from the right of the frame to crop; must be + less than 0xFFFFFFFF - crop_left + +### Example + +Here is an example box hierarchy for a file containing the SV3D metadata for a +monoscopic equirectangular video: + +- moov + - trak + - mdia + - minf + - stbl + - stsd + - avc1 + - pasp + - SV3D + - SVHD + - PROJ + - PRHD + - EQUI + +where the SVHD box contains: + +| Field Name | Value | +|:-----------|:------| +| `metadata_source`| `Spherical Metadata Tooling` | + +the PRHD box contains: + +| Field Name | Value | +|:-----------|:-----| +| `stereo_mode` | `0` | +| `pose_yaw_degrees` | `0` | +| `pose_pitch_degrees` | `0` | +| `pose_roll_degrees` | `0` | + +and the EQUI box contains: + +| Field Name | Value | +|:-----------|:-----| +| `crop_top` | `0` | +| `crop_bottom` | `0` | +| `crop_left` | `0` | +| `crop_right` | `0` | +
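The fixed point fields and box layouts described in this new file can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the reference injector: it assumes a 16.16 pose value is the angle multiplied by 2^16, a 0.32 crop value is the proportion multiplied by 2^32, the standard ISOBMFF `FullBox` header (32-bit size, four-character type, 8-bit version, 24-bit flags), and the upper-case box types used in this draft. All function names are hypothetical.

```python
import struct


def degrees_to_fixed_16_16(degrees):
    """Encode an angle in degrees as a signed 16.16 fixed point integer."""
    value = round(degrees * (1 << 16))
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("angle does not fit in a signed 16.16 value")
    return value


def fraction_to_fixed_0_32(fraction):
    """Encode a proportion in [0, 1) as an unsigned 0.32 fixed point integer."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("crop proportion must be in [0, 1)")
    return int(fraction * (1 << 32))


def full_box(box_type, payload, version=0, flags=0):
    """Wrap a payload in a FullBox: size, type, version, 24-bit flags."""
    size = 4 + 4 + 1 + 3 + len(payload)
    header = struct.pack(">I4sB", size, box_type, version)
    return header + flags.to_bytes(3, "big") + payload


def prhd_box(stereo_mode, yaw, pitch, roll):
    """Build a PRHD box from a stereo mode and pose angles in degrees."""
    pose = [degrees_to_fixed_16_16(angle) for angle in (yaw, pitch, roll)]
    return full_box(b"PRHD", struct.pack(">Biii", stereo_mode, *pose))


def equi_box(top=0.0, bottom=0.0, left=0.0, right=0.0):
    """Build an EQUI box from crop proportions, enforcing the RFC limits."""
    t, b, l, r = (fraction_to_fixed_0_32(v) for v in (top, bottom, left, right))
    if b >= 0xFFFFFFFF - t or r >= 0xFFFFFFFF - l:
        raise ValueError("opposing crop values leave no visible frame")
    return full_box(b"EQUI", struct.pack(">IIII", t, b, l, r))


# Values from the monoscopic, uncropped example above.
prhd = prhd_box(stereo_mode=0, yaw=0.0, pitch=0.0, roll=0.0)
equi = equi_box()
```

The sketch stops at the `PRHD` and `EQUI` payloads; nesting them inside `PROJ`, `SV3D` and the video sample description is left out to keep it short.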