
Commit

Merge pull request #83 from dcower/master
Add Spherical Video V2 RFC. Update Spatial Audio RFC.
dcower committed May 18, 2016
2 parents 7520a00 + 5337a1c commit 550a3bc
Showing 4 changed files with 325 additions and 8 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -4,6 +4,7 @@ A collection of specifications and tools for 360° video and spatial audio, i

- [Spatial Audio](docs/spatial-audio-rfc.md) metadata specification
- [Spherical Video](docs/spherical-video-rfc.md) metadata specification
- [Spatial Media tools](spatialmedia/) for injecting spatial media metadata in video files
- [Spherical Video V2](docs/spherical-video-v2-rfc.md) metadata specification
- [Spatial Media tools](spatialmedia/) for injecting spatial media metadata in media files

Try out [Jump Inspector](https://g.co/jump/inspector), an Android app for previewing VR videos with spatial audio.
70 changes: 66 additions & 4 deletions docs/spatial-audio-rfc.md
@@ -1,12 +1,12 @@
# Spatial Audio RFC (draft)
*This document describes an open metadata scheme by which MP4 multimedia containers may accommodate spatial audio. Comments are welcome by filing an issue on GitHub.*
*This document describes an open metadata scheme by which MP4 multimedia containers may accommodate spatial and non-diegetic audio. Comments are welcome on the [spatial-media-discuss](https://groups.google.com/forum/#!forum/spatial-media-discuss) mailing list or by [filing an issue](https://github.com/google/spatial-media/issues) on GitHub.*

------------------------------------------------------

## Metadata Format

### MP4
Spatial audio metadata is stored in a new box, `SA3D`, defined in this RFC, in an MP4 container. The metadata is applicable to individual tracks in the container.
Spatial audio metadata is stored in a new box, `SA3D`, defined in this RFC. Non-diegetic audio metadata is stored in a new box, `SAND`, defined in this RFC. The metadata is applicable to individual tracks in an MP4 container.

#### Spatial Audio Box (SA3D)
##### Definition
@@ -65,7 +65,7 @@ aligned(8) class SpatialAudioBox extends Box(‘SA3D’) {

##### Example

Here is an example MP4 box hierarchy for a file containing the SA3D box:
Here is an example MP4 box hierarchy for a file containing the `SA3D` box:

- moov
  - trak
@@ -77,7 +77,7 @@ Here is an example MP4 box hierarchy for a file containing the SA3D box:
              - esds
              - SA3D

where the SA3D box has the following data:
where the `SA3D` box has the following data:

| Field Name | Value |
|:-----------|:-----|
@@ -94,6 +94,55 @@ where the SA3D box has the following data:

------------------------------------------------------

#### Non-Diegetic Audio Box (SAND)
##### Definition
Box Type: `SAND`
Container: Sound Sample Description box (e.g., `mp4a`, `lpcm`, `sowt`, etc.)
Mandatory: No
Quantity: Zero or one

When present, provides additional information about the non-diegetic audio content contained in this audio track. This can be used alongside `SA3D` in a head-tracked virtual reality experience to provide audio that should remain unchanged by listener head rotation, e.g., narration or stereo music.

##### Syntax
```
aligned(8) class NonDiegeticAudioBox extends Box('SAND') {
    unsigned int(8) version;
}
```

##### Semantics
- `version` is an 8-bit unsigned integer that specifies the version of this box. Must be set to `0`.

##### Example

Here is an example MP4 box hierarchy for a file containing the `SA3D` and `SAND` boxes, to mix spatial audio with non-diegetic audio:

- moov
  - trak
    - mdia
      - minf
        - stbl
          - stsd
            - mp4a
              - esds
              - SA3D
  - trak
    - mdia
      - minf
        - stbl
          - stsd
            - mp4a
              - esds
              - SAND

where the `SAND` box has the following data:

| Field Name | Value |
|:-----------|:-----|
| `version` | `0` |
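
As a rough illustration of the layout above, the example `SAND` box could be serialized as follows. This is a minimal sketch, not the spatial-media tools' implementation: the helper name `make_sand_box` is hypothetical, and it assumes the compact ISOBMFF box header (a 32-bit big-endian size followed by the four-character type).

```python
import struct


def make_sand_box(version: int = 0) -> bytes:
    """Serialize a SAND box: 4-byte size, 4-byte type, 1-byte version.

    Hypothetical helper for illustration; assumes the compact ISOBMFF
    box header (32-bit big-endian size, then the four-character type).
    """
    if version != 0:
        raise ValueError("this RFC requires version to be 0")
    payload = struct.pack(">B", version)
    size = 8 + len(payload)  # 8-byte header plus the payload
    return struct.pack(">I4s", size, b"SAND") + payload
```

Under these assumptions the box occupies nine bytes, and it would be written inside the sound sample description box (e.g. `mp4a`) of the non-diegetic track.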

------------------------------------------------------

## Appendix 1 - Ambisonics
The traditional notion of ambisonics is used, where the sound field is represented by spherical harmonics coefficients using the *associated Legendre polynomials* (without *Condon-Shortley phase*) as the basis functions. Thus, the spherical harmonic of degree `l` and order `m` at elevation `E` and azimuth `A` is given by:

@@ -103,3 +152,16 @@ where:
- `N(l, m)` is the spherical harmonics normalization function used.
- `P(l, m, x)` is the (unnormalized) *associated Legendre polynomial*, without *Condon-Shortley phase*, of degree `l` and order `m` evaluated at `x`.
- `T(m, x)` is `sin(-m * x)` for `m < 0` and `cos(m * x)` otherwise.
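
The azimuthal term `T(m, x)` is simple enough to write out directly. The sketch below mirrors the definition in the last bullet; the Python function is only an illustration and `x` is assumed to be in radians.

```python
import math


def T(m: int, x: float) -> float:
    """Azimuthal term: sin(-m * x) for m < 0, cos(m * x) otherwise."""
    if m < 0:
        return math.sin(-m * x)
    return math.cos(m * x)
```

For `m = 0` the term is constant (`cos(0)`), which matches the fact that order-0 harmonics do not depend on azimuth.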

### Conventions
#### Azimuth
- `A = 0`: The source is in front of the listener.
- `A` in `(0, pi/2)`: The source is in the forward-left quadrant.
- `A` in `(pi/2, pi)`: The source is in the back-left quadrant.
- `A` in `(-pi/2, 0)`: The source is in the forward-right quadrant.
- `A` in `(-pi, -pi/2)`: The source is in the back-right quadrant.

#### Elevation
- `E = 0`: The source is in the horizontal plane.
- `E` in `(0, pi/2]`: The source is above the listener.
- `E` in `[-pi/2, 0)`: The source is below the listener.
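
The conventions above can be sketched as a small classifier. The function name `describe_direction` and the returned labels are hypothetical, angles are assumed to be in radians, and the boundary azimuths (`±pi/2`, `±pi`) are reported separately because the list above does not assign them to a quadrant.

```python
import math


def describe_direction(azimuth: float, elevation: float) -> str:
    """Describe a source direction per the azimuth/elevation conventions.

    Angles are in radians. Hypothetical helper for illustration.
    """
    if not -math.pi <= azimuth <= math.pi:
        raise ValueError("azimuth must be in [-pi, pi]")
    if not -math.pi / 2 <= elevation <= math.pi / 2:
        raise ValueError("elevation must be in [-pi/2, pi/2]")

    if azimuth == 0:
        horizontal = "front"
    elif 0 < azimuth < math.pi / 2:
        horizontal = "forward-left"
    elif math.pi / 2 < azimuth < math.pi:
        horizontal = "back-left"
    elif -math.pi / 2 < azimuth < 0:
        horizontal = "forward-right"
    elif -math.pi < azimuth < -math.pi / 2:
        horizontal = "back-right"
    else:
        # +/-pi/2 and +/-pi are not assigned to a quadrant by the conventions.
        horizontal = "quadrant boundary"

    if elevation > 0:
        vertical = "above"
    elif elevation < 0:
        vertical = "below"
    else:
        vertical = "horizontal plane"
    return f"{horizontal}, {vertical}"
```

Note that positive azimuth is to the listener's left, so the angle increases counter-clockwise when viewed from above.
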
7 changes: 4 additions & 3 deletions docs/spherical-video-rfc.md
@@ -1,7 +1,8 @@
# Spherical Video RFC (draft)
*This document describes an open metadata scheme by which Matroska-like and MP4 multimedia containers may accommodate spherical video. Comments are welcome on the [webm-discuss](https://groups.google.com/a/webmproject.org/forum/#!forum/webm-discuss) mailing list or [file an issue](https://github.com/google/spatial-media/issues) on GitHub.*
# Spherical Video RFC
**Note: This metadata scheme is superseded by the [Spherical Video V2](spherical-video-v2-rfc.md) metadata specification.**

*This document describes an open metadata scheme by which Matroska-like and MP4 multimedia containers may accommodate spherical video. Comments are welcome on the [spatial-media-discuss](https://groups.google.com/forum/#!forum/spatial-media-discuss) mailing list or by [filing an issue](https://github.com/google/spatial-media/issues) on GitHub.*

*Last modified: 2015-02-06*

------------------------------------------------------

253 changes: 253 additions & 0 deletions docs/spherical-video-v2-rfc.md
@@ -0,0 +1,253 @@
# Spherical Video V2 RFC (draft)
*This document describes a revised open metadata scheme by which MP4 (ISOBMFF)
multimedia containers may accommodate spherical videos. Comments are welcome on
the [spatial-media-discuss](https://groups.google.com/forum/#!forum/spatial-media-discuss)
mailing list or by [filing an issue](https://github.com/google/spatial-media/issues) on GitHub.*

------------------------------------------------------

## Metadata Format

### MP4 (ISOBMFF)
Spherical video metadata is stored in a new box, `SV3D`, defined in this RFC, in
an MP4 (ISOBMFF) container. The metadata is applicable to individual video
tracks in the container.

As the V2 specification stores its metadata in a different location, it is
possible for a file to contain both the V1 and V2 metadata. If both V1 and V2
metadata are contained they should contain semantically equivalent information,
with V2 taking priority when they differ.
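
A reader might apply the priority rule along these lines. This is only a sketch: the function name and the use of dictionaries for already-parsed metadata are assumptions made for illustration, not part of either specification.

```python
from typing import Optional


def select_spherical_metadata(v1: Optional[dict], v2: Optional[dict]) -> Optional[dict]:
    """Return V2 metadata when present, otherwise fall back to V1.

    `v1` and `v2` are hypothetical, already-parsed representations of
    the V1 and V2 metadata; None means that version is absent.
    """
    if v2 is not None:
        return v2
    return v1
```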

#### Spherical Video Box (SV3D)
##### Definition
Box Type: `SV3D`
Container: Video Sample Description box (e.g. `avc1`, `mp4v`, `apcn`)
Mandatory: No
Quantity: Zero or one

Stores additional information about spherical video content contained in this
video track.

##### Syntax
```
aligned(8) class SphericalVideoBox extends Box('SV3D') {
}
```

#### Spherical Video Header (SVHD)
##### Definition
Box Type: `SVHD`
Container: `SV3D`
Mandatory: Yes
Quantity: Exactly one

Contains spherical video information unrelated to the projection format.

##### Syntax
```
aligned(8) class SphericalVideoHeader extends FullBox('SVHD', 0, 0) {
    string metadata_source;
}
```

##### Semantics

- `metadata_source` is a string identifier for the source tool of the SV3D
metadata.

#### Projection Box (PROJ)
##### Definition
Box Type: `PROJ`
Container: `SV3D`
Mandatory: Yes
Quantity: Exactly one

Container for projection information about the spherical video content.
This container must contain exactly one projection (e.g. an `EQUI` box) which
defines the spherical video's projection.

##### Syntax
```
aligned(8) class Projection extends Box('PROJ') {
}
```

#### Projection Header Box (PRHD)
##### Definition
Box Type: `PRHD`
Container: `PROJ`
Mandatory: Yes
Quantity: Exactly one

Contains projection information about the spherical video content that is
independent of the video projection.

##### Syntax
```
aligned(8) class ProjectionHeader extends FullBox('PRHD', 0, 0) {
    unsigned int(8) stereo_mode;
    int(32) pose_yaw_degrees;
    int(32) pose_pitch_degrees;
    int(32) pose_roll_degrees;
}
```

##### Semantics

- `stereo_mode` is an 8-bit unsigned integer that specifies the stereo frame
layout. The values 0 to 255 are reserved for current and future layouts. The
following values are defined:

| `stereo_mode` | Stereo Mode Description |
|:-----------------|:---------------------------|
| `0` | **Monoscopic**: Indicates the video frame contains a single monoscopic view. |
| `1` | **Stereoscopic Top-Bottom**: Indicates the video frame contains a stereoscopic view storing the left eye on the top half of the frame and the right eye on the bottom half of the frame. |
| `2` | **Stereoscopic Left-Right**: Indicates the video frame contains a stereoscopic view storing the left eye on the left half of the frame and the right eye on the right half of the frame. |

- Pose values are 16.16 fixed-point values measuring rotation in degrees. These
  rotations transform the projection as follows:
  - `pose_yaw_degrees`: clockwise rotation about the up vector
  - `pose_pitch_degrees`: counter-clockwise rotation about the right vector, after
    the yaw transform
  - `pose_roll_degrees`: counter-clockwise rotation about the forward vector, after
    the yaw and pitch transforms
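
To make the 16.16 fixed-point encoding concrete, here is a minimal sketch that converts pose values and parses the body of a `PRHD` box. The helper names are hypothetical. It assumes big-endian fields and the usual ISOBMFF FullBox prefix (1-byte version, 3-byte flags) ahead of the fields listed in the syntax above.

```python
import struct


def fixed_16_16_to_degrees(raw: int) -> float:
    """Convert a signed 16.16 fixed-point integer to degrees."""
    return raw / 65536.0


def degrees_to_fixed_16_16(degrees: float) -> int:
    """Convert degrees to a signed 16.16 fixed-point integer (rounded)."""
    raw = round(degrees * 65536)
    if not -(1 << 31) <= raw < (1 << 31):
        raise ValueError("value does not fit in a signed 32-bit field")
    return raw


def parse_prhd_payload(payload: bytes) -> dict:
    """Parse the bytes that follow the PRHD box size and type.

    Assumed layout: FullBox version (1 byte) and flags (3 bytes), then
    stereo_mode (uint8) and three big-endian signed 32-bit pose values.
    """
    if len(payload) < 17:
        raise ValueError("PRHD payload is too short")
    version, flags, stereo_mode, yaw, pitch, roll = struct.unpack(
        ">B3sBiii", payload[:17]
    )
    return {
        "version": version,
        "flags": int.from_bytes(flags, "big"),
        "stereo_mode": stereo_mode,
        "pose_yaw_degrees": fixed_16_16_to_degrees(yaw),
        "pose_pitch_degrees": fixed_16_16_to_degrees(pitch),
        "pose_roll_degrees": fixed_16_16_to_degrees(roll),
    }
```

With this encoding, a yaw of 90 degrees is stored as `90 * 65536 = 5898240`, and the smallest representable step is `1/65536` of a degree.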

#### Projection Data Box
##### Definition
Box Type: Projection Dependent Identifier
Container: `PROJ`
Mandatory: Yes
Quantity: Exactly one

Base class for all projection data boxes. Any new projection must subclass this
type with a unique `proj_type`.

##### Syntax
```
aligned(8) class ProjectionDataBox(unsigned int(32) proj_type, unsigned int(32) version, unsigned int(32) flags)
    extends FullBox(proj_type, version, flags) {
}
```

#### Cubemap Projection Box (CBMP)
##### Definition
Box Type: `CBMP`
Container: `PROJ`

Contains the projection dependent information for a cubemap video. The
[cubemap's](https://en.wikipedia.org/wiki/Cube_mapping) face layout is defined
by a unique `layout` value.

##### Syntax
```
aligned(8) class CubemapProjection extends ProjectionDataBox('CBMP', 0, 0) {
    unsigned int(32) layout;
    unsigned int(32) padding;
}
```

##### Semantics
- `layout` is a 32-bit unsigned integer describing the layout of cube faces. The
values 0 to 255 are reserved for current and future layouts.
- a value of `0` corresponds to a grid with 3 columns and 2 rows. Faces are
oriented upwards for the front, left, right, and back faces. The up face is
oriented so the top of the face is forwards and the down face is oriented
so the top of the face is to the back.
<center>
<table>
<tr>
<td>right face</td>
<td>left face</td>
<td>up face</td>
</tr>
<tr>
<td>down face</td>
<td>front face</td>
<td>back face</td>
</tr>
</table>
</center>

- `padding` is a 32-bit unsigned integer measuring the number of pixels to pad
from the edge of each cube face.
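
The layout `0` grid can be turned into pixel rectangles as sketched below. This is an illustration under stated assumptions, not a normative interpretation: the function name is hypothetical, the frame is assumed to divide evenly into the 3x2 grid, and `padding` is assumed to inset the usable region equally on all four sides of every face, which the text above does not spell out.

```python
from typing import Dict, Tuple

# Face order for layout 0, row by row (3 columns x 2 rows), as in the table above.
LAYOUT_0_GRID = (
    ("right", "left", "up"),
    ("down", "front", "back"),
)


def cubemap_face_rects(
    frame_width: int, frame_height: int, padding: int = 0
) -> Dict[str, Tuple[int, int, int, int]]:
    """Return (x, y, width, height) in pixels for each face of layout 0.

    Assumes an evenly divisible frame and that `padding` insets each
    face on all four sides; both are assumptions of this sketch.
    """
    if frame_width % 3 or frame_height % 2:
        raise ValueError("frame does not divide evenly into a 3x2 grid")
    face_w = frame_width // 3
    face_h = frame_height // 2
    if padding < 0 or 2 * padding >= min(face_w, face_h):
        raise ValueError("padding leaves no usable pixels")

    rects = {}
    for row, names in enumerate(LAYOUT_0_GRID):
        for col, name in enumerate(names):
            rects[name] = (
                col * face_w + padding,
                row * face_h + padding,
                face_w - 2 * padding,
                face_h - 2 * padding,
            )
    return rects
```

For a 3072x2048 frame with no padding, each face is 1024x1024 and the front face starts at pixel `(1024, 1024)`.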

#### Equirectangular Projection Box (EQUI)
##### Definition
Box Type: `EQUI`
Container: `PROJ`

Contains the projection dependent information for an equirectangular video. The
[equirectangular projection](
https://en.wikipedia.org/wiki/Equirectangular_projection) should be arranged
such that the default pose has the forward vector in the center of the frame,
the up vector at top of the frame, and the right vector towards the right of the
frame.

##### Syntax
```
aligned(8) class EquirectangularProjection extends ProjectionDataBox('EQUI', 0, 0) {
    unsigned int(32) crop_top;
    unsigned int(32) crop_bottom;
    unsigned int(32) crop_left;
    unsigned int(32) crop_right;
}
```

##### Semantics

- The crop values are 0.32 fixed-point values. They represent the proportion
  of the projection cropped from each edge, i.e. the part not covered by the
  video frame. For an uncropped frame all values are 0.
  - `crop_top` is the amount to crop from the top of the frame
  - `crop_bottom` is the amount to crop from the bottom of the frame; must be
    less than `0xFFFFFFFF - crop_top`
  - `crop_left` is the amount to crop from the left of the frame
  - `crop_right` is the amount to crop from the right of the frame; must be
    less than `0xFFFFFFFF - crop_left`
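
The crop fields can be interpreted as in the following sketch. The helper names are hypothetical, and a 0.32 fixed-point value is taken to mean the raw 32-bit integer divided by 2^32.

```python
from typing import Tuple


def crop_to_fraction(raw: int) -> float:
    """Convert a 0.32 fixed-point crop value to a fraction in [0, 1)."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("crop value must fit in an unsigned 32-bit field")
    return raw / 2**32


def validate_crops(top: int, bottom: int, left: int, right: int) -> None:
    """Check the two constraints listed above; raise ValueError if violated."""
    if bottom >= 0xFFFFFFFF - top:
        raise ValueError("crop_bottom must be less than 0xFFFFFFFF - crop_top")
    if right >= 0xFFFFFFFF - left:
        raise ValueError("crop_right must be less than 0xFFFFFFFF - crop_left")


def covered_extent(top: int, bottom: int, left: int, right: int) -> Tuple[float, float]:
    """Return (width, height) of the projection covered by the frame, as fractions."""
    validate_crops(top, bottom, left, right)
    width = 1.0 - crop_to_fraction(left) - crop_to_fraction(right)
    height = 1.0 - crop_to_fraction(top) - crop_to_fraction(bottom)
    return width, height
```

For example, `crop_top = crop_bottom = 0x40000000` removes a quarter of the projection from the top and from the bottom, so the frame covers the middle half of the vertical range.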

### Example

Here is an example box hierarchy for a file containing the SV3D metadata for a
monoscopic equirectangular video:

- moov
  - trak
    - mdia
      - minf
        - stbl
          - stsd
            - avc1
              - pasp
              - SV3D
                - SVHD
                - PROJ
                  - PRHD
                  - EQUI

where the SVHD box contains:

| Field Name | Value |
|:-----------|:------|
| `metadata_source`| `Spherical Metadata Tooling` |

the PRHD box contains:

| Field Name | Value |
|:-----------|:-----|
| `stereo_mode` | `0` |
| `pose_yaw_degrees` | `0` |
| `pose_pitch_degrees` | `0` |
| `pose_roll_degrees` | `0` |

and the EQUI box contains:

| Field Name | Value |
|:-----------|:-----|
| `crop_top` | `0` |
| `crop_bottom` | `0` |
| `crop_left` | `0` |
| `crop_right` | `0` |
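
As a rough sketch, the example `SV3D` box above could be assembled like this. The helper names are hypothetical and this is not the spatial-media tools' implementation. It assumes compact ISOBMFF box headers, the usual FullBox prefix (1-byte version, 3-byte flags), big-endian fields, and that `metadata_source` is written as a null-terminated UTF-8 string.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a compact ISOBMFF box header (size + type)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
    """Wrap a payload in a FullBox: box header, 1-byte version, 3-byte flags."""
    prefix = struct.pack(">B", version) + flags.to_bytes(3, "big")
    return box(box_type, prefix + payload)


def build_sv3d(metadata_source: str) -> bytes:
    """Build SV3D for a monoscopic, uncropped equirectangular video."""
    # Assumption: the string field is null-terminated UTF-8.
    svhd = full_box(b"SVHD", 0, 0, metadata_source.encode("utf-8") + b"\x00")
    # stereo_mode = 0 (monoscopic); yaw, pitch and roll are all 0.
    prhd = full_box(b"PRHD", 0, 0, struct.pack(">Biii", 0, 0, 0, 0))
    # All four crop values are 0 (uncropped frame).
    equi = full_box(b"EQUI", 0, 0, struct.pack(">IIII", 0, 0, 0, 0))
    proj = box(b"PROJ", prhd + equi)
    return box(b"SV3D", svhd + proj)
```

Calling `build_sv3d("Spherical Metadata Tooling")` yields the bytes that would sit inside the video sample description box (`avc1` in the hierarchy above). The enclosing boxes' sizes would also need updating, which this sketch does not attempt.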
