
Add Level Manifest recovery #1452

Merged: 11 commits merged into video-dev:master on Dec 4, 2017
Conversation

@NicolasSiver (Member) commented Nov 29, 2017:

Description of the Changes

The ability to recover from failures is an important part of a good playback experience. Until now we had a segment zigzagging feature (a feature where we hunt for a playable segment through all available levels and jump from primary to backup streams and back).

We are going to introduce the same "zigzagging" logic for level manifests. Another benefit is that all of these behaviors are driven by the retry configuration, for both segments and levels. To complement the feature, the playlist loader was adjusted so it does not create extra retry requests, since all retry management is now handled entirely by the Level Controller.
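For context, both retry budgets live in the hls.js config; a minimal sketch using the standard config keys (the values here are illustrative, not recommendations from this PR):

```js
// Illustrative hls.js config: the same kind of retry knobs drive
// recovery for both fragments (segments) and level manifests.
var hls = new Hls({
  fragLoadingMaxRetry: 6,            // total fragment retries
  fragLoadingRetryDelay: 500,        // initial retry delay, ms
  fragLoadingMaxRetryTimeout: 4000,  // cap on the backed-off delay, ms
  levelLoadingMaxRetry: 4,           // total level-manifest retries
  levelLoadingRetryDelay: 500,
  levelLoadingMaxRetryTimeout: 4000
});
```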

How does it work?

If you have a backup stream, recovery will look like a zigzagging hunt for an "available" segment or level manifest:

[Animated diagram: hls-js-recovery — zigzagging across levels and primary/backup streams]

Where: F - Bad Fragment, L - Bad Level

As you can see, if you don't have enough extra renditions, the recovery logic will not be able to add much extra value to your platform.
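Since the animation may not render here, a rough textual sketch of the hunt, assuming three levels (L0-L2) each with a primary and a backup URL (illustrative only; the controller decides the exact order):

```
L2 primary: bad fragment (F)  ->  switch down to L1 primary
L1 primary: bad level (L)     ->  switch to L1 backup
L1 backup:  bad fragment (F)  ->  switch down to L0 backup
...until a playable segment/manifest is found, or the retry
budget runs out and a fatal error is raised.
```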

Retry Recommendations

If you have 4 renditions and a backup stream:

  • Level: don't use a total retry count lower than 3-4
  • Fragment: don't use a total retry count lower than 4-6
  • Implement short burst retries (i.e. a small retry delay of 0.5-4 seconds), and when the library returns a fatal error, switch to a different CDN (see the sketch below)
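A minimal sketch of that last recommendation; the error event and `data.fatal` flag are standard hls.js API, while `cdnUrls`, `video`, and `config` are assumptions for this example:

```js
// Sketch: fail over to the next CDN mirror once hls.js signals a
// fatal error, after its own burst retries have been exhausted.
var cdnUrls = [
  'https://cdn-a.example.com/master.m3u8',
  'https://cdn-b.example.com/master.m3u8'
];
var cdnIndex = 0;
var hls;

function startPlayback() {
  hls = new Hls(config);                 // same retry config as above
  hls.attachMedia(video);
  hls.loadSource(cdnUrls[cdnIndex]);
  hls.on(Hls.Events.ERROR, function (event, data) {
    if (data.fatal && cdnIndex < cdnUrls.length - 1) {
      hls.destroy();                     // give up on the current CDN
      cdnIndex += 1;
      startPlayback();                   // restart from the next mirror
    }
  });
}

startPlayback();
```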

Notes:

CheckLists

  • changes have been done against master branch, and PR does not conflict
  • no commits have been done in dist folder (we will take care of updating it)
  • new unit / functional tests have been added (whenever applicable)
  • Travis tests are passing (or test results are not worse than on master branch :))
  • API or design changes are documented in API.md

@mangui (Member) left a comment:


thanks @NicolasSiver
it would be great to add your PR explanation (with the pic) in design.md!

```js
if (levelError === true || fragmentError === true) {
  redundantLevels = level.url.length;

  if (redundantLevels > 1 && level.loadError < redundantLevels) {
```
@mangui (Member):

If my understanding is correct, in case of a levelError with redundant streams available, we first schedule a setTimeout(() => this.loadLevel(), delay); and then we switch to the redundant stream a couple of lines after?

@NicolasSiver (Member, Author):

Right. The zigzagging logic applies to both levels and fragments (condition: levelError === true || fragmentError === true). The level-specific part (since we also handle retry management for levels in the level controller) happens a bit earlier, because we can reach the retry limit and then have to produce a fatal error.

By placing the level-related logic first, we have the opportunity to bail out in a slightly more DRY way.
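In other words, the handler is shaped roughly like this (a simplified sketch of the control flow; the helper names emitFatalError, scheduleRetry, and switchToRedundantStream are placeholders, not actual hls.js functions):

```js
// Simplified shape of the level controller's error handler.
function onError(levelError, fragmentError, level, config) {
  if (levelError) {
    if (level.loadError >= config.levelLoadingMaxRetry) {
      emitFatalError();               // retry budget exhausted: bail out early
      return;
    }
    scheduleRetry();                  // setTimeout(() => this.loadLevel(), delay)
  }

  // Shared zigzag logic for both error kinds.
  if (levelError === true || fragmentError === true) {
    var redundantLevels = level.url.length;
    if (redundantLevels > 1 && level.loadError < redundantLevels) {
      switchToRedundantStream(level); // rotate urlId to the backup URL
    }
  }
}
```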

@NicolasSiver (Member, Author): @mangui, added notes to the Design document.

@mangui requested a review from johnBartos on December 1, 2017.
@mangui (Member) commented Dec 1, 2017:

LGTM, a second pair of eyes might be useful.

```diff
   if (!levelDetails || levelDetails.live === true) {
     // level not retrieved yet, or live playlist we need to (re)load it
     var urlId = level.urlId;
     hls.trigger(Event.LEVEL_LOADING, {url: level.url[urlId], level: newLevel, id: urlId});
   }
 } else {
   // invalid level id given, trigger error
-  hls.trigger(Event.ERROR, {type : ErrorTypes.OTHER_ERROR, details: ErrorDetails.LEVEL_SWITCH_ERROR, level: newLevel, fatal: false, reason: 'invalid level idx'});
+  hls.trigger(Event.ERROR, {
+    type : ErrorTypes.OTHER_ERROR,
```
Collaborator:

Not a big fan of the irregular whitespacing here, since it breaks convention with the rest of the codebase (and if we decide to upgrade our linter to something like ESLint, this may give us problems).

@NicolasSiver (Member, Author):

It will not. We use ESLint. It's a common technique for big inline object representations.
JSHint does not have a warning for that structure. The bigger issue is when a line is longer than 80-120 characters; as you can see, in this review you already can't see the whole line.
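For what it's worth, ESLint can be told to accept (and even enforce) this colon-aligned style through its key-spacing rule; a hypothetical .eslintrc.js snippet, not taken from this repo:

```js
// Hypothetical ESLint config: align object values on the colon,
// which permits the `type : ErrorTypes.OTHER_ERROR` style above.
module.exports = {
  rules: {
    'key-spacing': ['error', { align: 'colon' }]
  }
};
```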

Collaborator:

Ok, fair enough. Since it's a matter of opinion, I'm fine with it as is.

```js
  level.details = undefined;
} else {
  // Switch-down if more renditions are available
  if (this.manualLevelIndex === -1 && levelIndex !== 0) {
```
Collaborator:

To me, the spec implies we should go to auto:

In the event of an index load failure on one stream, the client chooses the highest bandwidth alternate stream that the network connection supports.

Going to auto should also handle the case where we don't have a level below (but may have one above).
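For reference, the suggested fallback would amount to re-enabling automatic level selection, which the public hls.js API exposes as a -1 level index (a sketch of the idea, not what the PR currently does):

```js
// Sketch: on an index (level manifest) load failure, drop back to
// automatic (ABR) level selection rather than only switching down.
hls.currentLevel = -1;  // -1 re-enables automatic level selection
hls.nextLevel = -1;     // let ABR pick the next level to load
```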

@NicolasSiver merged commit f4a86a1 into video-dev:master on Dec 4, 2017.
@NicolasSiver deleted the add-level-manifest-recovery branch on December 4, 2017.
@IvanRF (Contributor) commented Dec 6, 2017:

@NicolasSiver when I saw this commit I thought it was the solution to my current issue, but apparently not. Does it work only for backup streams?

I have only bad fragments (404 errors) for the 360p & 480p levels, but the player never reached the 720p level. It throws a fatal error.

I opened a new issue here: #1458

@ssreed mentioned this pull request on Jan 31, 2018.