Closed-captions (WebVTT) subtitles won't show up for HLS live-stream #10969

Closed
dgewe opened this issue Feb 8, 2023 · 12 comments
@dgewe

dgewe commented Feb 8, 2023

Hello!

I'm investigating an issue where the closed-captions subtitle (WebVTT) track of an HLS live stream is available and selectable, but no subtitles are shown. The problem appears both for HLS + DAI and for HLS without DAI.

Same stream with subtitles works OK for Shaka Player (Web).

I've sent a link to an HLS (DAI) live stream privately by email where CC subtitles are available, but only for certain content.

When trying to find the root cause I found something of interest where the Cues are filtered out in step 3. of this flow:

  1. subtitle.getCues(positionUs), presentationTimeUs)
    TextRenderer.java#L260

  2. Assertions.checkNotNull(subtitle).getCues(timeUs - subsampleOffsetUs)
    SubtitleOutputBuffer.java#L63

  3. if ((cueTimesUs[i * 2] <= timeUs) && (timeUs < cueTimesUs[i * 2 + 1]))
    WebvttSubtitle.java#L72

I noticed that in step 2. of the flow, the subsampleOffsetUs might be wrong.
Example:
timeUs = 1000084419695
subsampleOffsetUs = -1638780233473066

This leads step 3 to evaluate values like:
if ((cueTimesUs[i * 2] <= timeUs) && (timeUs < cueTimesUs[i * 2 + 1]))
if ((57244202000 <= 1639780317892761) && (1639780317892761 < 57245002000))
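
The numbers above can be reproduced with a small standalone sketch. It does not use ExoPlayer; `CueWindowCheck`, `lookupTimeUs` and `isCueActive` are hypothetical names that only mirror the subtraction in step 2 and the window test in step 3.

```java
/** Hypothetical stand-in for the lookup described in steps 2 and 3; not ExoPlayer code. */
final class CueWindowCheck {

  /** Mirrors step 2: the lookup time is the renderer time minus the subsample offset. */
  static long lookupTimeUs(long timeUs, long subsampleOffsetUs) {
    return timeUs - subsampleOffsetUs;
  }

  /** Mirrors step 3: a cue is active when the lookup time is inside [start, end). */
  static boolean isCueActive(long cueStartUs, long cueEndUs, long lookupUs) {
    return cueStartUs <= lookupUs && lookupUs < cueEndUs;
  }

  public static void main(String[] args) {
    // Values taken from the example in this comment.
    long timeUs = 1_000_084_419_695L;
    long subsampleOffsetUs = -1_638_780_233_473_066L;
    long lookupUs = lookupTimeUs(timeUs, subsampleOffsetUs);

    System.out.println("lookupUs = " + lookupUs);
    System.out.println(
        "cue active = " + isCueActive(57_244_202_000L, 57_245_002_000L, lookupUs));
  }
}
```

With these inputs the lookup time is several orders of magnitude larger than the cue window, so the cue is never selected.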

My questions are:

  • What affects the value of subsampleOffsetUs and where is it set originally?
  • Do you have any ideas what could be the issue here?

Thanks for the help.

@icbaker
Collaborator

icbaker commented Feb 9, 2023

I had a look at the provided live stream just now, but all the WebVTT files are empty (just contain the WEBVTT header). I guess the currently airing content has no subtitles?

Would you be able to download one of the .webvtt files from a time when the content does contain subtitles and attach it to an email? It would also be really helpful to download and attach the audio and video segments that cover the same time period. Please reply here when you've done this. I want to check how the timestamps are connected between the WebVTT file and the audio/video media. The HLS spec recommends every WebVTT segment have an X-TIMESTAMP-MAP metadata header at the beginning, but I don't see that in your (empty) files.

If you can provide a stream where this repros 100% of the time then I can investigate further. I'm not able to investigate on a stream where the content only sometimes has subtitles.

@dgewe
Author

dgewe commented Feb 16, 2023

> I had a look at the provided live stream just now, but all the WebVTT files are empty (just contain the WEBVTT header). I guess the currently airing content has no subtitles?
>
> Would you be able to download one of the .webvtt files from a time when the content does contain subtitles and attach it to an email? It would also be really helpful to download and attach the audio and video segments that cover the same time period. Please reply here when you've done this. I want to check how the timestamps are connected between the WebVTT file and the audio/video media. The HLS spec recommends every WebVTT segment have an X-TIMESTAMP-MAP metadata header at the beginning, but I don't see that in your (empty) files.
>
> If you can provide a stream where this repros 100% of the time then I can investigate further. I'm not able to investigate on a stream where the content only sometimes has subtitles.

@icbaker
Thanks for the followup, I've now sent a reply in my previous email for this issue.

  • HLS stream with CC subtitles all the time
    • Only scenario I can think of where there may be no subtitles is if there is a commercial break.
  • DASH stream for comparison (CC subtitles work OK for DASH)
  • Subtitle, Audio, and Video segments

I'm using this online HLS player, https://www.hlsplayer.net/, and there I can see that the CC subtitles work when selected.

@icbaker
Collaborator

icbaker commented Feb 21, 2023

Thanks. Looking at the audio and video segments you sent through, I see that they both have a tfdt box with baseMediaDecodeTime. In the audio segment this is set to 150888486661000 and in the video segment it is set to 150888486660160.

Assuming this is in a standard 90kHz clock, that gives us:

  • audio = 150888486661000 / 90000 = 1676538740.68 seconds
  • video = 150888486660160 / 90000 = 1676538740.67 seconds

i.e. they're very similar (about 10ms different to each other), and it looks suspiciously like a unix timestamp:

$ date -d @1676538740
Thu Feb 16 09:12:20 AM GMT 2023
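
The same conversion can be sketched in Java. The 90 kHz timescale is the assumption already stated above, and `TfdtToWallClock` and `toInstant` are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: interprets a tfdt baseMediaDecodeTime as time since the Unix epoch. */
final class TfdtToWallClock {

  /** Assumed timescale (90 kHz); a real file declares its timescale in the mdhd box. */
  static final long TIMESCALE_HZ = 90_000L;

  static Instant toInstant(long baseMediaDecodeTime) {
    long seconds = baseMediaDecodeTime / TIMESCALE_HZ;
    long remainderTicks = baseMediaDecodeTime % TIMESCALE_HZ;
    long nanos = remainderTicks * 1_000_000_000L / TIMESCALE_HZ;
    return Instant.ofEpochSecond(seconds, nanos);
  }

  public static void main(String[] args) {
    Instant audio = toInstant(150_888_486_661_000L);
    Instant video = toInstant(150_888_486_660_160L);
    System.out.println("audio = " + audio);
    System.out.println("video = " + video);
    System.out.println("difference = " + Duration.between(video, audio).toMillis() + " ms");
  }
}
```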

ISO 14496-12:2012 defines baseMediaDecodeTime as:

an integer equal to the sum of the decode durations of all earlier samples in
the media, expressed in the media's timescale. It does not include the samples added in the enclosing
track fragment.

So this suggests that this stream started on 1st Jan 1970 (which I would argue seems a little unlikely :)).

Meanwhile the WebVTT file you provided has cues with timestamps in the range of 19:23:59.463 (i.e. 19 hours and 23 mins from the start of the stream).

The X-TIMESTAMP-MAP header is one way to align these timescales, and it is present in the provided WebVTT segment, but it just does the default of aligning zero with zero:

X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:0

That means the WebVTT timestamps are way in the past compared to the audio and video segments (over 53 years ago), which might explain why they're not being shown.
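
As a rough illustration of the mapping, the sketch below parses the two values from the header and shifts a cue time onto the media timeline. It is not ExoPlayer's parser: `TimestampMapSketch` and its methods are hypothetical, LOCAL is assumed to be in `hh:mm:ss.mmm` form, MPEGTS is assumed to be in 90 kHz ticks, and 33-bit wrap handling is deliberately left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of applying an X-TIMESTAMP-MAP header; ignores PTS wrap-around. */
final class TimestampMapSketch {

  // Assumes LOCAL is written as hh:mm:ss.mmm (hours may have more than two digits).
  private static final Pattern LOCAL =
      Pattern.compile("LOCAL:(\\d+):(\\d{2}):(\\d{2})\\.(\\d{3})");
  private static final Pattern MPEGTS = Pattern.compile("MPEGTS:(\\d+)");

  /** Returns the LOCAL cue time from the header, in microseconds. */
  static long localUs(String header) {
    Matcher m = LOCAL.matcher(header);
    if (!m.find()) {
      throw new IllegalArgumentException("No LOCAL value in: " + header);
    }
    long hours = Long.parseLong(m.group(1));
    long minutes = Long.parseLong(m.group(2));
    long seconds = Long.parseLong(m.group(3));
    long millis = Long.parseLong(m.group(4));
    return (((hours * 60 + minutes) * 60 + seconds) * 1000 + millis) * 1000;
  }

  /** Returns the MPEGTS value from the header, in 90 kHz ticks. */
  static long mpegTsTicks(String header) {
    Matcher m = MPEGTS.matcher(header);
    if (!m.find()) {
      throw new IllegalArgumentException("No MPEGTS value in: " + header);
    }
    return Long.parseLong(m.group(1));
  }

  /** Shifts a cue time so that LOCAL lines up with MPEGTS (ticks * 100 / 9 = microseconds). */
  static long cueToMediaUs(long cueUs, String header) {
    long mpegTsUs = mpegTsTicks(header) * 100 / 9;
    return cueUs - localUs(header) + mpegTsUs;
  }

  public static void main(String[] args) {
    String header = "X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:0";
    long cueUs = 69_839_463_000L; // 19:23:59.463
    long mediaUs = 1_676_538_740_680_000L; // audio time computed above
    long mappedUs = cueToMediaUs(cueUs, header);
    System.out.println("mapped cue time (us) = " + mappedUs);
    System.out.println("gap to media (s)     = " + (mediaUs - mappedUs) / 1_000_000);
  }
}
```

With the zero-to-zero mapping the cue time is unchanged, so the gap to the media timestamps remains.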

What happens if you keep the audio/video timestamps the same, but you change the WebVTT timestamps to be 'current' (i.e. something like (for 2023-02-21 15:19 UTC when I'm writing this) 465832:19:32.00)?

@dgewe
Author

dgewe commented Mar 1, 2023

@icbaker
Thanks for the update and the explanation!

Modifying the timestamps like you suggested is unfortunately not an option right now.
Even if I got that suggestion working in ExoPlayer, it would not be a viable solution for me, because it could affect other platforms where the same stream already works OK.

Questions:

  • Is it verified in ExoPlayer that WebVTT (or any other format) subtitles should work OK on HLS live streams? I was unsuccessful in finding other streams to test with, do you perhaps know any good example streams? (HLS+LIVE+SUBS)
  • Depending on the answer above, do you think that this issue would be suitable for moving to a bug report now? Considering that the type of stream that I sent you works OK using other popular players.
  • Any other suggestions on how to proceed with this issue?

@icbaker
Collaborator

icbaker commented Mar 1, 2023

  • Is it verified in ExoPlayer that WebVTT (or any other format) subtitles should work OK on HLS live streams? I was unsuccessful in finding other streams to test with, do you perhaps know any good example streams? (HLS+LIVE+SUBS)

We haven't received any reports of valid HLS live streams having problems with WebVTT subtitles. I'm afraid I don't have an open example stream to share, though.

  • Depending on the answer above, do you think that this issue would be suitable for moving to a bug report now? Considering that the type of stream that I sent you works OK using other popular players.

No, I think the stream you've provided is not valid due to the timestamp mismatches I've described above - so it's reasonable for ExoPlayer not to play it correctly. The fact that other players are able to work around this invalidity is not evidence that the stream is valid.

  • Any other suggestions on how to proceed with this issue?

I'd suggest fixing the media so the timestamps are aligned between the WebVTT files and the audio/video segments.

@ehrlund

ehrlund commented Mar 1, 2023

Hi,

I'm working together with @dgewe.

The last paragraph of https://www.rfc-editor.org/rfc/rfc8216#section-3.5 states:
"
When synchronizing WebVTT with PES timestamps, clients SHOULD account
for cases where the 33-bit PES timestamps have wrapped and the WebVTT
cue times have not.
"

The values that our origin puts into the webvtt files are calculated like this.

Using a fresh example gives me:
PTS (the same value as baseMediaDecodeTime) = 150991810346560
2^33 = 8589934592

Given the above the calculations should then be:

150991810346560 mod 8589934592 = 6530022976
6530022976 / 90000 ≈ 72555.8108 seconds
Converting this to hh:mm:ss gives 20:09:15
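
The calculation can be written out as a short sketch. `WrappedPtsToCueTime` and its methods are hypothetical names; the code only restates the arithmetic above (modulo 2^33, then 90 kHz ticks to a WebVTT-style timestamp) and is not taken from the origin or from ExoPlayer.

```java
import java.util.Locale;

/** Hypothetical sketch of deriving a cue time from a PTS wrapped at 33 bits. */
final class WrappedPtsToCueTime {

  static final long PTS_WRAP = 1L << 33; // 8589934592
  static final long PTS_HZ = 90_000L;

  /** Reduces a non-wrapped PTS to the 33-bit range used by MPEG-2 PES timestamps. */
  static long wrapPts(long pts) {
    return pts % PTS_WRAP;
  }

  /** Formats a tick count at 90 kHz as hh:mm:ss.mmm, truncating to whole milliseconds. */
  static String toWebvttTime(long ptsTicks) {
    long totalMs = ptsTicks * 1000 / PTS_HZ;
    long ms = totalMs % 1000;
    long totalSec = totalMs / 1000;
    long s = totalSec % 60;
    long m = (totalSec / 60) % 60;
    long h = totalSec / 3600;
    return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", h, m, s, ms);
  }

  public static void main(String[] args) {
    long pts = 150_991_810_346_560L;
    long wrapped = wrapPts(pts);
    System.out.println("wrapped PTS = " + wrapped);
    System.out.println("cue time    = " + toWebvttTime(wrapped));
  }
}
```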

The webvtt subtitle looks like this:
cat s0_F316822.webvtt
WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:0

20:09:15.810 --> 20:09:17.010
subs

20:09:17.010 --> 20:09:19.650
subs

Is this way of timing subtitles unsupported by ExoPlayer?

(edit: removing actual subtitle text)

@icbaker
Collaborator

icbaker commented Mar 2, 2023

The RFC you quoted says (emphasis mine) "WebVTT cue times have not [wrapped]"

But in the working lower down you are deriving the WebVTT cue time from the wrapped PES timestamp. This doesn't match my interpretation of the RFC.

There's no need to wrap a WebVTT cue timestamp, because it's just a string - it can hold a theoretically arbitrarily large number of hours. ExoPlayer does actually impose a practical limit here, because we parse it into a signed 64-bit long, so the max value we can support is 2^63-1 hours ~= 10^15 years (several orders of magnitude larger than the current age of the universe).

This is why I don't think you should be deriving the WebVTT timestamps from the post-wrapped media time, and instead they should match the pre-wrapped time.

@ehrlund

ehrlund commented Mar 2, 2023

You are correct, the cue times don't wrap; however, they mark an "offset" from when to be displayed.

The origin should keep the "offset" between PES and WebVTT "strict" and my interpretation of the quoted section is that the client SHOULD cover a sort of misbehaving origin that does not properly keep the offset between PES and WebVTT cue times strict (i.e. the PES header has rolled over, but the cue times have just continued)

I should have mentioned the reason for having it like this and that is that the same WebVTT files are used for HLS TS, HLS CMAF and DASH streams where in particular the MPEG-2 PES header is constantly rolling over. This is to improve the cache hit ratio.

@icbaker
Collaborator

icbaker commented Mar 3, 2023

> You are correct, the cue times don't wrap; however, they mark an "offset" from when to be displayed.

I'm not sure I understand this bit - don't WebVTT cue timestamps represent a time since the 'start of the stream' (an offset if you like)? Since this stream declares it starts on 1st Jan 1970 (aiui), the WebVTT cue times should be equal to the time since that date.


> I should have mentioned the reason for having it like this and that is that the same WebVTT files are used for HLS TS, HLS CMAF and DASH streams where in particular the MPEG-2 PES header is constantly rolling over. This is to improve the cache hit ratio.

HLS + WebVTT is designed to handle this constant roll-over, that's what the spec section you linked is about - and so in all the HLS cases I think the WebVTT timestamps should not be wrapped.

For DASH, my understanding is that timestamps in standalone WebVTT files need to be relative to the start of the period: https://dashif-documents.azurewebsites.net/DASH-IF-IOP/master/DASH-IF-IOP.html#standalone-text-timing There's no mention of expecting WebVTT timestamps to wrap either.

@ehrlund

ehrlund commented Mar 9, 2023

Sorry for a delayed answer.

> I'm not sure I understand this bit - don't WebVTT cue timestamps represent a time since the 'start of the stream' (an offset if you like)? Since this stream declares it starts on 1st Jan 1970 (aiui), the WebVTT cue times should be equal to the time since that date.

Isn't that a "DASH interpretation"? As far as I know, there is no "tracker" in MPEG-2 that counts the number of times the PES timestamp (the clock) has wrapped, so it's not possible to deduce the exact start time of the stream (the time "0" can mean anything that has an even modulus)?

It gets a bit more complicated when introducing HLS CMAF, where you have some extra timing information in the fMP4 containers (like the baseMediaDecodeTime you mentioned). However, I haven't found any addendum or similar that specifies that players should use the extra timing info in the fMP4 segments, or that the MPEG-2 behaviour is no longer valid. As mentioned previously, it works in Apple's AVPlayer (iOS/tvOS/Safari) and Shaka.

> For DASH, my understanding is that timestamps in standalone WebVTT files need to be relative to the start of the period: https://dashif-documents.azurewebsites.net/DASH-IF-IOP/master/DASH-IF-IOP.html#standalone-text-timing There's no mention of expecting WebVTT timestamps to wrap either.

You are right here. I hadn't read that section of the DASH standard, so we won't be able to reuse the files for DASH; however, we still want to use them for HLS TS and HLS CMAF.

@dgewe
Author

dgewe commented Apr 13, 2023

@icbaker

Hi again,

We're still not getting good results and are stuck on this issue, I would again appreciate any help to nail it down.

Link to a test stream has been sent by e-mail.

We're now running WebVTT timestamps that are not wrapped, and they currently look like this: 467051:23:34.187 (1681385014187000 µs)
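
The relationship between the two forms of that timestamp can be checked with a small parser sketch. `CueTimestampParser` and `parseUs` are hypothetical and are not ExoPlayer's WebVTT parser; the sketch assumes the `hh:mm:ss.mmm` or `mm:ss.mmm` forms with exactly three fractional digits.

```java
/** Hypothetical sketch: parses a WebVTT-style timestamp into microseconds. */
final class CueTimestampParser {

  /** Accepts "hh:mm:ss.mmm" (any number of hour digits) or "mm:ss.mmm". */
  static long parseUs(String timestamp) {
    String[] parts = timestamp.trim().split(":");
    if (parts.length < 2 || parts.length > 3) {
      throw new IllegalArgumentException("Unexpected timestamp: " + timestamp);
    }
    String[] secondsAndMillis = parts[parts.length - 1].split("\\.");
    if (secondsAndMillis.length != 2 || secondsAndMillis[1].length() != 3) {
      throw new IllegalArgumentException("Unexpected seconds field: " + timestamp);
    }
    long hours = parts.length == 3 ? Long.parseLong(parts[0]) : 0L;
    long minutes = Long.parseLong(parts[parts.length - 2]);
    long seconds = Long.parseLong(secondsAndMillis[0]);
    long millis = Long.parseLong(secondsAndMillis[1]);
    long totalMs = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    return totalMs * 1000;
  }

  public static void main(String[] args) {
    System.out.println(parseUs("467051:23:34.187"));
    System.out.println(parseUs("20:09:15.810"));
  }
}
```

Both the epoch-based and the wrapped forms fit comfortably in a signed 64-bit value, so the range of the type is not the limiting factor here.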

I'm still having problems making sense of this, specifically on this line:

return Assertions.checkNotNull(subtitle).getCues(timeUs - subsampleOffsetUs);

Here I can see that the values are: timeUs = 1000055988411, subsampleOffsetUs = -3320095746037833 and (timeUs - subsampleOffsetUs) = 3321095802026244

Questions:

  1. Is the timeUs value as expected during HLS + LIVE? It seems to use the time from when I started the stream, which is the same approach as when playing HLS + VOD, where subtitles work OK.
    To me it feels like no matter how we solve the WebVTT timestamps, timeUs might still report a bad value.
    Here is an example from playing a DASH live stream with TTML subs:
    timeUs = 1682386201504700
    subsampleOffsetUs = 1000000000000

Also, we dug into the code a bit more

TimestampAdjuster.usToWrappedPts(firstCueTimeUs + tsTimestampUs - vttTimestampUs));
and we can see that ExoPlayer tries to recalculate cue times in order to match wrapped PTS values. However, this will also fail to match for the reason above.

  2. Do you have any idea what the issue could be?

  3. Do you know any workaround that I can make (hard coded if needed) to solve this issue?

Thanks for any help.

@dgewe
Author

dgewe commented May 8, 2023

@icbaker

After investigating this issue further, it seems that there is no support for non-wrapped WebVTT subs for HLS live streams (CMAF).

I got around this issue by changing the code below in the WebvttExtractor.

long sampleTimeUs =
   timestampAdjuster.adjustTsTimestamp(
       TimestampAdjuster.usToWrappedPts(firstCueTimeUs + tsTimestampUs - vttTimestampUs));

to

long sampleTimeUs = timestampAdjuster.adjustSampleTimestamp(firstCueTimeUs + tsTimestampUs - vttTimestampUs);
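
To illustrate why the wrapping step matters for epoch-based cue times, here is a standalone sketch. It does not call ExoPlayer: the helpers in `PtsWrapSketch` are hypothetical re-implementations that assume a 90 kHz clock and a 33-bit wrap, and they say nothing about what `adjustTsTimestamp` or `adjustSampleTimestamp` do internally.

```java
/** Hypothetical sketch of microsecond <-> 90 kHz PTS conversion with a 33-bit wrap. */
final class PtsWrapSketch {

  static final long MAX_PTS_PLUS_ONE = 1L << 33;

  /** Converts microseconds to 90 kHz ticks (us * 90000 / 1000000, reduced to 9 / 100). */
  static long usToNonWrappedPts(long us) {
    return Math.multiplyExact(us, 9L) / 100L;
  }

  /** Same conversion, but keeps only the low 33 bits as a PES timestamp would. */
  static long usToWrappedPts(long us) {
    return usToNonWrappedPts(us) % MAX_PTS_PLUS_ONE;
  }

  /** Converts 90 kHz ticks back to microseconds. */
  static long ptsToUs(long pts) {
    return Math.multiplyExact(pts, 100L) / 9L;
  }

  public static void main(String[] args) {
    long cueUs = 1_681_385_014_187_000L; // non-wrapped cue time from the earlier comment
    System.out.println("non-wrapped round trip = " + ptsToUs(usToNonWrappedPts(cueUs)));
    System.out.println("wrapped round trip     = " + ptsToUs(usToWrappedPts(cueUs)));
  }
}
```

In this sketch the non-wrapped round trip returns the original value, while the wrapped one keeps only the position within the current 2^33-tick cycle, so the epoch part of the cue time is lost unless something else restores it.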

@dgewe dgewe closed this as completed May 8, 2023
@google google locked and limited conversation to collaborators Jul 8, 2023