-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSA/ASS subtitles - Overlapping start/end times and position tag is not handled #6595
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
The position stuff looks good, just a few small comments.
I've added a larger comment about the general approach to the handling of overlapping start/end times. It's an awkward bit of algorithm/logic to get right (and in the future if we need to implement it a third time we might try and abstract it into a common utility class that handles all the overlapping resolution etc.).
On the subject of testing:
I think it probably makes sense to test at the level of SsaDecoder (rather than directly testing SsaSubtitle for example). It's probably worth adding my "equal start/end times" case as one of your tests :) As well as nested subtitles too:
[3, 7] -> "A"
[4, 5] -> "B"
i++; | ||
} while (i != endTimeIndex); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but I'm not convinced this correctly handles multiple cues that have the same start or end time.
e.g.
[3, 5] -> "A"
[4, 5] -> "B"
We'll insert the first one, after which we have: cueTimesUs=[3, 5]
, startTimeIndex=0
& endTimeIndex=1
and cues=[["A"], []]
.
Then insert the second one: cueTimesUs=[3, 4, 5, 5]
, startTimeIndex=1
& endTimeIndex=3
and cues=[["A"], ["A", "B"], ["B"], []]
Whereas I think with this input we'd want these lists: cueTimesUs=[3, 4, 5]
and cues=[["A"], ["A", "B"], []]
It also took quite a lot of thought for me to follow that through, especially with the multiple mutations to the cue
list on each iteration of the outer loop.
Note that we have this same overlapping challenge in the webvtt package and we actually solve it in a slightly different way, by more lazily evaluating Subtitle#getCues
(every call to that iterates over all the subtitles we have) [1]. I chatted a bit to the team, and that seems unfortunately inefficient, so I think it makes sense to keep the logic here and not copy webvtt, but maybe we can correct it and make it a bit easier to follow.
My suggestion is to get rid of insertToCueTimes()
and do something more like (I haven't tested this, it might have other problems...):
- (binary?) search for
startTimeUs
incueTimesUs
- if
startTimeUs
is already there, then get the matching list (by index) fromcues
and addcue
to it. - else insert
startTimeUs
tocueTimesUs
and insert a new matching list tocues
(containing all the Cues fromindex - 1
pluscue
).
- if
- Walk through
cueTimesUs
, addingcue
to every entry matching entry incues
until you find a time that's either equal to or greater thanendTimeUs
(mostly your existing do/while loop)- On each step, store a reference to the matching list of cues before you add
cue
. (This reference should also store the list from theelse
in the first bullet beforecue
is added) - If the time you stopped on is equal to
endTimeUs
, then do nothing (the cues list already has the correct 'end' value, right?) - If it's greater, then insert a new cues list equal to the most recent list you stored at the top of this sub-section.
- On each step, store a reference to the matching list of cues before you add
[1]
ExoPlayer/library/core/src/main/java/com/google/android/exoplayer2/text/webvtt/WebvttSubtitle.java
Line 75 in 41b3fc1
public List<Cue> getCues(long timeUs) { |
@@ -226,4 +285,15 @@ public static long parseTimecodeUs(String timeString) { | |||
return timestampUs; | |||
} | |||
|
|||
@Nullable | |||
public static Pair<Float, Float> parsePosition(String line){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make more sense to use android.graphics.PointF here because it's a little less general/ambiguous than Pair<Float, Float>
Also avoids auto-boxing
@@ -98,6 +100,12 @@ protected Subtitle decode(byte[] bytes, int length, boolean reset) { | |||
private void parseHeader(ParsableByteArray data) { | |||
String currentLine; | |||
while ((currentLine = data.readLine()) != null) { | |||
if (currentLine.startsWith("PlayResX:")) { | |||
playResX = Integer.valueOf(currentLine.substring(9).trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a tiny bit clearer to use the string literal again instead of a 'magic' length (saves a future reader manually counting the number of characters in "PlayResX:" :))
playResX = Integer.valueOf(currentLine.substring("PlayResX:".length()).trim());
@@ -196,16 +204,67 @@ private void parseDialogueLine(String dialogueLine, List<Cue> cues, LongArray cu | |||
} | |||
} | |||
|
|||
// Parse \pos{x,y} attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd move this comment to the javadoc of parsePosition
@@ -50,6 +53,9 @@ | |||
private int formatEndIndex; | |||
private int formatTextIndex; | |||
|
|||
private int playResX; | |||
private int playResY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably safer to explicitly initialise these to an invalid value to clearly indicate 'unset', because 0 seems like a potentially genuine value we could see in a subtitle file? And then also update the comparison below on L216.
We have C.LENGTH_UNSET
which I think would work well.
https://github.com/google/ExoPlayer/blob/release-v2/library/core/src/main/java/com/google/android/exoplayer2/C.java
} | ||
|
||
@Override | ||
public List<Cue> getCues(long timeUs) { | ||
int index = Util.binarySearchFloor(cueTimesUs, timeUs, true, false); | ||
if (index == -1 || cues[index] == Cue.EMPTY) { | ||
if (index == -1 || cues.get(index).isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can get rid of the empty check, since we'll just return the empty list below anyway (right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right.
Project Default.xml
Outdated
<component name="InspectionProjectProfileManager"> | ||
<profile version="1.0"> | ||
<option name="myName" value="Project Default" /> | ||
</profile> | ||
</component> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted in commit fb2a702 :)
Applied the suggested changes (thank you!) except the one with same start or end time. I'll follow up on that tomorrow :) |
I've pushed the latest changes. Testing: Overlapping challenge:
Yes, the previous implementation resulted redundant cues and cueTimes in this case, visually it was fine though. I agree that we shouldn't add redundant things, and the algorithm you've described is more clear and bit more optimal (no need to search ahead where endTimeUs fits). I've tried to implement it following that approach (a lot of trial and error) but couldn't really succeed because of the corner cases (e.g. handling first cue, endTimeUs is greater than all of the times, same startTime/endTime, storing reference of cues - to later add it - means we need to inspect the next+1 cueTime, not just the next one etc.) so decided to go back to the first approach and just correct the same startTime/endTime problem. What do you think? |
Looks good, thanks! I'll work on getting this merged. |
Great, let me know if any further changes are needed. |
Just to keep you posted, I haven't forgotten about this :) It turns out it's a little tricky to merge this while also supporting the 'blank' end timecode behaviour currently in SsaDecoder (where the intention is that the line appears only until the next line...). I've chatted with the team, and we're likely to remove the blank end timecode 'feature' as it's not really supported by the spec afaict. Once that support is removed, I'll be able to merge this more easily. |
Sure, there's no hurry :) |
SSA spec allows the lines in any order, so they must all have an end time: http://moodub.free.fr/video/ass-specs.doc The Matroska write-up of SubRip assumes the end time is present: https://matroska.org/technical/specs/subtitles/srt.html This will massively simplify merging issue:#6595 PiperOrigin-RevId: 279926730
Hi guys, any news regarding this PR? |
Just merged it in to dev-v2 (with fairly significant changes, but the functionality originally proposed should all be there). |
Pull request for #6320.