CEA608 Missing Whitespaces #3906

Closed
zsmatyas opened this issue Feb 26, 2018 · 9 comments

@zsmatyas
Contributor

zsmatyas commented Feb 26, 2018

I see a lot of content whose captions appear to be missing space characters around italic words (when a Mid-Row command is used):
"He saidHELLOimmediately." (where the word HELLO is italic in the caption)

The CEA608 standard is not very clear about how Mid-Row Commands should be handled. Some parts mention adding a white space when processing a code, but the wording is quite ambiguous:

Part 7.4 Real-Time Scrolling Display:

"In Text Mode"... "The cursor moves automatically one column to the right after each character or Mid-Row Code received."

This seems to describe Text Mode only.

Part B.12 Backspacing:

"Backspace shall move the cursor one column to the left, erasing the character or Mid-Row Code occupying that location..."

This suggests that the Mid-Row code occupies the space for a character on the screen.

If we can rely on https://www.law.cornell.edu/cfr/text/47/79.101, it says:

"The cursor moves automatically one column to the right after each character or Mid-Row Code received."

So all in all, this function probably needs to add a whitespace to the text to handle Mid-Row Codes appropriately:

@AquilesCanta
Contributor

I don't have a full grasp of the 608 spec, so please correct me if I am wrong. As I understand it, our implementation only supports Caption mode (Text mode is unsupported).

"This suggests that the Mid-Row code occupies the space for a character on the screen."

I agree this is the case in Text mode. But I am not convinced that the suggestion ("this function probably needs to add a whitespace to the text") won't break other captions that assume Caption mode. It would be great to receive some insight from someone experienced in 608.

@zsmatyas
Contributor Author

zsmatyas commented Feb 28, 2018

After running into this bug again and again, with live channels that seem to be missing a whitespace around single words in italics, I have checked the 608 spec multiple times. Unfortunately, I have only found implicit indications that a Mid-Row command also requires a whitespace to be added at the current location.

Sentences quoted above and:

C.2 Processing Color in a Monochrome-Display Decoder (Regulatory)
The decoder shall ignore the color selection designated in mid-row codes and non-indenting PACs, but it shall recognize the spacing and other attribute assignments these codes may designate.

C.8 Solid Spaces Added for Legibility (Regulatory/Preferred)
It is recommended as a preferred practice that Mid-Row codes and the FON command, as spacing attributes, be treated in the same manner as displayable characters in terms of adding solid spaces.

All of these suggest that all Mid-Row codes have a "spacing attribute", but this is never explicitly described.

But the legal document titled "Closed caption decoder requirements for analog television receivers" I linked above clearly says multiple times:

Caption Mode.
(1) Roll-up.
(vi) The cursor moves automatically one column to the right after each character or Mid-Row Code received. A Backspace will move the cursor one column to the left, erasing the character or Mid-Row Code occupying that location.

This is repeated for all captioning modes available in 608.
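To make the quoted cursor rule concrete, here is a minimal sketch of one caption row in which a Mid-Row Code occupies a column just like a character, and a Backspace erases whichever of the two sits in the last column. The `Row` class and its method names are hypothetical, chosen only for illustration; this is not ExoPlayer's implementation, and it simplifies by assuming the cursor is always at the end of the row.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical model of one caption row; the cursor is always at the end."""

    cells: list = field(default_factory=list)  # one entry per occupied column

    @property
    def cursor(self) -> int:
        return len(self.cells)

    def put_char(self, ch: str) -> None:
        # A displayable character occupies one column.
        self.cells.append(ch)

    def put_mid_row(self, style: str) -> None:
        # Per the quoted rule, a Mid-Row Code also occupies one column.
        self.cells.append(("mid_row", style))

    def backspace(self) -> None:
        # Erases the character or Mid-Row Code in the last column.
        if self.cells:
            self.cells.pop()

    def text(self) -> str:
        # A Mid-Row Code is displayed as a space.
        return "".join(" " if isinstance(c, tuple) else c for c in self.cells)


row = Row()
for ch in "said":
    row.put_char(ch)
row.put_mid_row("italic")
print(repr(row.text()), row.cursor)
row.backspace()
print(repr(row.text()), row.cursor)
```

Under this reading, a stream that does not want the extra column would have to send a Backspace after the Mid-Row Code.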

(h) Character Attributes
(1) Transmission of Attributes.
(i) All Mid-Row Codes and the Flash On command are spacing attributes which appear in the display just as if a standard space (20h) had been received.

So, if we can accept the validity of the linked legal document as a clarification of the older CEA608 standard created on July 1, 1993, we probably need to add that whitespace.
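As a rough illustration of what "add that whitespace" would mean for the example from the first comment, the sketch below decodes the second byte of a Mid-Row Code and appends a space before the new style takes effect. The function names and the event-list input are hypothetical. The bit layout (low bit is underline, the next three bits select a colour or italics) is my understanding of 608, and leaving the colour unchanged for the italics code is an assumption of this sketch.

```python
COLORS = ["white", "green", "blue", "cyan", "red", "yellow", "magenta"]


def decode_mid_row(cc2: int) -> dict:
    """Decode the second byte of a Mid-Row Code (0x20-0x2F, parity stripped)."""
    if not 0x20 <= cc2 <= 0x2F:
        raise ValueError(f"not a mid-row second byte: {cc2:#x}")
    underline = bool(cc2 & 0x01)
    attribute = (cc2 >> 1) & 0x07
    if attribute == 0x07:
        # Assumption: the italics code leaves the current colour unchanged.
        return {"italic": True, "color": None, "underline": underline}
    return {"italic": False, "color": COLORS[attribute], "underline": underline}


def render(events):
    """events: str items are caption text, int items are mid-row second bytes."""
    text = ""
    spans = []  # (start offset, style) pairs
    for ev in events:
        if isinstance(ev, int):
            text += " "  # the Mid-Row Code is a spacing attribute
            spans.append((len(text), decode_mid_row(ev)))
        else:
            text += ev
    return text, spans


text, spans = render(["He said", 0x2E, "HELLO", 0x20, "immediately."])
print(text)
```

Without the `text += " "` line, the same input would produce the "He saidHELLOimmediately." rendering described at the top of this issue.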

The CEA608 Wiki also links a form of the legal document:
https://www.gpo.gov/fdsys/pkg/CFR-2007-title47-vol1/pdf/CFR-2007-title47-vol1-sec15-119.pdf

@ojw28
Contributor

ojw28 commented Jun 5, 2018

Should be fixed at the same time as #4321.

@zsmatyas
Contributor Author

zsmatyas commented Jun 5, 2018

Note: this is also fixed in this pull request:
5a6bb78#diff-83f8c19437ea277fb176fdd4055095baR765

@ojw28
Contributor

ojw28 commented Jun 7, 2018

Apple's 16x9 test stream doesn't seem to assume that midrow codes insert spaces. The stream includes captions that are fully italic. If we add a space when we see a midrow code, we end up starting each of these captions with a space, which I very much doubt is intended. There are no backspaces to undo the spaces, as far as I can see. Thoughts?

@ojw28
Contributor

ojw28 commented Jun 7, 2018

Looks like dash.js inserts a space (Dash-Industry-Forum/dash.js#1906). Android's ClosedCaptionRenderer does too. Does this imply the Apple test stream is incorrect in not following the midrow codes with backspaces, or is there something else I'm missing?

@zsmatyas
Contributor Author

zsmatyas commented Jun 7, 2018

"The stream includes captions that are fully italic."

That is usually achieved by a Preamble code, not a mid-row code. You need to look for a "Sentence with a single word in italic."
I tested dozens of live channels. The providers (over-the-air in the Mountain View area) do not seem to add a whitespace after a mid-row code, so the text looks strange on live channels. Note: an additional whitespace would not cause any trouble.
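For anyone checking a stream by hand, a byte pair can be classified roughly as follows. This is a sketch based on my reading of the 608 code tables, not code taken from any decoder; `classify` and `pac_is_italic` are hypothetical names, and the bit masks should be verified against the spec.

```python
def classify(cc1: int, cc2: int) -> str:
    """Classify a 608 byte pair as 'mid-row', 'preamble' or 'other'."""
    cc1 &= 0x7F  # strip the parity bit
    cc2 &= 0x7F
    # Mid-Row Code: first byte 0x11 or 0x19, second byte 0x20-0x2F.
    if (cc1 & 0xF7) == 0x11 and (cc2 & 0xF0) == 0x20:
        return "mid-row"
    # Preamble Address Code: first byte 0x10-0x1F, second byte 0x40-0x7F.
    if (cc1 & 0xF0) == 0x10 and (cc2 & 0xC0) == 0x40:
        return "preamble"
    return "other"


def pac_is_italic(cc2: int) -> bool:
    """True if a PAC second byte selects italics rather than a colour or indent."""
    cc2 &= 0x7F
    is_indent = (cc2 & 0x10) == 0x10
    return not is_indent and ((cc2 >> 1) & 0x07) == 0x07


print(classify(0x91, 0xAE))  # 0x11 0x2E once parity is stripped
print(classify(0x91, 0xCE), pac_is_italic(0xCE))  # 0x11 0x4E once parity is stripped
```

A fully italic caption that starts with a pair classified as "preamble" would not pick up a leading space, whereas one that starts with a "mid-row" pair would.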

But please, check the links I added above. Legal requirements for CEA608 say that:

The cursor moves automatically one column to the right after each character or Mid-Row Code received.

(iv) Thus, for example, if a red, italicized, underlined, flashing character is desired, the attributes must be received in the following order: a red Mid-Row or Preamble Address Code, an italics Mid-Row Code with underline bit, and the Flash On command. The character will then be preceded by three spaces (two if red was assigned via a Preamble Address Code).

Receipt of a Backspace will move the cursor one column to the left, erasing the character or Mid-Row Code occupying that location.
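The arithmetic in example (iv) can be checked with a tiny sketch that counts one space per spacing attribute. The tuple representation of the codes is made up for illustration; the only rule encoded is the one quoted above, namely that Mid-Row Codes and Flash On are spacing attributes while a Preamble Address Code is not.

```python
SPACING_KINDS = {"mid_row", "flash_on"}  # a Preamble Address Code is not spacing


def leading_spaces(codes) -> int:
    """Count the spaces that precede the character after these attribute codes."""
    return sum(1 for kind, _ in codes if kind in SPACING_KINDS)


red_via_mid_row = [
    ("mid_row", "red"),
    ("mid_row", "italics+underline"),
    ("flash_on", None),
]
red_via_pac = [
    ("pac", "red"),
    ("mid_row", "italics+underline"),
    ("flash_on", None),
]
print(leading_spaces(red_via_mid_row), leading_spaces(red_via_pac))
```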

"Does this imply the Apple test stream is incorrect"

When did Apple last follow any standard? In 1982? ;)

@ojw28
Contributor

ojw28 commented Jun 7, 2018

Yes, I've looked at those already, thanks. I'm pretty sure inserting the space is the right thing to do.

This does imply the Apple test stream is incorrect, as far as I can tell. They seem to be using a mid-row code rather than a preamble code. If you have a chance, it would be good if you could take a quick look at that stream to see if you agree, and/or whether we have some other bug. The unwanted space is inserted with all of the fixes in #4308, so if there is some other bug then it's not fixed as part of that change. Note that it's trivial to see the unwanted space simply by replacing it with another character :).
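A minimal sketch of that debugging trick, assuming the decoded cue text is available as a plain string (the sample caption is invented, not taken from the Apple stream):

```python
def show_spaces(text: str, marker: str = "_") -> str:
    """Replace every space with a visible marker so leading spaces stand out."""
    return text.replace(" ", marker)


# Hypothetical cue text as it might come out of the decoder when a fully
# italic caption starts with a mid-row code.
decoded = " This whole caption is italic"
print(show_spaces(decoded))
```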

@ojw28
Contributor

ojw28 commented Jun 14, 2018

My plan for this is to merge a one-line change (equivalent to 5a6bb78#diff-83f8c19437ea277fb176fdd4055095baR765), so that we can close this issue ahead of fully merging all of that commit (which will take more work).

ojw28 added a commit that referenced this issue Jun 18, 2018
Issue: #3906

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=200526335
@ojw28 ojw28 closed this as completed Jun 18, 2018
@google google locked and limited conversation to collaborators Nov 23, 2018