DAPT 2023-07-31 / 2024-11-07 > 2025-01-07 #59

Closed
nigelmegitt opened this issue Aug 1, 2023 · 16 comments
Labels
FPWD (First Public Working Draft published), REVIEW REQUESTED

Comments

@nigelmegitt

We prefer groups to run a self-review around the time of FPWD. See https://w3ctag.github.io/security-questionnaire/.

If you still want us to review your spec, please provide the information below.

In the issue title above add the document name followed by the date of this request.

Other comments:

We have built, actually or conceptually, on previous privacy reviews for TTML2 and IMSC to inform this.

@nigelmegitt added the FPWD (First Public Working Draft published), pending (This issue needs to get a reviewer assigned to it), and REVIEW REQUESTED labels on Aug 1, 2023
@nigelmegitt
Author

Hey there, would appreciate an update on the likely timeline for this please, on behalf of TTWG.

@himorin changed the title from "DAPT 2023-07-31" to "DAPT 2023-07-31 / 2024-11-07 > 2025-01-07" on Dec 25, 2024
@himorin

himorin commented Dec 25, 2024

@simoneonofri the WG is waiting for your comments on the security considerations section. Please share any comments you have.

@himorin

himorin commented Jan 20, 2025

Hi @simoneonofri, may I ask about the current status of this review request?

@simoneonofri
Contributor

simoneonofri commented Jan 29, 2025

Hello @himorin, @nigelmegitt,

First of all, thank you for your patience!

I've read the questionnaire, and considering it's a file format, with @innotommy, we've come up with a Threat Model for File formats:

  • PLS: Parsing/Loading/Serializing
  • CD: Compression/Decompression
  • EEC: Embed Executable Code (e.g., scripts)
  • LER: Links and external resources
  • MM: Metadata manipulation
  • DI: Data Integrity

You have already identified an issue related to embedding javascript into data: in the Questionnaire (EEC), and you reference the TTML2 Security and Privacy Section, thereby "importing" its Threat Model. So I have two questions (which we can discuss in an issue in your repo, if you prefer):

  • A specific one: is the data: issue related exclusively to DAPT or TTML2?
  • A generic one: scrolling through the model above, are there items not considered? For example, the integrity?

Let me know if you prefer to discuss here or in a dedicated issue in your repo.

Thank you,

Simone

@nigelmegitt
Author

Hi @simoneonofri thank you for the review.

I am not sure what you mean about "embedding javascript into data:" - the point in the questionnaire (question 7) is that if implementations are written in javascript then they are likely to use underlying platform functionality to dereference any data: URLs into usable resources.

There is nothing in TTML2 or DAPT that supports execution of any javascript within the payload of data: URLs. Although TTML2 supports font resources in data: URLs (and font resources contain specialised code blocks), there is no requirement for support of those resources in DAPT.
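
To make the data: point concrete, here is a minimal sketch of how a processor might dereference a data: URL as inert bytes while accepting only audio media types. It is an illustration under stated assumptions, not something DAPT or TTML2 prescribes: the function name `decode_audio_data_url` and the `audio/` allow-list are hypothetical choices made for this example.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes

# Assumption for this sketch: the processor only ever needs audio resources.
ALLOWED_PREFIX = "audio/"


def decode_audio_data_url(url: str) -> Tuple[str, bytes]:
    """Return (media_type, payload_bytes) for an audio data: URL.

    The payload is only decoded to bytes; nothing is ever executed.
    Raises ValueError for anything that is not an audio data: URL.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data: URL (missing comma)")
    params = header.split(";")
    # An omitted media type defaults to text/plain for data: URLs.
    media_type = params[0].strip().lower() or "text/plain"
    is_base64 = any(p.strip().lower() == "base64" for p in params[1:])
    if not media_type.startswith(ALLOWED_PREFIX):
        raise ValueError(f"media type {media_type!r} is not allowed")
    raw = unquote_to_bytes(payload)
    data = base64.b64decode(raw, validate=True) if is_base64 else raw
    return media_type, data
```

With this shape, a font or script payload in a data: URL is rejected before any decoding happens, which matches the intent that such resources need not be supported.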

I would suggest that DAPT's dependency on XML, the payload syntax constraints in DAPT and TTML, and the validation semantics of both, are adequate to conclude that data integrity has been considered.

@cconcolato

My attempt at responding to @simoneonofri's questions:

is the data: issue related exclusively to DAPT or TTML2?

The data: feature is already part of TTML2. DAPT is just making use of it in the ways defined by TTML2.

A generic one: scrolling through the model above, are there items not considered? For example, the integrity?

PLS: Parsing/Loading/Serializing

DAPT is based on TTML2, which is based on XML, so the security concerns regarding PLS are those of XML languages.
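
As one example of what "the security concerns of XML languages" can mean in practice, the sketch below rejects any document that carries a DOCTYPE declaration, which is a common way to avoid entity-expansion and external-entity attacks. It uses Python's standard `xml.parsers.expat` module. The names `DoctypeForbidden` and `element_names_without_dtd` are hypothetical, and refusing DTDs outright is an assumption of this example, not a requirement stated by DAPT or TTML2.

```python
import xml.parsers.expat
from typing import List


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE (and so could define entities)."""


def element_names_without_dtd(xml_bytes: bytes) -> List[str]:
    """Parse xml_bytes and return element names in document order.

    Any DOCTYPE declaration aborts parsing with DoctypeForbidden.
    """
    names: List[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when expat sees "<!DOCTYPE ...", before the root element.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not accepted")

    def on_start_element(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return names
```

A real processor would do much more than list element names; the point is only that the DOCTYPE check happens in the parser callbacks, before any document content is handed to the application.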

CD: Compression/Decompression

I don't know if we should consider base64 encoding as a CD mechanism. In any case, base64 support is a feature of TTML2, nothing new in DAPT. Generally, I would say there is no specific CD consideration in DAPT. There could be CD considerations in the transport layer, like HTTP Transfer encoding but that is not specific to DAPT.

EEC: Embed Executable Code (e.g., scripts)

DAPT (and TTML2) does not allow embedding of executable code.

LER: Links and external resources

DAPT reuses the features of TTML2 that allow linking to external resources. It does not define any new one. The only use of TTML2 linking feature is for audio as indicated in the current Privacy section of the spec.

MM: Metadata manipulation

Can you elaborate on what this means?

DI: Data Integrity

Can you elaborate on what this means?

@himorin

himorin commented Jan 30, 2025

@simoneonofri thank you for your review and comments. I believe all questions are answered; please let us know whether you are satisfied. In addition to @nigelmegitt's comment about data:, the <data> of DAPT refers to the TTML one, which is defined as a <uri> to be resolved as xsd:anyURI, and its use is strictly limited to Data.class based on the XML data model. The security and privacy considerations in TTML2 also note that there is no script language in this area.

For the 6 items from the Threat Model, thanks to @cconcolato for the analysis of DAPT; it seems these items would be better integrated into the self-review material. I suppose it is still an early phase for the rebooted security group, but WGs would be happy to hear about any plan to update the questionnaire with recent security considerations.

@simoneonofri
Contributor

There is nothing in TTML2 or DAPT that supports execution of any javascript within the payload of data: URLs. Although TTML2 supports font resources in data: URLs (and font resources contain specialised code blocks), there is no requirement for support of those resources in DAPT.

@nigelmegitt thank you for the clarification :)

@simoneonofri
Contributor

My attempt at responding to @simoneonofri's questions:

Thank you :)

is the data: issue related exclusively to DAPT or TTML2?

The data: feature is already part of TTML2. DAPT is just making use of it in the ways defined by TTML2.

Ok, so in the questionnaire it was an example, and for that in the current considerations there is a reference to the security considerations of TTML2. It's a “way” (profile) of using that format?

In this sense, does it add components/functionality? In general, adding features could introduce vulnerability issues. That's why I ask. If there are new threats/vulnerabilities/attacks that can be made through DAPT, compared to TTML2.

A generic one: scrolling through the model above, are there items not considered? For example, the integrity?

PLS: Parsing/Loading/Serializing

DAPT is based on TTML2, which is based on XML, so the security concerns regarding PLS are those of XML languages.

Ok, great, thanks for the explanation, so as @nigelmegitt writes, it might make sense to reference that potentially there are various attacks/vulnerabilities/branches derived from the use of XML, same as you wrote for TTML2.

CD: Compression/Decompression

I don't know if we should consider base64 encoding as a CD mechanism. In any case, base64 support is a feature of TTML2, nothing new in DAPT. Generally, I would say there is no specific CD consideration in DAPT. There could be CD considerations in the transport layer, like HTTP Transfer encoding but that is not specific to DAPT.

I agree that base64 is encoding. Thank you for the analysis, I think this

EEC: Embed Executable Code (e.g., scripts)

DAPT (and TTML2) does not allow embedding of executable code.

Ok!

LER: Links and external resources

DAPT reuses the features of TTML2 that allow linking to external resources. It does not define any new one. The only use of TTML2 linking feature is for audio as indicated in the current Privacy section of the spec.

Okay, thanks for the explanation. One direction is to include elements to verify the integrity of external files, such as the SRI. Does that make sense? Otherwise, it's out of scope. I've seen several vulnerabilities exploited by audio libraries, but they don't depend on DAPT.

MM: Metadata manipulation

Can you elaborate on what this means?

Of course, if there is metadata present (I would tell you internal to this format if we are talking about an XML base), if yes, what happens if it is manipulated (dates, timestamps, authors, etc.), for example, there are some attacks done by altering not so much the file itself but its metadata (to give an example, it was common practice to insert malicious payloads inside EXIF data, as well as to use them to get information that is not strictly necessary).

DI: Data Integrity

Can you elaborate on what this means?

Yes, then, if there's a mechanism to check whether or not when the file gets to the processor, it's been altered or not. Or if that issue is maybe delegated to another level (e.g., transport)

@simoneonofri
Contributor

simoneonofri commented Jan 30, 2025

@simoneonofri thank you for your review and comments. I believe all questions are answered; please let us know whether you are satisfied. In addition to @nigelmegitt's comment about data:, the <data> of DAPT refers to the TTML one, which is defined as a <uri> to be resolved as xsd:anyURI, and its use is strictly limited to Data.class based on the XML data model. The security and privacy considerations in TTML2 also note that there is no script language in this area.

@himorin thank you. Yes, I think with a few more messages (or even a short call, if needed) we can get a good result and then propose a standardized structure for the section.

For the 6 items from the Threat Model, thanks to @cconcolato for the analysis of DAPT; it seems these items would be better integrated into the self-review material. I suppose it is still an early phase for the rebooted security group, but WGs would be happy to hear about any plan to update the questionnaire with recent security considerations.

You brought up a very good point. I've already opened an issue about that, obviously from the experience we're having from these early reviews. I would appreciate it if you would support the change. On facilitating the Threat Model, it's definitely a goal of SING.

@nigelmegitt
Author

@simoneonofri in terms of this review, rather than potential future changes to the self-review or to the structure of security sections in data format specifications, are you satisfied that we can proceed, or do you have remaining unanswered questions or concerns?

@simoneonofri removed the pending (This issue needs to get a reviewer assigned to it) label on Feb 4, 2025
@simoneonofri
Contributor

@simoneonofri in terms of this review, rather than potential future changes to the self-review or to the structure of security sections in data format specifications, are you satisfied that we can proceed, or do you have remaining unanswered questions or concerns?

Hi @nigelmegitt I was waiting for some feedback from @cconcolato on my explanation 😄.

For now, I think the main point is to refer to the Security Considerations of TTML2 (already done), and also - as you suggested - to refer to potential XML security issues.

@cconcolato

Ok, so in the questionnaire it was an example, and for that in the current considerations there is a reference to the security considerations of TTML2. It's a “way” (profile) of using that format?

Yes.

In this sense, does it add components/functionality? In general, adding features could introduce vulnerability issues. That's why I ask. If there are new threats/vulnerabilities/attacks that can be made through DAPT, compared to TTML2.

DAPT does not add functionality related to the data: feature of TTML2.

One direction is to include elements to verify the integrity of external files, such as the SRI. Does that make sense?

It does. I was not aware of this ~10 year old spec! It is an interesting idea to offer a standard way for authors to provide hashes/digests of embedded resources. I think the TTWG should consider that for inclusion into TTML first, and then DAPT could adopt it.
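
For readers unfamiliar with SRI, the idea can be sketched as follows: the author records a digest of the resource in the `<algorithm>-<base64 digest>` form that SRI uses, and the processor recomputes it before using the bytes. This is only an illustration of the general technique. Neither TTML2 nor DAPT currently defines where such a value would be carried, and the function names `sri_digest` and `verify_integrity` are hypothetical.

```python
import base64
import hashlib
import hmac

# Hash algorithms accepted by this sketch (the ones SRI names).
SUPPORTED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_digest(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string such as 'sha384-<base64 digest>'."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Check data against a single '<algorithm>-<base64 digest>' value."""
    algorithm, sep, expected = integrity.partition("-")
    if not sep or algorithm not in SUPPORTED_ALGORITHMS:
        return False
    actual = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
    try:
        # Constant-time comparison of the two base64 strings.
        return hmac.compare_digest(actual, expected)
    except TypeError:
        # compare_digest rejects non-ASCII strings; treat that as a mismatch.
        return False
```

What a player should do when `verify_integrity` returns False (skip the audio, fall back to text, stop entirely) is exactly the kind of behaviour a specification would need to define.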

Of course, if there is metadata present (I would tell you internal to this format if we are talking about an XML base), if yes, what happens if it is manipulated (dates, timestamps, authors, etc.), for example, there are some attacks done by altering not so much the file itself but its metadata (to give an example, it was common practice to insert malicious payloads inside EXIF data, as well as to use them to get information that is not strictly necessary).

First of all, DAPT does not rely on external (e.g. transport level) metadata. Now, there is no official data/metadata classification of the content of a DAPT document. If we had to, I would say, anything that is not text content would be metadata, such as timing, annotations, language information. Given that definition, manipulation of metadata would result in text content being processed at the wrong time, or wrongly (e.g. audio description being processed as dubbing or vice versa). You could, for example, make the characters say things in a different order. It would be hard to differentiate from badly authored content. I don't know if this can be considered as an attack. How do you define an attack?

Data Integrity
Yes, then, if there's a mechanism to check whether or not when the file gets to the processor, it's been altered or not. Or if that issue is maybe delegated to another level (e.g., transport)

Yes, I think DAPT would defer that to the transport level.

@nigelmegitt
Author

One direction is to include elements to verify the integrity of external files, such as the SRI. Does that make sense?

It does. I was not aware of this ~10 year old spec! It is an interesting idea to offer a standard way for authors to provide hashes/digests of embedded resources. I think the TTWG should consider that for inclusion into TTML first, and then DAPT could adopt it.

This is very interesting, I was not aware of it either. I can foresee significant potential problems with requiring it in the context of DAPT. The subresources that would be relevant are audio files, most likely for audio descriptions; during the authoring and quality checking phases of content production it could be useful to replace those audio files, for example if a better recording is made. In that case, I would not like to be prescriptive about how the referencing DAPT document might need to be modified. On the other hand, it could be helpful to have the option of adding an integrity check when the content is considered complete, i.e. that no more work is required.

If the DAPT file is being distributed, it is likely that any URLs for external resources would need to be changed to take into account CDNs etc.

Player behaviour if an integrity check fails would be an interesting consideration.

I agree with Cyril that this could be useful and needs more consideration - I would certainly not like to delay the initial version of DAPT while we think about it, if I'm reading correctly that this would be a "nice to have" rather than an "essential" from a security point of view.

@himorin

himorin commented Feb 12, 2025

@simoneonofri in terms of this review, rather than potential future changes to the self-review or to the structure of security sections in data format specifications, are you satisfied that we can proceed, or do you have remaining unanswered questions or concerns?

Hi @nigelmegitt I was waiting for some feedback from @cconcolato on my explanation 😄.

For now, I think the main point is to refer to the Security Considerations of TTML2 (already done), and also - as you suggested - to refer to potential XML security issues.

Hi @simoneonofri, sorry to rush you, but do the replies address your concerns from a security point of view? I suppose all points except for verification are clarified, and integrating verification using a standard approach could be considered as a future extension rather than a blocker for DAPT moving to CRS. If so, could you mark this review request as completed (and close this issue)?
Since DAPT itself does not define this but just refers to the definition in TTML, we should address it in TTML first rather than updating DAPT. If the approach of integrating SRI (or some standard mechanism) into the TTML specification is acceptable, let us file a tracker issue in the TTML repository.

@simoneonofri
Contributor

@himorin @nigelmegitt @cconcolato thank you for the conversation.

I've opened two issues related to our discussion.
