Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace sanitizer with DOMPurify #52

Closed
rofe opened this issue Apr 20, 2019 · 8 comments
Closed

Replace sanitizer with DOMPurify #52

rofe opened this issue Apr 20, 2019 · 8 comments
Assignees
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@rofe
Copy link
Contributor

rofe commented Apr 20, 2019

xss_api.js currently uses Caja-HTML-Sanitizer for filterHTML(). As suggested by @lkrapf here, DOMPurify would be a more proven alternative with a very active community (while Caja-HTML-Sanitizer only has 48 commits and was last updated 3 years ago).

I propose to replace it with DOMPurify.

@tripodsan
Copy link
Contributor

I think he meant to use it in the pipeline on the vdom. I don't think that we need to apply it again on the HTL output.
If we wanted to guard the output again after all, we should do it in the pipeline in order to support all kind of engines, not just htl.

@rofe
Copy link
Contributor Author

rofe commented Apr 20, 2019

I agree to sanitize the output in the pipeline (see also my comment in adobe/helix-pipeline#263 (comment)).

But I think we should still aim for reach maximum security in htlengine. Someone might use it independently and expect the output to be safe.

@trieloff trieloff added the enhancement New feature or request label Apr 23, 2019
@rofe rofe self-assigned this Apr 30, 2019
@tripodsan
Copy link
Contributor

I tested the performance with some larger files:

describe('Engine sanitizer Performance test', async () => {
  // todo: enable large tests once performance issues are addressed
  const TEST_FILES = ['simple2.html', '400kb.htm', '700kb.htm'];

  TEST_FILES.forEach((filename) => {
    it(`produces htl output for ${filename}`, async () => {
      const now = Date.now();
      const filePath = path.resolve(__dirname, 'templates', filename);
      const source = await fse.readFile(filePath, 'utf-8');
      await engine({
        source,
        // eslint-disable-next-line no-template-curly-in-string
      }, '${source @ context="html"}');
      console.log(`Time for ${filename}: ${Date.now()-now}ms`);
    }).timeout(30000);
  });
});

with dom purify:

Time for simple2.html: 598ms
Time for 400kb.htm: 923ms
Time for 700kb.htm: 1005ms

with original sanitizer:

Time for simple2.html: 78ms
Time for 400kb.htm: 64ms
Time for 700kb.htm: 104ms

Given the fact, that we output a lot of HTML through the html context, I'm reluctant to use dompurify here. can we find the edge cases, where the current sanitizer doesn't work?

@rofe
Copy link
Contributor Author

rofe commented May 9, 2019

I agree that performance is an important criterion. But in the one case where a customer gets hacked, it becomes moot :)

After a quick test using the examples from a08b0e7, I wasn't able to break our current sanitizer. But I'm not an expert security researcher by any means, and my concern regarding the dead codebase and community remains. The next new attack vector might not be covered...

@rofe
Copy link
Contributor Author

rofe commented May 13, 2019

So... how should we proceed with this issue :)

If the majority thinks the current sanitizer is safe enough, I suggest we close this issue and the corresponding PR...

@trieloff
Copy link
Contributor

I'd close it for now, but there is a slightly crazy idea I want to discuss with @tripodsan at the hackathon that could make it viable again (but this idea would break the rest of htlengine).

@rofe rofe closed this as completed May 13, 2019
@tripodsan
Copy link
Contributor

there is a slightly crazy idea I want to discuss with @tripodsan at the hackathon that could make it viable again (but this idea would break the rest of htlengine).

:-) DOM based HTL engine ?

@trieloff
Copy link
Contributor

:-) DOM based HTL engine ?

DOM based everything!

If we build HTLEngine the same way the JSX processor is built (by using h, or React.createElement or a compatible API) we'd get a proper DOM to work with in the pipeline (I'd change the downstream part of the HTML pipeline to work entirely on DOM instead of strings or HTAST) and we'd get a couple of features that make HTL in the browser more interesting such as support for Virtual DOM diffing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants