Skip to content
This repository has been archived by the owner on Feb 28, 2022. It is now read-only.

fix(html pipe): Sanitize generated markdown to avoid XSS attacks #263

Merged
merged 17 commits into from
May 29, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 39 additions & 0 deletions docs/adr/heading-identifiers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
Automatic Headings Identifier Generation
========================================

|||
|-|-
|Status | :green_book:`ACCEPTED`
|Issue | [#26], [#253]

> TL;DR: a short Y-statement summarizing the decision

In the context of generating heading identifiers for easier URL targeting,
facing the concern that some of them could end up clobbering the DOM
we decided to sanitize the offending identifiers
to achieve proper protection of the DOM,
accepting that in limited use cases where the autor would need to expose a heading that clobbers, that heading would just not get an identifier.

Context
-------

> Describe the detailed context and problem being addressed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line break missing

Suggested change
> Describe the detailed context and problem being addressed
> Describe the detailed context and problem being addressed

Automatic heading identifier generation is extremely useful for authors so they can properly share targeted links to the content in their pages.

Unfortunately, HTML element identifiers are usually exposed on the DOM `document` element for legacy reasons. In some edge cases, the identifier can then end up colliding with existing DOM properties (i.e. `name`, `children`, `location`, etc.) Most modern browsers will protect the DOM API and set these properties as read-only, but some browsers are known to let the properties being overridden. This can become an attack vector for XSS attacks.

Decision
--------

In order to avoid DOM clobbering and reduce the possibility of an XSS attack through that vector, the decision is being made to sanitize the offending heading identifiers by just not exposing them in the DOM. The DOMPurify library that is used to sanitize the whole DOM will automatically remove all `id` properties on the headings that would cause a collision on the DOM properties.

Consequences
------------

All headings will have an automatically generated identifier in the DOM, except those that would clobber the DOM.
The latter will just be output without the `id` attribute.

If the author still needs a proper identifier being exposed in the DOM for easier navigation (which would be the case on API documentation pages), the proposed solutions are:
- Adding a prefix to the heading (i.e. `Event: OnClick`, `Property: Name`)
- Use a custom matcher function that injects a prefix in the identifier in the HTML (i.e. `<h3 id="event-onclick">OnClick</h3>`, `<h3 id="property-name">Name</h3>`)
- Use a custom JavaScript function that generates missing identifiers on headings clientside
8 changes: 8 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@
"ajv": "^6.6.2",
"callsites": "^3.0.0",
"clone": "^2.1.2",
"dompurify": "^1.0.10",
"fs-extra": "^8.0.0",
"github-slugger": "^1.2.1",
"hast-util-select": "^3.0.1",
Expand Down
2 changes: 2 additions & 0 deletions src/defaults/html.pipe.js
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ const tohast = require('../html/html-to-hast');
const tohtml = require('../html/stringify-hast');
const addHeaders = require('../html/add-headers');
const timing = require('../utils/timing');
const sanitize = require('../html/sanitize');

/* eslint newline-per-chained-call: off */

Expand Down Expand Up @@ -63,6 +64,7 @@ const htmlpipe = (cont, payload, action) => {
.before(selecttest)
.before(html).expose('html')
.before(responsive)
.before(sanitize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we modified the entire pipeline (including the HTL engine) to be based on JSDOM (see #337), should be move this after the once() step?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because this would make it impossible to have unsanitized output, which would severely reduce the power and expressiveness of the templates. As we generally assume developers know what they want, we shouldn't do this.

.once(cont)
.after(type('text/html'))
.after(cache).when(uncached)
Expand Down
75 changes: 75 additions & 0 deletions src/html/sanitize.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
/*
* Copyright 2019 Adobe. All rights reserved.
* This file is licensed to you under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License. You may obtain a copy
* of the License at http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under
* the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR REPRESENTATIONS
* OF ANY KIND, either express or implied. See the License for the specific language
* governing permissions and limitations under the License.
*/
const createDOMPurify = require('dompurify');
const { JSDOM } = require('jsdom');

const helixSanitizationConfig = {
// Allowing all ESI tags, see: https://www.w3.org/TR/esi-lang
ADD_TAGS: [
'esi:try',
'esi:attempt',
'esi:except',

'esi:choose',
'esi:when',
'esi:otherwise',

'esi:include',
'esi:inline',
'esi:remove',

'esi:vars',
'esi:comment',
],
RETURN_DOM: true,
};

const CUSTOM_NAME_REGEX = /^\w+-\w+$/;

/**
* Allow custom elements to be retained by the sanitization.
*
* @param {Object} DOMPurify the DOMPurify instance
*/
function allowCustomElements(DOMPurify) {
DOMPurify.addHook('uponSanitizeElement', (node, data) => {
if (node.nodeName && node.nodeName.match(CUSTOM_NAME_REGEX)) {
data.allowedTags[data.tagName] = true; // eslint-disable-line no-param-reassign
}
});
}

/**
* Allow custom attributes to be retained by the sanitization.
*
* @param {Object} DOMPurify the DOMPurify instance
*/
function allowCustomAttributes(DOMPurify) {
DOMPurify.addHook('uponSanitizeAttribute', (node, data) => {
if (data.attrName && data.attrName.match(CUSTOM_NAME_REGEX)) {
data.allowedAttributes[data.attrName] = true; // eslint-disable-line no-param-reassign
}
});
}

function sanitize({ content }, { logger }) {
logger.log('debug', 'Sanitizing content body to avoid XSS injections.');

const globalContext = (new JSDOM('')).window;
const DOMPurify = createDOMPurify(globalContext);
allowCustomElements(DOMPurify);
allowCustomAttributes(DOMPurify);
const sanitizedBody = DOMPurify.sanitize(content.document.body, helixSanitizationConfig);
content.document.body = sanitizedBody;
}

module.exports = sanitize;
2 changes: 1 addition & 1 deletion src/utils/heading-handler.js
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ class HeadingHandler {
// Inject the id after transformation
const n = Object.assign({}, node);
const el = fallback(h, n);
el.properties.id = headingIdentifier;
el.properties.id = el.properties.id || headingIdentifier;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keep existing identifiers if defined by the author (not really possible in markdown for now, but if we use asciidoc as a parser at some point, then it would be supported

return el;
};
}
Expand Down
33 changes: 22 additions & 11 deletions test/testEmbedHandler.js
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@
*/
/* eslint-env mocha */
const assert = require('assert');
const { Logger } = require('@adobe/helix-shared');
const { Logger, dom: { assertEquivalentNode } } = require('@adobe/helix-shared');
const { JSDOM } = require('jsdom');
const embed = require('../src/utils/embed-handler');
const { pipe } = require('../src/defaults/html.pipe.js');
const coerce = require('../src/utils/coerce-secrets');
Expand Down Expand Up @@ -121,11 +122,16 @@ https://www.youtube.com/watch?v=KOxbO0EI4MA

assert.equal(result.response.status, 201);
assert.equal(result.response.headers['Content-Type'], 'text/html');
assert.equal(result.response.body, `<p>Hello World
Here comes an embed.</p>
<esi:include src="https://example-embed-service.com/https://www.youtube.com/watch?v=KOxbO0EI4MA"></esi:include>
<esi:remove><p><a href="https://www.youtube.com/watch?v=KOxbO0EI4MA">https://www.youtube.com/watch?v=KOxbO0EI4MA</a></p></esi:remove>
<p><img src="easy.png" alt="Easy!" srcset="easy.png?width=480&amp;auto=webp 480w, easy.png?width=1384&amp;auto=webp 1384w, easy.png?width=2288&amp;auto=webp 2288w, easy.png?width=3192&amp;auto=webp 3192w, easy.png?width=4096&amp;auto=webp 4096w" sizes="100vw"></p>`);
assertEquivalentNode(
new JSDOM(result.response.body).window.document.body,
new JSDOM(`
<p>Hello World
Here comes an embed.</p>
<esi:include src="https://example-embed-service.com/https://www.youtube.com/watch?v=KOxbO0EI4MA"></esi:include>
<esi:remove><p><a href="https://www.youtube.com/watch?v=KOxbO0EI4MA">https://www.youtube.com/watch?v=KOxbO0EI4MA</a></p></esi:remove>
<p><img src="easy.png" alt="Easy!" srcset="easy.png?width=480&amp;auto=webp 480w, easy.png?width=1384&amp;auto=webp 1384w, easy.png?width=2288&amp;auto=webp 2288w, easy.png?width=3192&amp;auto=webp 3192w, easy.png?width=4096&amp;auto=webp 4096w" sizes="100vw"></p>
`).window.document.body,
);
});

it('html.pipe processes internal embeds', async () => {
Expand Down Expand Up @@ -154,10 +160,15 @@ Here comes an embed.

assert.equal(result.response.status, 201);
assert.equal(result.response.headers['Content-Type'], 'text/html');
assert.equal(result.response.body, `<p>Hello World
Here comes an embed.</p>
<esi:include src="/test/foo.embed.html"></esi:include>
<esi:remove><p>./foo.md</p></esi:remove>
<p><img src="easy.png" alt="Easy!" srcset="easy.png?width=480&amp;auto=webp 480w, easy.png?width=1384&amp;auto=webp 1384w, easy.png?width=2288&amp;auto=webp 2288w, easy.png?width=3192&amp;auto=webp 3192w, easy.png?width=4096&amp;auto=webp 4096w" sizes="100vw"></p>`);
assertEquivalentNode(
new JSDOM(result.response.body).window.document.body,
new JSDOM(`
<p>Hello World
Here comes an embed.</p>
<esi:include src="/test/foo.embed.html"></esi:include>
<esi:remove><p>./foo.md</p></esi:remove>
<p><img src="easy.png" alt="Easy!" srcset="easy.png?width=480&amp;auto=webp 480w, easy.png?width=1384&amp;auto=webp 1384w, easy.png?width=2288&amp;auto=webp 2288w, easy.png?width=3192&amp;auto=webp 3192w, easy.png?width=4096&amp;auto=webp 4096w" sizes="100vw"></p>
`).window.document.body,
);
});
});
37 changes: 36 additions & 1 deletion test/testHTML.js
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@
*/
/* eslint-env mocha */
const assert = require('assert');
const { Logger } = require('@adobe/helix-shared');
const { Logger, dom: { assertEquivalentNode } } = require('@adobe/helix-shared');
const { JSDOM } = require('jsdom');
const fs = require('fs-extra');
const path = require('path');
const NodeHttpAdapter = require('@pollyjs/adapter-node-http');
Expand Down Expand Up @@ -688,4 +689,38 @@ ${context.content.document.body.innerHTML}`,
.filter(e => !!e);
assert.notEqual(found.length, 0);
});

it('html.pipe sanitizes author-generated content, but not developer-generated code', async () => {
const result = await pipe(
(context) => {
context.response = {
status: 200,
body: `
${context.content.document.body.innerHTML}
<a href="javascript:alert('XSS')">Baz</a>
`,
};
},
{
request: crequest,
content: {
body: `
# Foo

[Bar](javascript:alert('XSS'))
`,
},
},
{
request: { params },
secrets,
logger,
},
);
assert.equal(200, result.response.status);
assertEquivalentNode(
new JSDOM(result.response.body).window.document.body,
new JSDOM('<h1 id="foo">Foo</h1><p><a>Bar</a></p><a href="javascript:alert(\'XSS\')">Baz</a>').window.document.body,
);
});
});
52 changes: 50 additions & 2 deletions test/testHTMLFromMarkdown.js
Original file line number Diff line number Diff line change
Expand Up @@ -222,11 +222,11 @@ describe('Testing Markdown conversion', () => {
# Foo

<form>
<input type="text" name="name"><label for="name">Name</label>
<input type="text" name="fieldName"><label for="fieldName">Name</label>
</form>
`, `
<h1 id="foo">Foo</h1>
<form><input type="text" name="name"><label for="name">Name</label></form>
<form><input type="text" name="fieldName"><label for="fieldName">Name</label></form>
`);
});

Expand Down Expand Up @@ -266,4 +266,52 @@ describe('Testing Markdown conversion', () => {
</table>
`);
});

it('XSS escape href attribute on links', async () => {
await assertMd(`
[Foo](javascript://%0Dalert('XSS!'))
[Bar](javascript:alert('XSS!'))
`, `
<p>
<a>Foo</a>
<a>Bar</a>
</p>
`);
});

it('XSS escape href in images', async () => {
await assertMd(`
![Foo](javascript://%0Dalert('XSS!'))
![Bar](javascript:alert('XSS!'))
`, `
<p>
<img alt="Foo">
<img alt="Bar">
</p>
`);
});

it('XSS escape DOM clobbering attributes', async () => {
await assertMd(`
# location
<a name="anchors">Foo</a>
`, `
<h1 id="location">location</h1>
<p><a>Foo</a></p>
`);
});

it('Accept custom elements and attributes', async () => {
await assertMd(`
# Foo
Bar
<baz-qux corge-grault="garply">Waldo</baz-qux>
`, `
<h1 id="foo">Foo</h1>
<p>Bar</p>
<p>
<baz-qux corge-grault="garply">Waldo</baz-qux>
</p>
`);
});
});