fix(html pipe): Sanitize generated markdown to avoid XSS attacks #263
@@ -0,0 +1,39 @@
Automatic Heading Identifier Generation
========================================

| | |
|-|-|
| Status | :green_book:`ACCEPTED` |
| Issue | [#26], [#253] |

> TL;DR: a short Y-statement summarizing the decision

In the context of generating heading identifiers for easier URL targeting,
facing the concern that some of them could end up clobbering the DOM,
we decided to sanitize the offending identifiers
to achieve proper protection of the DOM,
accepting that, in the limited cases where the author would need to expose a heading that clobbers the DOM, that heading will simply not get an identifier.

Context
-------

> Describe the detailed context and problem being addressed
Automatic heading identifier generation is extremely useful for authors so they can properly share targeted links to the content in their pages.

Unfortunately, for legacy reasons, HTML element identifiers are exposed as named properties of the DOM: on the global `window` object and, for some element types, on `document`. In some edge cases, an identifier can then collide with an existing DOM property (e.g. `name`, `children`, `location`). Most modern browsers protect the DOM API and keep these properties read-only, but some browsers are known to let them be overridden. This DOM clobbering can then become a vector for XSS attacks.

Decision
--------

In order to avoid DOM clobbering and reduce the risk of an XSS attack through that vector, we decided to sanitize the offending heading identifiers by not exposing them in the DOM at all. DOMPurify, the library already used to sanitize the whole DOM, automatically removes any `id` attribute on a heading whose value would collide with an existing DOM property.
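A minimal sketch of the idea, assuming headings are modeled as plain `{ id, text }` objects rather than DOM nodes. As far as we understand it, DOMPurify's default `SANITIZE_DOM` behavior drops an `id` or `name` attribute when its value is already a property of `document` or of a `<form>` element. The two name sets below are a small, hypothetical hand-picked subset standing in for those live objects, so the sketch runs without a browser; `isClobberingId` and `stripClobberingIds` are illustrative names, not DOMPurify functions.

```javascript
// Hypothetical subset of property names found on `document` in a browser.
// DOMPurify checks against the live `document` object instead of a fixed list.
const DOCUMENT_PROPERTIES = new Set([
  'body', 'children', 'cookie', 'domain', 'forms', 'location', 'onclick', 'title',
]);

// Hypothetical subset of property names found on an HTMLFormElement.
const FORM_PROPERTIES = new Set([
  'action', 'elements', 'method', 'name', 'reset', 'submit',
]);

// True when exposing `id` in the DOM could shadow an existing property.
function isClobberingId(id) {
  return DOCUMENT_PROPERTIES.has(id) || FORM_PROPERTIES.has(id);
}

// Returns copies of the headings, without `id` on the ones that would clobber.
function stripClobberingIds(headings) {
  return headings.map((heading) => {
    if (!isClobberingId(heading.id)) {
      return heading;
    }
    const copy = { ...heading };
    delete copy.id;
    return copy;
  });
}

const headings = [
  { id: 'introduction', text: 'Introduction' },
  { id: 'location', text: 'Location' },
  { id: 'name', text: 'Name' },
  { id: 'onclick', text: 'OnClick' },
  { id: 'usage', text: 'Usage' },
];

const sanitized = stripClobberingIds(headings);
console.log(sanitized.map((h) => h.id));
// [ 'introduction', undefined, undefined, undefined, 'usage' ]
```

The headings keep their text; only the identifier is dropped, which matches the consequence described next.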

Consequences
------------

All headings will have an automatically generated identifier in the DOM, except those that would clobber the DOM.
The latter will simply be output without the `id` attribute.

If the author still needs an identifier exposed in the DOM for easier navigation (as is the case on API documentation pages), the proposed solutions are:
- Add a prefix to the heading text (e.g. `Event: OnClick`, `Property: Name`)
- Use a custom matcher function that injects a prefix into the identifier in the HTML (e.g. `<h3 id="event-onclick">OnClick</h3>`, `<h3 id="property-name">Name</h3>`)
- Use a custom JavaScript function that generates missing identifiers on headings client-side
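The third option could look roughly like the sketch below. It is illustrative only: `slugify`, `addMissingHeadingIds`, and the `prefix` parameter are hypothetical names, and the slug rule (lowercase, runs of non-alphanumeric characters collapsed into one hyphen) is an assumption, not necessarily the rule the pipeline uses. Prefixing keeps the generated identifier from matching a DOM property such as `name`.

```javascript
// Hypothetical slug rule: lowercase, collapse non-alphanumeric runs to '-',
// and trim leading/trailing hyphens.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Gives every h1-h6 under `root` that has no id a `${prefix}-${slug}` id.
// In a browser, `root` would be e.g. `document.querySelector('main')`.
function addMissingHeadingIds(root, prefix) {
  root.querySelectorAll('h1, h2, h3, h4, h5, h6').forEach((heading) => {
    if (!heading.id) {
      // eslint-disable-next-line no-param-reassign
      heading.id = `${prefix}-${slugify(heading.textContent)}`;
    }
  });
}

// Minimal stand-in for a DOM subtree so the sketch also runs outside a browser:
// 'Name' lost its id during sanitization, 'Usage' kept its generated one.
const headings = [
  { id: '', textContent: 'Name' },
  { id: 'usage', textContent: 'Usage' },
];
const fakeRoot = { querySelectorAll: () => headings };

addMissingHeadingIds(fakeRoot, 'property');
console.log(headings.map((h) => h.id)); // [ 'property-name', 'usage' ]
```

Because a prefixed identifier always contains a hyphen, it is unlikely to match a built-in DOM property, whose names are plain JavaScript identifiers. These identifiers are added after sanitization, so the prefix is what keeps them safe.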
@@ -36,6 +36,7 @@ const tohast = require('../html/html-to-hast');
 const tohtml = require('../html/stringify-hast');
 const addHeaders = require('../html/add-headers');
 const timing = require('../utils/timing');
+const sanitize = require('../html/sanitize');

 /* eslint newline-per-chained-call: off */

@@ -63,6 +64,7 @@ const htmlpipe = (cont, payload, action) => {
 .before(selecttest)
 .before(html).expose('html')
 .before(responsive)
+.before(sanitize)
Review comment: Once we have modified the entire pipeline (including the HTL engine) to be based on JSDOM (see #337), should we move this after the
Reply: No, because this would make it impossible to have unsanitized output, which would severely reduce the power and expressiveness of the templates. As we generally assume developers know what they want, we shouldn't do this.
 .once(cont)
 .after(type('text/html'))
 .after(cache).when(uncached)
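The reply hinges on step ordering: `sanitize` is registered with `.before()`, so it cleans the markdown-derived content before the template runs in `.once(cont)`, and whatever the template itself emits stays untouched. The sketch below models that ordering with a hypothetical `MiniPipeline` class. It is not the helix-pipeline API (it ignores `.expose()`, `.when()`, and async steps), and its regex-based `sanitize` step is a toy stand-in for DOMPurify, not a real sanitizer.

```javascript
// Hypothetical, heavily simplified pipeline: runs all `before` steps,
// then the single `once` step, then all `after` steps, in registration order.
class MiniPipeline {
  constructor() {
    this.pre = [];
    this.main = () => {};
    this.post = [];
  }

  before(fn) { this.pre.push(fn); return this; }

  once(fn) { this.main = fn; return this; }

  after(fn) { this.post.push(fn); return this; }

  run(context) {
    [...this.pre, this.main, ...this.post].forEach((fn) => fn(context));
    return context;
  }
}

const trace = [];

// Stands in for the markdown-to-HTML steps: produces author-controlled content.
const html = (ctx) => {
  ctx.body = '<h1>Hi</h1><script>alert(1)</script>';
  trace.push('html');
};

// Toy stand-in for DOMPurify: strips <script> elements from the content only.
// Never sanitize real HTML with a regular expression.
const sanitize = (ctx) => {
  ctx.body = ctx.body.replace(/<script[\s\S]*?<\/script>/g, '');
  trace.push('sanitize');
};

// Stands in for the template: trusted developer code that may add its own script.
const template = (ctx) => {
  ctx.output = `<main>${ctx.body}</main><script src="/app.js"></script>`;
  trace.push('template');
};

const result = new MiniPipeline()
  .before(html)
  .before(sanitize)
  .once(template)
  .after(() => trace.push('type'))
  .run({});

console.log(trace); // [ 'html', 'sanitize', 'template', 'type' ]
console.log(result.output); // <main><h1>Hi</h1></main><script src="/app.js"></script>
```

Moving the toy `sanitize` into an `.after()` step that cleans `ctx.output` would also strip the template's own `<script>`, which is the loss of expressiveness the reply objects to.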
@@ -0,0 +1,75 @@
/*
 * Copyright 2019 Adobe. All rights reserved.
 * This file is licensed to you under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License. You may obtain a copy
 * of the License at http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software distributed under
 * the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR REPRESENTATIONS
 * OF ANY KIND, either express or implied. See the License for the specific language
 * governing permissions and limitations under the License.
 */
const createDOMPurify = require('dompurify');
const { JSDOM } = require('jsdom');

const helixSanitizationConfig = {
  // Allowing all ESI tags, see: https://www.w3.org/TR/esi-lang
  ADD_TAGS: [
    'esi:try',
    'esi:attempt',
    'esi:except',

    'esi:choose',
    'esi:when',
    'esi:otherwise',

    'esi:include',
    'esi:inline',
    'esi:remove',

    'esi:vars',
    'esi:comment',
  ],
  RETURN_DOM: true,
};

const CUSTOM_NAME_REGEX = /^\w+-\w+$/;

/**
 * Allow custom elements to be retained by the sanitization.
 *
 * @param {Object} DOMPurify the DOMPurify instance
 */
function allowCustomElements(DOMPurify) {
  DOMPurify.addHook('uponSanitizeElement', (node, data) => {
    if (node.nodeName && node.nodeName.match(CUSTOM_NAME_REGEX)) {
      data.allowedTags[data.tagName] = true; // eslint-disable-line no-param-reassign
    }
  });
}

/**
 * Allow custom attributes to be retained by the sanitization.
 *
 * @param {Object} DOMPurify the DOMPurify instance
 */
function allowCustomAttributes(DOMPurify) {
  DOMPurify.addHook('uponSanitizeAttribute', (node, data) => {
    if (data.attrName && data.attrName.match(CUSTOM_NAME_REGEX)) {
      data.allowedAttributes[data.attrName] = true; // eslint-disable-line no-param-reassign
    }
  });
}

function sanitize({ content }, { logger }) {
  logger.log('debug', 'Sanitizing content body to avoid XSS injections.');

  const globalContext = (new JSDOM('')).window;
  const DOMPurify = createDOMPurify(globalContext);
  allowCustomElements(DOMPurify);
  allowCustomAttributes(DOMPurify);
  const sanitizedBody = DOMPurify.sanitize(content.document.body, helixSanitizationConfig);
  content.document.body = sanitizedBody;
}

module.exports = sanitize;
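To see which element names the custom hook actually lets through, the sketch below replays `allowCustomElements` against a hypothetical stub that only records hooks (`createHookRecorder` and `isAllowed` are illustrative, not part of DOMPurify), so it runs without `jsdom` or `dompurify` installed. The `data` object passed to the hook mimics the `tagName`/`allowedTags` fields the real hook reads. With the regex as written, only names with exactly one hyphen match, and ESI tags such as `esi:include` never do, which is consistent with the configuration listing them in `ADD_TAGS` explicitly.

```javascript
// Same pattern as in sanitize.js: one word, a single hyphen, one word.
const CUSTOM_NAME_REGEX = /^\w+-\w+$/;

// Hypothetical stub standing in for a DOMPurify instance: it only records hooks.
function createHookRecorder() {
  const hooks = {};
  return {
    hooks,
    addHook(entryPoint, fn) {
      hooks[entryPoint] = hooks[entryPoint] || [];
      hooks[entryPoint].push(fn);
    },
  };
}

// Copied from sanitize.js.
function allowCustomElements(DOMPurify) {
  DOMPurify.addHook('uponSanitizeElement', (node, data) => {
    if (node.nodeName && node.nodeName.match(CUSTOM_NAME_REGEX)) {
      data.allowedTags[data.tagName] = true; // eslint-disable-line no-param-reassign
    }
  });
}

// Runs the recorded element hooks for one node name and reports whether
// the name ended up in the allow-list. HTML node names are upper-case.
function isAllowed(recorder, nodeName) {
  const data = { tagName: nodeName.toLowerCase(), allowedTags: {} };
  recorder.hooks.uponSanitizeElement.forEach((hook) => hook({ nodeName }, data));
  return data.allowedTags[data.tagName] === true;
}

const recorder = createHookRecorder();
allowCustomElements(recorder);

const names = ['MY-ELEMENT', 'MY-FANCY-ELEMENT', 'DIV', 'ESI:INCLUDE'];
const results = {};
names.forEach((name) => {
  results[name] = isAllowed(recorder, name);
});
console.log(results);
```

If multi-hyphen names such as `my-fancy-element` should also be retained, a pattern like `/^\w+(-\w+)+$/` would match them; whether that is wanted is a call for the maintainers.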
@@ -45,7 +45,7 @@ class HeadingHandler {
 // Inject the id after transformation
 const n = Object.assign({}, node);
 const el = fallback(h, n);
-el.properties.id = headingIdentifier;
+el.properties.id = el.properties.id || headingIdentifier;
Review comment: Just keep existing identifiers if defined by the author (not really possible in markdown for now, but if we use AsciiDoc as a parser at some point, then it would be supported).
 return el;
 };
 }
Review comment: line break missing
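The one-line change in `HeadingHandler` keeps an identifier the author already set and only falls back to the generated one otherwise. A minimal sketch of that rule on hast-like element objects (`{ type, tagName, properties }`); `assignHeadingId` is a hypothetical helper name, not a function from the codebase.

```javascript
// Mirrors `el.properties.id = el.properties.id || headingIdentifier;`
function assignHeadingId(el, headingIdentifier) {
  // eslint-disable-next-line no-param-reassign
  el.properties.id = el.properties.id || headingIdentifier;
  return el;
}

const generated = assignHeadingId(
  { type: 'element', tagName: 'h2', properties: {} },
  'getting-started',
);
const authored = assignHeadingId(
  { type: 'element', tagName: 'h2', properties: { id: 'custom-anchor' } },
  'getting-started',
);

console.log(generated.properties.id); // getting-started
console.log(authored.properties.id); // custom-anchor
```

Because `||` treats an empty string as missing, an empty author `id` is also replaced by the generated one, which avoids leaving the heading without a usable anchor.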