Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclosed HTML comments not being escaped with safe_mode='escape' #563

Closed
michaelkedar opened this issue Jan 3, 2024 · 3 comments · Fixed by #564
Closed

Unclosed HTML comments not being escaped with safe_mode='escape' #563

michaelkedar opened this issue Jan 3, 2024 · 3 comments · Fixed by #564
Labels

Comments

@michaelkedar
Copy link

michaelkedar commented Jan 3, 2024

Describe the bug
Converting a string containing an unclosed <!-- tag with safe_mode='escape' does not replace < with &lt;. This causes the whole html document to be commented out when attempting to render.

There's also something odd about how markdown is being rendered inside of comments

To Reproduce

>>> markdown2.markdown('*foo* <!-- *bar* -->', safe_mode='escape') # with closing tag, OK
'<p><em>foo</em> &lt;!-- *bar* --&gt;</p>\n'
>>> markdown2.markdown('*foo* <!-- *bar*', safe_mode='escape') # without closing tag, bad
'<p><em>foo</em> <!-- <em>bar</em></p>\n'

Expected behavior
<!-- should always be replaced with &lt;!--.
It's also inconsistent whether it puts <em> tags around the 'commented' section. I believe it should be replacing it in both cases:

>>> markdown2.markdown('*foo* <!-- *bar* -->', safe_mode='escape')
'<p><em>foo</em> &lt;!-- <em>bar</em> --&gt;</p>\n'
>>> markdown2.markdown('*foo* <!-- *bar*', safe_mode='escape')
'<p><em>foo</em> &lt;!-- <em>bar</em></p>\n'

Debug info
Version of library being used: 2.4.12

Any extras being used: None

michaelkedar added a commit to google/osv.dev that referenced this issue Jan 3, 2024
Unclosed `<!--` in the vulnerability details sections aren't escaped due
to a [problem with
markdown2](trentm/python-markdown2#563).

Temporarily manually escaped these.
@Crozzers
Copy link
Contributor

Crozzers commented Jan 3, 2024

Thanks for spotting this. I've opened a PR for the incomplete HTML comments.

It's also inconsistent whether it puts <em> tags around the 'commented' section. I believe it should be replacing it in both cases

The library currently regards the contents of any HTML tag as HTML and will not process it as markdown. We do have an extra called markdown-in-html that allows you to put markdown content inside HTML tags, but you need to add a markdown="1" attribute to the tag, which won't work for comments.

Looking on babelmark, most other MD processors leave the HTML comment intact, except for markdown-it which escapes the comment tag and then processes the contents.

It's feasible that we could escape the comment tag, sanitise the contents (to remove any commented out HTML) and then process it as markdown.

nicholasserra added a commit that referenced this issue Jan 3, 2024
…-mode

Fix incomplete comments in safe mode not being escaped (#563)
@michaelkedar
Copy link
Author

Thanks for fixing this!

The library currently regards the contents of any HTML tag as HTML and will not process it as markdown.

Personally, I find having this behaviour when safe_mode='escape' unusual, since setting it is supposed to not treat the input as HTML. markdown-it seems to sanitize HTML by default, which seems to be why it processes the markdown in the comment tags.

To bring it back to this issue, I'm wondering if the unclosed comment fix will result in everything after the <!-- to remain unprocessed?

@Crozzers
Copy link
Contributor

Crozzers commented Jan 6, 2024

To bring it back to this issue, I'm wondering if the unclosed comment fix will result in everything after the <!-- to remain unprocessed?

From my testing, this doesn't happen. Incomplete tags are handled differently to normal HTML. Normal HTML is escaped and the contents are hashed to prevent further processing. Incomplete tags just have the tag escaped and the contents left in place.

Personally, I find having this behaviour when safe_mode='escape' unusual, since setting it is supposed to not treat the input as HTML

This makes sense, and the difference in treatment between complete and incomplete HTML tags is a bit confusing. I'll look into this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants