Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape variables used in attributes #84

Closed
arkhi opened this issue Nov 14, 2020 · 12 comments
Closed

Escape variables used in attributes #84

arkhi opened this issue Nov 14, 2020 · 12 comments
Labels
bug Something isn't working

Comments

@arkhi
Copy link

arkhi commented Nov 14, 2020

If I create a title with a " in it, then the generated markup will break in the admin pages.

With:

  • Grav v1.7.0-rc.17
  • Admin v1.10.0-rc.17

With this markdown file in /user/pages/test/w3c:

---
title: '<abbr title="World Wide Web Consortium">W3C</abbr>'
published: true
date: '13-11-2020 16:10'
publish_date: '13-11-2020 16:10'
---

Some content…

The generated code becomes:

<b title="<abbr title=" world="" wide="" web="" consortium"="">W3C"&gt;<abbr title="World Wide Web Consortium">W3C</abbr></b>

image

In this specific case, I guess removing the HTML tags would make sense.

@w00fz
Copy link
Member

w00fz commented Nov 14, 2020

I really don’t recommend using html for the title attribute in the front matter.
You can always create a custom entry such as:

title: W3C
title_html: '<abbr title="World Wide Web Consortium">W3C</abbr>'

This way you don’t make the title weird for the page metadata and wherever this is used behind the scenes. At the same time you can still use a custom html version for other rendering twit parts, such as your <b> tag

In admin we should probably escape and strip tags to ensure nothing breaks in the pages list

@arkhi
Copy link
Author

arkhi commented Nov 15, 2020

@w00fz: The <b> tag is the code used in the admin plugin.

For a quick example, I’m a playful contributor to the website and I use the following title for a page I have write access to:

"></b><script>alert('hahaha!');</script><b title="hahaha

or simply the following since the title is displayed outside of the attribute without escaping as well:

<script>alert('hahaha!');</script>

Loading /admin/pages displays a popup window with hahaha!.

Now… Let’s assume I’m a disgruntled employee or collaborator who can get a script to do something less playful. It would be interesting to know what can be done with the possibility of my script being loaded in any environment, including the administrator’s environment of the website, who has all privileges. Would deleting all pages be possible? All images? Or creating new ones with more scripts?

I could, for example, run something like:

<script>fetch('https://arkhi.org/cv/avatar.jpg')</script>

Even if CORS might limit the damage, although if some file is uploaded locally, it could be run and executed.

In general, I am more confident in a product if its developers are actively taking action in making sure variables used in their script are sanitised. Probably even more so with variables filled by users.

On a practical side, creating another variable (like title_html) means having to tweak every plugin that takes into account the default value of header.title for the title, including SEO related plugins.

Back to the specific case of <abbr>, since it is both semantic and an improvement on accessibility, removing it is putting the load on the editor (duplicating content) instead of the developer, and does not benefit anybody from what I understand.

Just my thoughts on what I perceive as a threat if not just a simple bug… I don’t know all the code in Grav for sure.

@w00fz
Copy link
Member

w00fz commented Nov 15, 2020

I’m agreeing with you that the title needs escaping as are saying and we are looking into it.
Although, I’m advising not to use html for title in general, it is bad practice and that title is also used for metadata tag. Most of the time it is expected to be plain text and especially for seo it could damage your score if it’s not plain.
You probably only need that abbreviation in one or two places and because you can override twig templates you can extend it so that your abbreviate takes place.
I doubt you’ll have to update all plugins and templates for it.

@arkhi
Copy link
Author

arkhi commented Nov 15, 2020

I’m agreeing with you that the title needs escaping as are saying and we are looking into it.

Sorry, I only read the initial email I received before you edited your comment; I didn’t see that.

That being said, I’m interested about some rational (links that explain the actual reasons or consequences) to explain what you wrote:

  1. not to use html for title in general, it is bad practice

  2. Most of the time it is expected to be plain text

  3. especially for seo it could damage your score if it’s not plain

Thanks!

@w00fz
Copy link
Member

w00fz commented Nov 15, 2020

Well to being with, the title in the front-matter is used for the title of the page <head>. W3C specification says the title of a page should not contain any HTML.

Reference: https://www.w3.org/TR/html401/struct/global.html#edef-TITLE
Excerpt: Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup (including comments).

Reference 2: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title
Excerpt: The contents of a page title can have significant implications for search engine optimization (SEO).

This alone is bad practice and bad practice never flies well with SEO. Title is one of the most important tags for a page and it will dictate how it appears in search results. Not to mention screen readers will have a very hard time reading it.

The second issue you will most likely notice if you use markup in the title, is that the browser expect that title to follow W3C semantics and so when you look at a page that has a marked up title, it will look very odd:

Monosnap 2020-11-15 11-24-50

Finally, most places in Grav, particularly themes and their templates, rely on a page title to be plain. This is so that a template can freely cast page.title and wrap it in any html markup desired.

Imagine having a page titled as title: <h1>My Title</h1> and then somewhere there was a template that listed all the pages in an unordered list (<ul><li>{{ page.title }}</li></ul>). This would definitely break the intention of the layout and any styling associated to it because the H1 would make that particular page's title extremely bigger.

That said, I know your <abbr> falls into the inline-elements category while my example <h1> falls under the block-elements category, and there is a distinction, but this is just one of many examples HTML in a title can do and why it shouldn't be attempted.

Hopefully this gives you a wider view as to why you shouldn't be trying to use any markup in the title tag. In the meantime, I have pushed a fix (getgrav/grav@1396525) to escape the title in the Flex pages list so that it doesn't look broken.

@arkhi
Copy link
Author

arkhi commented Nov 16, 2020

I see; I was talking about the Title in general, not <title> in particular.

It looks like we have different views about the responsibilities of a developer. I think code really should follow the Robustness principle: “Be conservative in what you send, be liberal in what you accept”.

With that perspective in mind, it is the developer’s job to make sure what their code (core, plugin, template, style…) is receiving will be displayed properly, especially if there is no safeguard and documentation before-hand that could constrain or guide the user input.

So what you refer as bad practice is if the sanitising did not happen, not about having HTML in a Title if its semantic requires it and the context allows it.

About getgrav/grav@1396525, the titles will certainly look funky if all special characters appear as HTML entities instead of being stripped like #85 offered to.

@w00fz
Copy link
Member

w00fz commented Nov 16, 2020

About getgrav/grav@1396525, the titles will certainly look funky if all special characters appear as HTML entities instead of being stripped like #85 offered to.

It certainly looks funky but that’s the exact representation of what you’ll get on the frontend. If we strip tags it wouldn’t be accurate.

@mahagr
Copy link
Contributor

mahagr commented Nov 17, 2020

I've been following this discussion silently and we've been fixing a couple of other bugs on admin -- admin does have auto-escaping on, but there were a couple of places where the output was not being sanitized properly. You're right, those were really bugs and ought to be fixed. But I agree with @w00fz that the right way to fix them is to do proper escaping when outputting the values, not to strip content away.

The reasoning is that no field in the form should accept HTML unless it specifies to be HTML only field (or if there's a toggle to enable HTML and logic to handle it). This is because if you allow both text and HTML at the same time, you either find special characters to become double escaped or not escaped at all. I consider both of them being bugs, but not escaping at all is a potential security vulnerability as well. This is the basic reason why fields like title can never allow HTML. That said, I wouldn't allow HTML in title_html either, but have a text field title_abbr instead and write code to support it -- with escaped values.

I agree that it is developers responsibility to properly sanitize and escape the output -- BUT the developer who is responsible is the one who creates the template file. This is because you cannot trust the files being safe as instead of using admin, someone can still add the XSS code by editing the files directly, either from the server or from GitHub. Also, there may be legit cases where you want to preserve that attacking code, like in forum posts or feedback.

All of this said it is obvious that template designers cannot be trusted with the responsibility to remember to escape every single variable. Because of this, Grav provides a setting to autoescape everything in the frontend and you should be turning it on. I've been advocating people to use that setting no matter how inconvenient it seems to be (to fix people's bad practices).

@mahagr mahagr added the bug Something isn't working label Nov 17, 2020
@mahagr
Copy link
Contributor

mahagr commented Dec 2, 2020

We decided to enable twig auto-escaping by default for all the new sites. Upgrading Grav via admin or CLI will check the existing configuration and adds the missing configuration options with the old defaults for the existing sites.

Closing this issue as we have fixed the missing escape in admin.

@mahagr mahagr closed this as completed Dec 2, 2020
@arkhi
Copy link
Author

arkhi commented Dec 9, 2020

Just mentioning getgrav/grav-plugin-admin#1990 as it seems related without fully checking.

@mahagr
Copy link
Contributor

mahagr commented Dec 9, 2020

@arkhi It's related, it was caused by this fix, which was completed in grav 1.7 but not in 1.6.

@mahagr
Copy link
Contributor

mahagr commented Dec 9, 2020

@arkhi I answered too fast, updated my response.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants