-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stripHTML
to preserve leading spaces
#35457
Conversation
Size Change: +422 B (0%) Total Size: 1.07 MB
ℹ️ View Unchanged
|
Just flagging that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good to know!
I believe jest works because jsdom has included polyfill maybe?
expect( stripHTML( input ) ).toBe( output ); | ||
} ); | ||
|
||
it( 'should preserve leading spaces', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add some tests with \t
, \n
, and multiline html.
That's just how HTML works. Leading should collapse. What's the use case in preserving them? Worth noting that any consecutive spacing elsewhere will also collapse. |
Thanks your perspective here @ellatrix. Clearly I don't want to introduce a fix for something that doesn't need a fix. However please can you help me to understand the following?
My expectation was that if I use a utility called I wouldn't expect it to strip spaces from the string.
With respect, I would have thought that how HTML works really isn't really something the consumer should have to be aware of. If I use a utility, I expect it to do what it says on the tin. Update: here's However, I note the "tags" vs "html". Is that difference in nomenclature significant? |
Sure, if we're just looking to strip HTML tags, maybe we shouldn't parse it as HTML? While this fixes preserving leading and trailing whitespace, it doesn't preserve whitespace elsewhere.
I'm not sure I get why any HTML needs to be stripped? If you're displaying the value in an input field and this value needs to be plain text, how did any HTML tags end up in this value? |
From what I understand so far, perhaps this shouldn't be handled in We could also add an optional option |
@ellatrix Here's the scenario I was running in to. Screen.Capture.on.2021-10-11.at.12-13-24.mov
No longer conflicted. LinkControl shouldn't care. It's the Nav Link block (the consumer) that should strip the HTML formatting. In the future we can get cleverer about preserving the HTML tags coming from the
In what scenarios does this occur? |
@kevin940726 what goes in favor of removing the whitespace characters in
@ellatrix An alternative is to use regular expressions, but that sounds like a lot of extra effort and a gateway to accidental XSS issues. Parsing is easy and handles that for free, except the whitespaces. |
Sure, that's true. I'm still wondering why we need to strip HTML here though. Where does this HTML come from? If it's from Rich Text and we no HTML should be added, then it sounds like we need Plain Text instead? |
Here is where the string that contains HTML is provided to the component which is storing the gutenberg/packages/block-library/src/navigation-link/edit.js Lines 611 to 618 in bfc99c1
We need to use rich formatting in the Nav Link block because a user might want to make link items bold or italic or whatever. |
@getdave what if we strip formatting when editing via the split url/text control? Seeing html tags in a text field is pretty distracting in general, and I think it's pretty quick to reapply formatting in the rich text editor if folks need it. |
So we allow the user to add bold etc., but then we strip it when they use the link control? And if they edit the text there, the changes will be saved and the bold will be lost? Or is that text purely read-only in the link control? If it's editable, it should be a rich text field as well? |
Thanks for continuing to discuss the best solution here. Just so we know, the @jasmussen has said he feels comfortable that if we're editing the link text using the Link UI then it's ok to loose the formatting. The alternative would be for the Link UI to "know" about RichText or have the ability to display with a formatting toolbar (ie: what Ella says about "it should be a rich text field as well"). Personally I think for v1 we should to simply strip the HTML tags from the Nav Link block text when we pass it to the Link UI. This will cause loss of formatting but that's relatively minor for now. For a v2 we could then explore changing the text control in LinkControl to become RichText field in order that we can handle more values with formatting. Do we agree that is an acceptable route forward for #33849? Circling back to this PR however - shall I:
I'm currently concerned about
How can I replicate this? |
I really think the requirement to switch types from HTML to plain text is weird and requires extra consideration. I think we should drop that requirement. Either:
(3) should be a good enough compromise. The cases that someone has formatting in the value should be rare, and if they do, it's fine to have raw HTML in the text field. If that's not good, then simply render the value with rich text? We don't need to add formatting buttons there, just allow the rich text value to be edited. |
Yes indeed, to me it is a separation of concerns:
To clarify the flow, it is most definitely a "split" flow: Shown above, the first is the page list selector for navigation items. You click the plus, the URL dialog opens, and lets you choose a destination. The second shows up when you click a link item after the fact. The same separation exists: inside the dialog it's not rich text. Outside, it is. This is similar to how Google Docs does it: To me this isn't a question about whether to keep formatting or not, it's about whether to have the extra label editing affordance or not. I find it to be valuable, enough that it's okay for any formatting to be lost when using that interface. Keep in mind we have save revisions, and visible undo buttons for it, and the case where bold or italic is lost when editing here feels like it would be rare. |
In a paragraph block (for example) you might create a link which contains whitespace. When you are editing the text of that link it needs to include the whitespace in order that you can:
Screen.Capture.on.2021-10-11.at.12-35-58.1.movUpdate: I just realised this doesn't apply to the Nav block which uses HTML in the value. In which case...I guess we won't need to preserve leading/trailing spaces there. Does the question still stand about whether |
Worth noting that this will collapse to one space on the front-end. Rich text currently preserves the spaces for editing purposes, but ideally it should also just collapse the whitespace. What you're looking is the same function but with more context. E.g. when the container is an inline element, wp.dom.__unstableStripHTML('<a> test</a>')
=== ' test' Perhaps we need to allow specifying the container for stripping HTML? |
Probably part of the effort was inspired by this edgecase where I was stress testing the poor PR, in case it adds context to the leading/trailing space discussion. |
I created #35539 as an alternative. |
I see it does indeed do that: Screen.Capture.on.2021-10-12.at.13-25-24.mov |
Closing in favour of #35539 |
Description
In #33849 I discovered that
stripHTML()
will also strip any leading space characters from the string.I believe this is because the DOM Parser will ignore any space character coming after the DocType.
I don't believe this is the functionality we would want (or expect) from this utility, but let me know if I'm wrong.
This PR captures any spaces at the start of the string and then restores them to the string after the
DOMParser
has completed. This preserves the original whitespace on the string.I also added some unit tests - although I'm surprised they worked because I didn't think Jest would have access to the
window
🤔How has this been tested?
Pop open dev tools Console and type in the following (this simulates what we are currently doing in
trunk
):Notice how the leading spaces are stripped off.
Now run the unit tests on this PR:
Note how the leading spaces in the tests are now preserved.
Check implementation is suitable.
Screenshots
Types of changes
Checklist:
*.native.js
files for terms that need renaming or removal).