-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AO3-6719 Leave src URLs instead of emptiness for images stripped in AO3-6564 #4799
Conversation
Wait, when did we change Rubocop from wanting I don't care -- I actually like the slashes better -- but I don't want to change it until I'm sure Rubocop is doing what we want. |
I think we've always used the default configuration, which is slashes with AllowInnerSlashes false. That makes it want |
lib/html_cleaner.rb
Outdated
# The cleaned field will always use double quotes (") but the fields | ||
# in tests aren't always cleaned, so we're allowing double or single | ||
# quotes here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, how sure is it that it's only the tests where the fields aren't cleaned? An uncleaned field could have unexpected formatting beyond just different quotes, which would break/bypass the two regular expressions here.
A lot of the time, this function is called from sanitize_field
. I think it would be more reliable to use Sanitize or nokogiri in sanitize_value
to do the stripping. There we're already dealing with those libs and I'd expect them to be more reliable than making our own regexp (don't reinvent the wheel and all that).
Doing the image stripping in sanitize_value
would also make things a bit nicer because we could use a config similar to ArchiveConfig.FIELDS_ALLOWING_VIDEO_EMBEDS
to designate fields to clean (or rather, not clean), instead of calling strip_images during display. That also makes it less likely that some display spots will get missed like it already happened in AO3-6564.
With the source urls getting preserved, doing this in the sanitisation step also wouldn't lose any data, if that was a concern previously.
Hold pls. |
…a sanitized field would return
end | ||
|
||
it "removes the img tag entirely when the src uses single quotes" do | ||
string = "Hi! <img src='http://example.org/image.png'> Bye" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaviour seems a bit confusing. I know in real usage we're not really going to have single quotes, but for consistency would it be better to treat single/double quotes the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, idk. I originally treated them the same for consistency, even though I don't really like writing code just to make tests pass, but after Bilka's comment, I got more skittish about that approach.
I think maybe we should have different behavior so it forces us to think about what's going on, e.g., if we're seeing the wrong behavior on production, that's a big yikes because it means the text isn't getting sanitized properly, and if we're seeing the wrong behavior in tests, is it because we're making test data in a way that's bypassing the sanitizer, or because there's actually a problem.
That said, I'm happy to take the path of least resistance and do ['"]
in the regex instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can think of ways to break this new regexp, from more spaces to newlines. But I don't think rewriting this regexp until I run out of obvious ways to break it will get us anywhere actually safe, that's why I suggested using a lib for it. The regexp will never quite protect from the case of some place accidentally doing sanitize_field(strip_images())
(perhaps hidden behind or separated by steps of abstraction) instead of using the arg or the correct order. Or one of the myriad of other ways unsanitized content could end up in strip_images (like in tests) while still appearing safe at first glance.
But I see that the dynamic config option means it can't happen during the sanitize step and tossing the whole thing into a whole nokogiri/Sanitize setup for a second time in strip_images isn't a great solution either. Furthermore, this PR is not the right place to rework comments to store unsanitized content and only sanitize (and strip images during sanitization) for display.
So, I'd say to make a regexp that's readable/feels good to you and only secondary that it handles unexpected input. And then in the integration tests where things should be sanitized beforehand, use an img tag that specifically would break the regexp if things run in the wrong order. That should be slightly more protection against the "big yikes" than the current setup. Not good, but better 😭
If you use a newline, you end up with an img and no src,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three minor missed things in the tests, rest looks good!
end | ||
|
||
context "when the comment is on a tag" do | ||
let(:comment) { create(:comment, :on_tag) } | ||
|
||
it_behaves_like "a notification email with a link to the comment" | ||
it_behaves_like "a notification email with a link to reply to the comment" | ||
it_behaves_like "a comment subject to image safety mode settings" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This it_behaves_like should also be added to the context "when the comment is a reply to another comment"
a few lines below this one.
|
||
before do | ||
comment.comment_content += image_tag | ||
comment.save! | ||
end | ||
|
||
it "strips the image from the email message" do | ||
expect(email).not_to have_html_part_content(image_tag) | ||
context "when image safety mode is enabled for admin post comments" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This context should check text content, like its counterpart for #edited_comment_notification
below.
When I view <commentable> with comments | ||
Then I should see the image "src" text "https://example.com/image.jpg" | ||
And I should not see "OMG! https://example.com/image.jpg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test added to add_comment.feature in #4729 can be removed, because this tests the same.
Issue
https://otwarchive.atlassian.net/browse/AO3-6719
Purpose
Adds a
keep_src
keyword argument to thestrip_images
method that lets us have the option of replacing the strippedimg
tag with itssrc
URL instead of empty space. I'm happy to give it a different name or make it a separate method. I very much hate the hack to make it work with tests, but I also felt like I'd get side-eyed for changing the tests. 😆 I can do that instead, though!Replaces the
strip_images
keyword argument onsanitize_field
with a new argument calledimage_safety_mode
. We use it in the same places we were usingstrip_images
, but it's a more flexible name in case we decide to switch to placeholders or go back to nothingness or something else entirely -- right now it usesstrip_images
withkeep_src: true
on the sanitized content. Again, delighted to give it a different name if you have one.I assumed that we wouldn't mind including the image URL in the main content of support tickets and abuse reports, as well as the emailed copies of same. They were kind of a weird hybrid of "affected by AO3-6564" and "not affected by AO3-6564." Happy to go back to no images at all, though.
ETA: And the big one, make it so we can use the config to temporarily strip images in work comments without a deploy. (You can do tags, too, but that seems unlikely to be an issue.) We'll need to set
PARENTS_WITH_IMAGE_SAFETY_MODE: ["AdminPost"]
when this is deployed to keep the current behavior of stripping images from news post comments.Testing Instructions
How can the Archive's QA team verify that this is working as you intended?
If you have a Jira account with access, please update or comment on the issue
with any new or missing testing instructions instead.
References