[HOLD for payment 2023-05-16] [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters #17665

Closed
1 of 6 tasks
kbecciv opened this issue Apr 19, 2023 · 59 comments
Assignees
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 External Added to denote the issue can be worked on by a contributor

kbecciv commented Apr 19, 2023

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Open staging.new.expensify.com
  2. Send this message : *** (triple asterisks)
  3. Send this message : ___ (triple underscores)
  4. Examine the difference of the parsing between the messages

Expected Result:

Both messages should be parsed the same way: either none of the styling characters are parsed, or the middle character is parsed in both.

Actual Result:

The bold characters (asterisks) are not parsed, but the italic characters (underscores) are parsed, leaving a single underscore after parsing.

Workaround:

Unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • Android / native
  • Android / Chrome
  • iOS / native
  • iOS / Safari
  • MacOS / Chrome / Safari
  • MacOS / Desktop

Version Number: 1.3.1.3

Reproducible in staging?: Yes

Reproducible in production?: Yes

If this was caught during regression testing, add the test name, ID and link from TestRail:

Email or phone of affected tester (no customers):

Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos: Any additional supporting documentation

Triple.Underscore.Parsed.mp4
Recording.2525.mp4

Expensify/Expensify Issue URL:

Issue reported by: @kerupuksambel

Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1681867484400789

View all open jobs on GitHub

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~01ab464dbcaa9a04d3
  • Upwork Job ID: 1648736676150435840
  • Last Price Increase: 2023-04-26
@kbecciv kbecciv added Daily KSv2 Bug Something is broken. Auto assigns a BugZero manager. labels Apr 19, 2023
@MelvinBot

Triggered auto assignment to @zanyrenney (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

@MelvinBot

MelvinBot commented Apr 19, 2023

Bug0 Triage Checklist (Main S/O)

  • This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
  • This bug is not a duplicate report (check E/App issues and #expensify-bugs)
    • If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
  • This bug is reproducible using the reproduction steps in the OP. S/O
    • If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
    • If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
  • This issue is filled out as thoroughly and clearly as possible
    • Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
  • I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

@zanyrenney
Contributor

can reproduce but I really don't think this is worth fixing:
2023-04-19_17-11-23

@zanyrenney
Contributor

Going to check in the channel.

@zanyrenney
Contributor

https://expensify.slack.com/archives/C01SKUP7QR0/p1681921007844539?thread_ts=1681872394.446279&cid=C01SKUP7QR0

@zanyrenney zanyrenney added the External Added to denote the issue can be worked on by a contributor label Apr 19, 2023
@melvin-bot melvin-bot bot changed the title Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters Apr 19, 2023
@MelvinBot

Job added to Upwork: https://www.upwork.com/jobs/~01ab464dbcaa9a04d3

@MelvinBot

Current assignee @zanyrenney is eligible for the External assigner, not assigning anyone new.

@MelvinBot

Triggered auto assignment to Contributor-plus team member for initial proposal review - @parasharrajat (External)

@melvin-bot melvin-bot bot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Apr 19, 2023
@MelvinBot

Triggered auto assignment to @hayata-suenaga (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

@hasebsiddiqui
Contributor

hasebsiddiqui commented Apr 19, 2023

Proposal

Please re-state the problem that we are trying to solve in this issue.

Inconsistent parsing of chat messages that contain more than two consecutive styling characters.

What is the root cause of that problem?

The problem is inside the expensify-common library. The expensify-common/lib/ExpensiMark.js file contains the regexes that replace plain text with styled markup.
The problem is in this code block:

{
    name: 'italic',
    regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<em>${g1}</em>`),
},
{
    name: 'bold',
    regex: /\B\*((?=\S)(([^\s*]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
},

These two regexes are inconsistent with each other.
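
To make the inconsistency concrete, here is a minimal sketch that runs the two rules quoted above against the inputs from the reproduction steps. The `applyRule` helper and the variable names are hypothetical and exist only for this illustration; only the regexes and replacement functions come from the snippet above.

```javascript
// Rules copied from the snippet above. `applyRule` is a hypothetical helper,
// not part of expensify-common.
const italicRule = {
    regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<em>${g1}</em>`),
};
const boldRule = {
    regex: /\B\*((?=\S)(([^\s*]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
};

// Apply a single rule to a piece of text.
function applyRule(rule, text) {
    return text.replace(rule.regex, rule.replacement);
}

const boldResult = applyRule(boldRule, '***');
const italicResult = applyRule(italicRule, '___');

console.log(boldResult);   // *** (the inner * is rejected by [^\s*])
console.log(italicResult); // <em>_</em> (the inner _ is accepted by \S)
```

The bold rule refuses a lone `*` as content, while the italic rule accepts a lone `_`, which matches the behavior described in the issue.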
According to the issue statement there are two ways to solve this problem.

  1. Parse neither bold nor italic
  2. Parse both of these

Below I provide a solution for each of these, but following existing industry practice (GitHub, Slack), solution 1 should be preferred.

What changes do you think we should make in order to solve the problem?

Make the italic and bold regex consistent with each other inside expensify-common library.

1. Parse neither bold nor italic

In the bold regex, the ([^\s*]|\s(?!\*)) part excludes any * that sits between a * * pair, but the italic regex has no equivalent exclusion for _.
The updated italic regex code block will be:

{
    name: 'italic',
    regex: /(?!_blank")([^\W_]?)_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1, g2) => (g1 || (g2 && g2.includes('<pre>')) ? match : `<em>${g2}</em>`),
},

Proof after fixing italic regex:

Screen.Recording.2023-04-27.at.2.14.16.AM.mov

2. Parse both bold and italic regex:

As mentioned, we need to drop the * from the [^\s*] character class in the ([^\s*]|\s(?!\*)) part of the bold regex so that it no longer rejects a * between a * * pair. The updated bold regex will be:

{
    name: 'bold',
    regex: /\B\*((?=\S)(([^\s]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
},

Proof after fixing bold regex:

Screen.Recording.2023-04-27.at.2.20.48.AM.mov
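
For comparison, the option 2 bold regex can be checked the same way. This is only a sketch: the regex and replacement are the ones quoted in option 2, while the names `boldParseInner` and `toStrong` are made up for the example.

```javascript
// Option 2 bold regex as quoted above: [^\s*] relaxed to [^\s].
// `boldParseInner` and `toStrong` are hypothetical names for this sketch.
const boldParseInner = /\B\*((?=\S)(([^\s]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g;
const toStrong = (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`);

const tripleAsterisk = '***'.replace(boldParseInner, toStrong);
const ordinaryBold = '*bold*'.replace(boldParseInner, toStrong);

console.log(tripleAsterisk); // <strong>*</strong>
console.log(ordinaryBold);   // <strong>bold</strong>
```

With this change `***` is treated like `___` under the current italic rule: the middle character becomes the styled content.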

What alternative solutions did you explore? (Optional)

NA

@dukenv0307
Contributor

dukenv0307 commented Apr 20, 2023

Proposal

Please re-state the problem that we are trying to solve in this issue.

The bold characters (asterisks) are not parsed, but the italic characters (underscores) are parsed, leaving a single underscore after parsing.

Either none of the styling characters are parsed, or the middle character is parsed.

We should talk about what we want to do here first, since there are 2 ways mentioned here. If we try another popular markdown editor (like GitHub itself), we'll see that the styling character itself is not included in the final styled text. For example, if we try test ___ (3 underscores) on GitHub, it is shown as is, without styling the 3 underscores.

So I think we should do the same here. It also doesn't make sense to style the middle _ since _ by itself does not have any italic style.

What is the root cause of that problem?

While the bold characters (asterisks) aren't parsed

This is because in https://github.com/Expensify/expensify-common/blob/07db321d9dffbc66a56d7d7bf42ba1cbf3df216e/lib/ExpensiMark.js#L143, we're excluding the * if it's inside the * * pair (see the ([^\s*]|\s(?!\*)) part).

While in https://github.com/Expensify/expensify-common/blob/07db321d9dffbc66a56d7d7bf42ba1cbf3df216e/lib/ExpensiMark.js#L135, we're not doing that, we still allow the _ character. That's why the problem occurs for the _

What changes do you think we should make in order to solve the problem?

We need to update this regex https://github.com/Expensify/expensify-common/blob/07db321d9dffbc66a56d7d7bf42ba1cbf3df216e/lib/ExpensiMark.js#L135 to not allow the lonely _ character inside the _, _ pair. We can do the same as we already do for the * as explained above.

We can use this regex
/(?!_blank")([^\W_]?)_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g

It has 3 changes compared to the current italic regex we're using:

  • Replace (?!\s) with (?![\s_]), so that it doesn't match _ at the beginning of the content
  • Replace \S with [^\s_], so that it doesn't match _ at the end of the content
  • Replace the word boundary \b with ([^\W_]?) at the beginning and (?![^\W_]) at the end, so that _ is no longer treated as a word character at the boundary and cases like __content_ and _content__ work correctly.

Also we need to update this replacement https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L143 so that it will not replace if the match starts with a word character (this logic works with ([^\W_]?))

replacement: (match, g1, g2) => g1 || (g2 && g2.includes('<pre>'))  ? match : `<em>${g2}</em>`

(We could have used (?<![^\W_]) instead of ([^\W_]?), but lookbehind is not supported in Safari so ([^\W_]?) along with capturing group check will be the right one to use, see here)
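
The capture-group trick can be illustrated in isolation. The sketch below uses two deliberately simplified patterns (they are assumptions made for this example, not the expensify-common rules) and compares them on three inputs only; it does not claim the two forms are equivalent in general. The lookbehind literal needs an engine that supports lookbehind in order to run.

```javascript
// Simplified patterns for illustration only; NOT the expensify-common rules.
// Capture-group form: optionally consume one alphanumeric character before the
// opening _ and let the replacement function bail out when it was captured.
const withCaptureGroup = /([^\W_]?)_([^\s_]+)_(?![^\W_])/g;
// Lookbehind form: assert that no alphanumeric character precedes the opening _.
const withLookbehind = /(?<![^\W_])_([^\s_]+)_(?![^\W_])/g;

const viaCapture = (text) =>
    text.replace(withCaptureGroup, (match, before, body) => (before ? match : `<em>${body}</em>`));
const viaLookbehind = (text) => text.replace(withLookbehind, '<em>$1</em>');

const samples = ['_a_', 's_a_', 'x _a_ y'];
const comparison = samples.map((sample) => [sample, viaCapture(sample), viaLookbehind(sample)]);
console.log(comparison);
```

In the capture-group form the match for `s_a_` still succeeds, but the replacement returns the match unchanged because the first group is non-empty, which is the role the `g1 ||` check plays in the replacement above.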

This will make the behavior the same as GitHub and other popular markdown editors.

Tested with the following cases, and all work well:

___
_italic__
__italic_
_italic_another_italic_
_italic_
s_italic
italic_s
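
The cases above can be run through the proposed rule with a short script. This is a sketch: the regex and replacement are the ones given in this proposal (with the ternary wrapped in parentheses for readability), and the names `proposedItalic`, `proposedReplacement`, `testCases` and `testResults` are hypothetical.

```javascript
// Regex and replacement as given in the proposal above.
const proposedItalic = /(?!_blank")([^\W_]?)_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g;
const proposedReplacement = (match, g1, g2) =>
    (g1 || (g2 && g2.includes('<pre>')) ? match : `<em>${g2}</em>`);

// Inputs listed in the proposal.
const testCases = [
    '___',
    '_italic__',
    '__italic_',
    '_italic_another_italic_',
    '_italic_',
    's_italic',
    'italic_s',
];

const testResults = testCases.map((input) => input.replace(proposedItalic, proposedReplacement));
testCases.forEach((input, i) => console.log(`${input} -> ${testResults[i]}`));
// ___ -> ___
// _italic__ -> <em>italic</em>_
// __italic_ -> _<em>italic</em>
// _italic_another_italic_ -> <em>italic_another_italic</em>
// _italic_ -> <em>italic</em>
// s_italic -> s_italic
// italic_s -> italic_s
```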

What alternative solutions did you explore? (Optional)

NA

@romulo114
Contributor

romulo114 commented Apr 20, 2023

Proposal

Please re-state the problem that we are trying to solve in this issue.

Inconsistent parsing of consecutive styling characters.

What is the root cause of that problem?

The main cause of this issue is in bold and italic rules.

https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/ExpensiMark.js#L131-L152

            {
                name: 'italic',
                regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                replacement: (match, g1) => (g1.includes('<pre>') ? match : `<em>${g1}</em>`),
            },
            {
                name: 'bold',
                regex: /\B\*((?=\S)(([^\s*]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
                replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
            },

What changes do you think we should make in order to solve the problem?

We need to extract the innermost match and ignore empty styles. We can do this with the following rules.

            {
                name: 'italic',
                regex: /(?!_blank")\b_(_*)((?!\s)[\s\S]*?[\S]*?)(_*)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                replacement: (match, g1, g2, g3) => ((g2.includes('<pre>') || !g2) ? match : `${g1}<em>${g2}</em>${g3}`),
            },
            {
                name: 'bold',
                regex: /\B\*(\**)((?=\S)[\s\S]*?[\S]*?)(\**)\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
                replacement: (match, g1, g2, g3) => ((g2.includes('<pre>') || !g2) ? match : `${g1}<strong>${g2}</strong>${g3}`),
            },
Result
mac_safari.mp4

What alternative solutions did you explore? (Optional)

N/A

@parasharrajat
Member

What is the expected behavior here? @hayata-suenaga

  1. Should italic be parsed? I don't think either bold or italic should be parsed.

@hayata-suenaga
Contributor

I believe neither should be parsed.

@parasharrajat
Member

Good, thanks. To confirm, I asked here: https://expensify.slack.com/archives/C01GTK53T8Q/p1682021148507319.

@zanyrenney
Contributor

Heading OOO, reapplying the Bug0 label for active management.

@zanyrenney zanyrenney removed their assignment Apr 21, 2023
@zanyrenney zanyrenney added Bug Something is broken. Auto assigns a BugZero manager. and removed Bug Something is broken. Auto assigns a BugZero manager. labels Apr 21, 2023
@MelvinBot

Triggered auto assignment to @alexpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

@parasharrajat
Member

There is one more PR to upgrade the version in the App which will be done tomorrow.

@alexpensify
Contributor

Thank you, I updated my comment.

@alexpensify
Contributor

The new PR is in the works!

@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Daily KSv2 labels May 9, 2023
@melvin-bot melvin-bot bot changed the title [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters [HOLD for payment 2023-05-16] [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters May 9, 2023
@melvin-bot

melvin-bot bot commented May 9, 2023

Reviewing label has been removed, please complete the "BugZero Checklist".

@melvin-bot melvin-bot bot removed the Reviewing Has a PR in review label May 9, 2023
@melvin-bot

melvin-bot bot commented May 9, 2023

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.3.12-0 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2023-05-16. 🎊

After the hold period is over and BZ checklist items are completed, please complete any of the applicable payments for this issue, and check them off once done.

  • External issue reporter
  • Contributor that fixed the issue
  • Contributor+ that helped on the issue and/or PR

As a reminder, here are the bonuses/penalties that should be applied for any External issue:

  • Merged PR within 3 business days of assignment - 50% bonus
  • Merged PR more than 9 business days after assignment - 50% penalty

@melvin-bot

melvin-bot bot commented May 9, 2023

BugZero Checklist: The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed:

  • [@parasharrajat] The PR that introduced the bug has been identified. Link to the PR:
  • [@parasharrajat] The offending PR has been commented on, pointing out the bug it caused and why, so the author and reviewers can learn from the mistake. Link to comment:
  • [@parasharrajat] A discussion in #expensify-bugs has been started about whether any other steps should be taken (e.g. updating the PR review checklist) in order to catch this type of bug sooner. Link to discussion:
  • [@parasharrajat] Determine if we should create a regression test for this bug.
  • [@parasharrajat] If we decide to create a regression test for the bug, please propose the regression test steps to ensure the same bug will not reach production again.
  • [@alexpensify] Link the GH issue for creating/updating the regression test once above steps have been agreed upon:

@parasharrajat
Member

I wasn't able to figure out the regression PR for this.

@parasharrajat
Member

No regression tests are needed, as this is covered by unit tests.

@parasharrajat
Member

parasharrajat commented May 9, 2023

I request that the 50% bonus be considered for this issue. The issue took 4 business days to complete, but there were two PRs. We moved very fast to test and merge both in 3 days, but we had to wait to update the hash from the other PR because of the sequential nature of the work, which cost us one day.

@hayata-suenaga
Contributor

@alexpensify can you check @parasharrajat's comments?

@alexpensify
Contributor

Thank you @parasharrajat for the work here. I'm aware of the two PRs and I'll review the payment breakout later this week since payment is not due until next week.

@alexpensify
Contributor

Still on hold for payment.

@melvin-bot melvin-bot bot added Daily KSv2 and removed Weekly KSv2 labels May 15, 2023
@alexpensify
Contributor

I've started an internal discussion to confirm the payments.

@alexpensify
Contributor

I've reviewed the two PRs with the team. We have confirmed that it is common for GitHub issues like this one to require two PRs: one to make the update in expensify-common and another to bring the updated expensify-common version into App. The bonus will therefore not apply here, since the merge was past the 3-business-day mark. This GH has the following payouts.

Issue reporter: $250 - @kerupuksambel
Contributor: $1,000 - @dukenv0307
Contributor+: $1,000 - @parasharrajat

If anyone has feedback, please share. If there is no response, I’ll complete the next steps in Upwork tomorrow. Thank you!

@alexpensify
Contributor

@dylanexpensify is going to step in to help with the payment process in Upwork. I'm OOO until next Wednesday.

@dylanexpensify
Contributor

@kerupuksambel, @dukenv0307, and @parasharrajat please apply here!

@dylanexpensify
Contributor

@dukenv0307 please apply! @parasharrajat @kerupuksambel sent offers!

@dukenv0307
Contributor

@dylanexpensify applied, thank you!

@dylanexpensify
Contributor

@dukenv0307 offer sent!

@dylanexpensify
Contributor

Done!

@alexpensify
Contributor

Thanks for the assist here @dylanexpensify!
