[HOLD for payment 2023-05-16] [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters #17665

kbecciv · 2023-04-19T15:59:35Z

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!

Action Performed:

Open staging.new.expensify.com
Send this message : *** (triple asterisks)
Send this message : ___ (triple underscores)
Examine the difference of the parsing between the messages

Expected Result:

Both messages should be parsed the same way: either none of the styling characters are parsed, or the middle character is styled in both.

Actual Result:

While the bold characters (asterisks) aren't parsed, the italic characters (underscores) are parsed, leaving only one underscore after parsing.

Workaround:

Unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

Version Number: 1.3.1.3

Reproducible in staging?: Yes

Reproducible in production?: Yes

If this was caught during regression testing, add the test name, ID and link from TestRail:

Email or phone of affected tester (no customers):

Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos: Any additional supporting documentation

Triple.Underscore.Parsed.mp4

Recording.2525.mp4

Expensify/Expensify Issue URL:

Issue reported by: @kerupuksambel

Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1681867484400789

View all open jobs on GitHub

Upwork Automation - Do Not Edit

Upwork Job URL: https://www.upwork.com/jobs/~01ab464dbcaa9a04d3
Upwork Job ID: 1648736676150435840
Last Price Increase: 2023-04-26

MelvinBot · 2023-04-19T15:59:41Z

Triggered auto assignment to @zanyrenney (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

MelvinBot · 2023-04-19T15:59:45Z

Bug0 Triage Checklist (Main S/O)

This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
This bug is not a duplicate report (check E/App issues and #expensify-bugs)
- If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
This bug is reproducible using the reproduction steps in the OP. S/O
- If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
- If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
This issue is filled out as thoroughly and clearly as possible
- Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

zanyrenney · 2023-04-19T16:11:51Z

can reproduce but I really don't think this is worth fixing:

zanyrenney · 2023-04-19T16:15:21Z

Going to check in the channel.

zanyrenney · 2023-04-19T16:17:30Z

https://expensify.slack.com/archives/C01SKUP7QR0/p1681921007844539?thread_ts=1681872394.446279&cid=C01SKUP7QR0

MelvinBot · 2023-04-19T17:13:52Z

Job added to Upwork: https://www.upwork.com/jobs/~01ab464dbcaa9a04d3

MelvinBot · 2023-04-19T17:13:56Z

Current assignee @zanyrenney is eligible for the External assigner, not assigning anyone new.

MelvinBot · 2023-04-19T17:14:04Z

Triggered auto assignment to Contributor-plus team member for initial proposal review - @parasharrajat (External)

MelvinBot · 2023-04-19T17:14:10Z

Triggered auto assignment to @hayata-suenaga (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

hasebsiddiqui · 2023-04-19T19:51:42Z

Proposal

Please re-state the problem that we are trying to solve in this issue.

Inconsistent parsing of chat messages that contain more than two consecutive styling characters.

What is the root cause of that problem?

The problem is in the expensify-common library. The expensify-common/lib/ExpensiMark.js file contains the regexes that match plain text and replace it with styled markup.
The relevant rules are in this code block:

{
    name: 'italic',
    regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<em>${g1}</em>`),
},
{
    name: 'bold',
    regex: /\B\*((?=\S)(([^\s*]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
},

These regexes are inconsistent with each other.
According to the issue statement, there are two ways to solve this problem:

Parse neither bold nor italic
Parse both of these

Below I provide a solution for each option, but based on existing industry practice (GitHub, Slack), solution 1 should be followed.
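To make the inconsistency concrete, the two current rules can be run in isolation. This is only a minimal sketch, not the ExpensiMark pipeline itself: the regexes and replacement callbacks are copied from the block quoted above, while `applyRule` is a hypothetical helper (not part of expensify-common) that applies a single rule to a string.

```javascript
// The two rules as quoted above from expensify-common/lib/ExpensiMark.js.
const italicRule = {
    name: 'italic',
    regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<em>${g1}</em>`),
};
const boldRule = {
    name: 'bold',
    regex: /\B\*((?=\S)(([^\s*]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
};

// Hypothetical helper: applies one rule on its own.
// The real parser chains many rules; this only isolates the two in question.
function applyRule(rule, text) {
    return text.replace(rule.regex, rule.replacement);
}

// The italic rule accepts "_" as content, so the middle underscore is styled.
console.log(applyRule(italicRule, '___')); // <em>_</em>

// The bold rule rejects "*" as content, so the text is left untouched.
console.log(applyRule(boldRule, '***')); // ***
```

The difference comes from the content class: the bold rule uses `[^\s*]`, which cannot match an asterisk, while the italic rule uses `\S`, which happily matches an underscore.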

What changes do you think we should make in order to solve the problem?

Make the italic and bold regex consistent with each other inside expensify-common library.

1. Parse neither bold nor italic

The bold regex excludes any * that appears inside a * * pair (see the ([^\s*]|\s(?!\*)) part), but the italic regex has no equivalent exclusion for _.
The updated italic rule would be:

{
    name: 'italic',
    regex: /(?!_blank")([^\W_]?)_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1, g2) => (g1 || (g2 && g2.includes('<pre>')) ? match : `<em>${g2}</em>`),
},

Proof after fixing italic regex:

Screen.Recording.2023-04-27.at.2.14.16.AM.mov

2. Parse both bold and italic regex:

As mentioned, we need to relax the ([^\s*]|\s(?!\*)) part of the bold regex by dropping * from the negated character class, so that it no longer rejects a * inside a * * pair. The updated bold rule would be:

{
    name: 'bold',
    regex: /\B\*((?=\S)(([^\s]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
},

Proof after fixing bold regex:

Screen.Recording.2023-04-27.at.2.20.48.AM.mov
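As a rough illustration of option 2, the relaxed bold rule can be tried on its own. The regex and replacement are copied from the block above; `applyRule` is a hypothetical helper used only for this sketch, and the real parser runs many other rules around this one.

```javascript
// Bold rule from option 2: "[^\s*]" relaxed to "[^\s]" so "*" may appear as content.
const relaxedBoldRule = {
    name: 'bold',
    regex: /\B\*((?=\S)(([^\s]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
};

// Hypothetical helper: applies a single rule in isolation.
function applyRule(rule, text) {
    return text.replace(rule.regex, rule.replacement);
}

// With the relaxed class, the middle asterisk is now treated as bold content,
// which mirrors what the current italic rule does with "___".
console.log(applyRule(relaxedBoldRule, '***')); // <strong>*</strong>

// Ordinary bold text is still handled as before.
console.log(applyRule(relaxedBoldRule, '*bold*')); // <strong>bold</strong>
```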

What alternative solutions did you explore? (Optional)

NA

dukenv0307 · 2023-04-20T09:47:02Z

Proposal

Please re-state the problem that we are trying to solve in this issue.

While the bold characters (asterisks) aren't parsed, the italic characters (underscores) are parsed, leaving only one underscore after parsing.

Either none of the styling characters are parsed, or the middle character is styled in both.

We should first talk about what we want to do here, since two options are mentioned. If we try another popular markdown editor (like GitHub itself), we'll see that the styling character itself is not included in the final styled text. For example, if we try test ___ (3 underscores) on GitHub, it is shown as is, without styling the 3 underscores.

So I think we should do the same here. It also doesn't make sense to style the middle _ since _ by itself does not have any italic style.

What is the root cause of that problem?

While the bold characters (asterisks) aren't parsed

This is because in https://github.com/Expensify/expensify-common/blob/07db321d9dffbc66a56d7d7bf42ba1cbf3df216e/lib/ExpensiMark.js#L143, we're excluding the * if it's inside the * * pair (see the ([^\s*]|\s(?!\*)) part).

In https://github.com/Expensify/expensify-common/blob/07db321d9dffbc66a56d7d7bf42ba1cbf3df216e/lib/ExpensiMark.js#L135, however, we're not doing that: the _ character is still allowed as content. That's why the problem occurs for _.

What changes do you think we should make in order to solve the problem?

We need to update this regex https://github.com/Expensify/expensify-common/blob/07db321d9dffbc66a56d7d7bf42ba1cbf3df216e/lib/ExpensiMark.js#L135 to not allow the lonely _ character inside the _, _ pair. We can do the same as we already do for the * as explained above.

We can use this regex
/(?!_blank")([^\W_]?)_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g

It has 3 changes compared to the current italic regex we're using:

Replace (?!\s) by (?![\s_]), so that it doesn't match _ at the beginning of the content
Replace \S by [^\s_], so that it doesn't match _ at the end of the content
Replace the word boundary \b with ([^\W_]?) at the beginning and (?![^\W_]) at the end, so that _ is no longer treated as a word character at the boundaries and cases like __content_ and _content__ work correctly.

Also we need to update this replacement https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L143 so that it will not replace if the match starts with a word character (this logic works with ([^\W_]?))

replacement: (match, g1, g2) => g1 || (g2 && g2.includes('<pre>'))  ? match : `<em>${g2}</em>`

(We could have used (?<![^\W_]) instead of ([^\W_]?), but lookbehind is not supported in Safari so ([^\W_]?) along with capturing group check will be the right one to use, see here)

This will make the behavior the same as GitHub and other popular markdown editors.

Tested with the following cases, and all work well:

___
_italic__
__italic_
_italic_another_italic_
_italic_
s_italic
italic_s
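The cases listed above can be checked with a short script. This is a sketch under assumptions: the regex and replacement are the ones proposed in this comment, `applyRule` is a hypothetical helper that runs the rule in isolation, and the outputs in the comments are what this single rule produces on its own, not necessarily what the full parser renders.

```javascript
// Italic rule as proposed in this comment.
const proposedItalicRule = {
    name: 'italic',
    regex: /(?!_blank")([^\W_]?)_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    // g1 is the optional letter/digit captured before the opening "_".
    // If it is non-empty, the "_" sits inside a word, so the match is returned unchanged.
    replacement: (match, g1, g2) => (g1 || (g2 && g2.includes('<pre>')) ? match : `<em>${g2}</em>`),
};

// Hypothetical helper: applies a single rule in isolation.
function applyRule(rule, text) {
    return text.replace(rule.regex, rule.replacement);
}

const cases = [
    '___',                     // ___
    '_italic__',               // <em>italic</em>_
    '__italic_',               // _<em>italic</em>
    '_italic_another_italic_', // <em>italic_another_italic</em>
    '_italic_',                // <em>italic</em>
    's_italic',                // s_italic
    'italic_s',                // italic_s
];

for (const input of cases) {
    console.log(`${input} -> ${applyRule(proposedItalicRule, input)}`);
}
```

Capturing the preceding character and checking it in the callback stands in for a lookbehind, which is why the replacement takes both `g1` and `g2`.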

What alternative solutions did you explore? (Optional)

NA

romulo114 · 2023-04-20T10:04:01Z

Proposal

Please re-state the problem that we are trying to solve in this issue.

Inconsistent parsing of consecutive styling characters.

What is the root cause of that problem?

The main cause of this issue is in bold and italic rules.

https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/ExpensiMark.js#L131-L152

            {
                name: 'italic',
                regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                replacement: (match, g1) => (g1.includes('<pre>') ? match : `<em>${g1}</em>`),
            },
            {
                name: 'bold',
                regex: /\B\*((?=\S)(([^\s*]|\s(?!\*))+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
                replacement: (match, g1) => (g1.includes('<pre>') ? match : `<strong>${g1}</strong>`),
            },

What changes do you think we should make in order to solve the problem?

We need to extract the most inner match and ignore the empty styles. We can do this by the following rules.

            {
                name: 'italic',
                regex: /(?!_blank")\b_(_*)((?!\s)[\s\S]*?[\S]*?)(_*)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                replacement: (match, g1, g2, g3) => ((g2.includes('<pre>') || !g2) ? match : `${g1}<em>${g2}</em>${g3}`),
            },
            {
                name: 'bold',
                regex: /\B\*(\**)((?=\S)[\s\S]*?[\S]*?)(\**)\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
                replacement: (match, g1, g2, g3) => ((g2.includes('<pre>') || !g2) ? match : `${g1}<strong>${g2}</strong>${g3}`),
            },

Result

mac_safari.mp4
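For comparison, the two rules proposed in this comment can be exercised the same way. This is a sketch only: the regexes and replacements are copied from the block above, and `applyRule` is a hypothetical helper that applies one rule in isolation.

```javascript
// Rules as proposed in this comment: extra styling characters are captured
// in g1/g3 and re-emitted outside the tag, and empty content is ignored.
const innerItalicRule = {
    name: 'italic',
    regex: /(?!_blank")\b_(_*)((?!\s)[\s\S]*?[\S]*?)(_*)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
    replacement: (match, g1, g2, g3) => ((g2.includes('<pre>') || !g2) ? match : `${g1}<em>${g2}</em>${g3}`),
};
const innerBoldRule = {
    name: 'bold',
    regex: /\B\*(\**)((?=\S)[\s\S]*?[\S]*?)(\**)\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
    replacement: (match, g1, g2, g3) => ((g2.includes('<pre>') || !g2) ? match : `${g1}<strong>${g2}</strong>${g3}`),
};

// Hypothetical helper: applies a single rule in isolation.
function applyRule(rule, text) {
    return text.replace(rule.regex, rule.replacement);
}

// Empty content (g2 === '') returns the match unchanged, so both stay as typed.
console.log(applyRule(innerItalicRule, '___')); // ___
console.log(applyRule(innerBoldRule, '***'));   // ***

// Surplus styling characters end up outside the styled span.
console.log(applyRule(innerItalicRule, '__italic__')); // _<em>italic</em>_
console.log(applyRule(innerBoldRule, '**bold**'));     // *<strong>bold</strong>*
```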

What alternative solutions did you explore? (Optional)

N/A

parasharrajat · 2023-04-20T13:40:05Z

What is the expected behavior here? @hayata-suenaga

Should italic be parsed? I don't think either should be parsed, whether bold or italic.

hayata-suenaga · 2023-04-20T18:10:11Z

I believe both should not be parsed

parasharrajat · 2023-04-20T20:06:16Z

Good, thanks. To confirm, I asked here: https://expensify.slack.com/archives/C01GTK53T8Q/p1682021148507319.

zanyrenney · 2023-04-21T17:29:32Z

Heading OOO, reapplying the Bug0 label for active management.

MelvinBot · 2023-04-21T17:29:49Z

Triggered auto assignment to @alexpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

MelvinBot · 2023-04-21T17:29:53Z

Bug0 Triage Checklist (Main S/O)

This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
This bug is not a duplicate report (check E/App issues and #expensify-bugs)
- If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
This bug is reproducible using the reproduction steps in the OP. S/O
- If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
- If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
This issue is filled out as thoroughly and clearly as possible
- Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

parasharrajat · 2023-05-05T20:38:28Z

There is one more PR to upgrade the version in the App which will be done tomorrow.

alexpensify · 2023-05-05T20:41:35Z

Thank you, I updated my comment.

alexpensify · 2023-05-08T22:11:01Z

The new PR is in the works!

melvin-bot · 2023-05-09T03:43:07Z

Reviewing label has been removed, please complete the "BugZero Checklist".

melvin-bot · 2023-05-09T03:43:08Z

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.3.12-0 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

Fix: Update regex of italic in expensify common repo #18406

If no regressions arise, payment will be issued on 2023-05-16. 🎊

After the hold period is over and BZ checklist items are completed, please complete any of the applicable payments for this issue, and check them off once done.

External issue reporter
Contributor that fixed the issue
Contributor+ that helped on the issue and/or PR

As a reminder, here are the bonuses/penalties that should be applied for any External issue:

Merged PR within 3 business days of assignment - 50% bonus
Merged PR more than 9 business days after assignment - 50% penalty

melvin-bot · 2023-05-09T03:43:10Z

BugZero Checklist: The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed:

[@parasharrajat] The PR that introduced the bug has been identified. Link to the PR:
[@parasharrajat] The offending PR has been commented on, pointing out the bug it caused and why, so the author and reviewers can learn from the mistake. Link to comment:
[@parasharrajat] A discussion in #expensify-bugs has been started about whether any other steps should be taken (e.g. updating the PR review checklist) in order to catch this type of bug sooner. Link to discussion:
[@parasharrajat] Determine if we should create a regression test for this bug.
[@parasharrajat] If we decide to create a regression test for the bug, please propose the regression test steps to ensure the same bug will not reach production again.
[@alexpensify] Link the GH issue for creating/updating the regression test once above steps have been agreed upon:

parasharrajat · 2023-05-09T08:54:13Z

I wasn't able to figure out the regression PR for this.

parasharrajat · 2023-05-09T08:55:23Z

No Regression tests are needed as this is covered with unit tests.

parasharrajat · 2023-05-09T09:02:03Z

I'd like to request that the 50% bonus be considered for this issue. The issue took 4 business days to complete, but there were two PRs. We moved very fast to test and merge both within 3 days, but because the PRs are sequential we had to wait for the first one before updating the hash in the second, which cost us one day.

hayata-suenaga · 2023-05-09T14:50:05Z

@alexpensify can you check @parasharrajat's comments?

alexpensify · 2023-05-09T16:12:55Z

Thank you @parasharrajat for the work here. I'm aware of the two PRs and I'll review the payment breakout later this week since payment is not due until next week.

alexpensify · 2023-05-12T21:13:19Z

Still on hold for payment.

alexpensify · 2023-05-15T22:17:24Z

I've started an internal discussion to confirm the payments.

alexpensify · 2023-05-17T04:54:38Z

I've reviewed the two PRs with the team. We have confirmed that it is common for GitHub issues like this one to require two PRs: one PR to make the update in expensify-common and another to bring the updated expensify-common version into App. The bonus will not apply here, since the merge was past the 3-business-day mark. This GH has the following payouts.

Issue reporter: $250 - @kerupuksambel
Contributor: $1,000 - @dukenv0307
Contributor+: $1,000 - @parasharrajat

If anyone has feedback please share. If no response, tomorrow, I’ll complete the next steps in Upwork. Thank you!

alexpensify · 2023-05-17T15:01:15Z

@dylanexpensify is going to step in to help with the payment process in Upwork. I'm OOO until next Wednesday.

dylanexpensify · 2023-05-17T17:40:32Z

@kerupuksambel, @dukenv0307, and @parasharrajat please apply here!

dylanexpensify · 2023-05-18T08:32:59Z

@dukenv0307 please apply! @parasharrajat @kerupuksambel sent offers!

dukenv0307 · 2023-05-18T09:27:29Z

@dylanexpensify applied, thank you!

dylanexpensify · 2023-05-18T09:38:38Z

@dukenv0307 offer sent!

dylanexpensify · 2023-05-18T11:04:14Z

Done!

alexpensify · 2023-05-24T21:13:32Z

Thanks for the assist here @dylanexpensify!

kbecciv added Daily KSv2 Bug Something is broken. Auto assigns a BugZero manager. labels Apr 19, 2023

melvin-bot bot assigned zanyrenney Apr 19, 2023

zanyrenney added the External Added to denote the issue can be worked on by a contributor label Apr 19, 2023

melvin-bot bot changed the title ~~Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters~~ [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters Apr 19, 2023

melvin-bot bot assigned parasharrajat Apr 19, 2023

melvin-bot bot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Apr 19, 2023

melvin-bot bot assigned hayata-suenaga Apr 19, 2023

zanyrenney removed their assignment Apr 21, 2023

zanyrenney added Bug Something is broken. Auto assigns a BugZero manager. and removed Bug Something is broken. Auto assigns a BugZero manager. labels Apr 21, 2023

melvin-bot bot assigned alexpensify Apr 21, 2023

melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Daily KSv2 labels May 9, 2023

melvin-bot bot changed the title ~~[$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters~~ [HOLD for payment 2023-05-16] [$1000] Web - Chat - Inconsistent parsing on occurrence of more than two consecutive styling characters May 9, 2023

melvin-bot bot removed the Reviewing Has a PR in review label May 9, 2023

melvin-bot bot added Daily KSv2 and removed Weekly KSv2 labels May 15, 2023

alexpensify assigned dylanexpensify May 17, 2023

dylanexpensify closed this as completed May 18, 2023
