Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Fixed fastDiff for multi-byte unicode sequences. #327

Merged
merged 6 commits into from
Mar 27, 2020
Merged

Fixed fastDiff for multi-byte unicode sequences. #327

merged 6 commits into from
Mar 27, 2020

Conversation

niegowski
Copy link
Contributor

@niegowski niegowski commented Mar 24, 2020

Suggested merge commit message (convention)

Fix: Fixed fastDiff for multi-byte unicode sequences. Closes ckeditor/ckeditor5#3147. Closes ckeditor/ckeditor5#6495.


Additional information

@coveralls
Copy link

coveralls commented Mar 24, 2020

Coverage Status

Coverage remained the same at 100.0% when pulling 2cccbad on i/3147 into 8846e66 on master.

@Reinmar
Copy link
Member

Reinmar commented Mar 24, 2020

Looking at the tests, a single emoji is treated as 2 characters? I'd rather prefer if it was considered as a single item. I think that this is more in line with how we think about them. When you replace 👨‍👧  with 👨‍👧‍👦 you want to have one remove and one insert as the whole thing was replaced

WDYT, @scofalik?

Besides that, we need an integration test in ckeditor5-typing that the original issue (typing) does not occur any more.

@Reinmar
Copy link
Member

Reinmar commented Mar 24, 2020

Also, I'd love to have a similar set of tests for both fastDiff() and diff() to verify that both treat those characters in identical ways. Those test should be more advanced (consider more cases),

@Reinmar
Copy link
Member

Reinmar commented Mar 24, 2020

Uh, I've just realise that the way how the diff functions work (index-based) does not allow to treat a multi-byte characters as a single item. That's a bummer ;/

src/fastdiff.js Outdated Show resolved Hide resolved
@niegowski
Copy link
Contributor Author

Additional tests: ckeditor/ckeditor5-typing#228

@Reinmar Reinmar self-assigned this Mar 25, 2020
@Reinmar
Copy link
Member

Reinmar commented Mar 26, 2020

cc @ckeditor/qa-team  Could you test this PR? I'm interested in emoji support, IME on various platforms and perhaps something like spellchecking too. Please make sure to test how the thing becomes with long and short paragraphs.

tests/diff.js Outdated
const emojiDiffDelete = new Array( emojiLength ).fill( 'delete' );

it( 'should properly handle emoji insertion', () => {
expect( diff( 'abc', 'ab🙂c' ) ).to.deep.equals( [ 'equal', 'equal', ...emojiDiffInsert, 'equal' ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to.deep.equal to make it grammatical

src/fastdiff.js Outdated
@@ -101,12 +101,13 @@ export default function fastDiff( a, b, cmp, atomicChanges = false ) {
};

// Transform text or any iterable into arrays for easier, consistent processing.
// Array.from was used here but it generated incorrect results for multi-byte unicode sequences.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather explain why we use the cloning mechanism of of arrays over using Array.from. Something like:

We convert the string to an array by using the slice() method because, unlike Array.from(), it returns single-byte array items. See ckeditor/ckeditor5#3147.

We need to make sure here that fastDiff() works identical to diff() here.

expect( diff( 'a🙂b', 'ab' ) ).to.deep.equals( [ 'equal', ...emojiDiffDelete, 'equal' ] );
} );

it( 'should properly replace emoji', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing a case with replacing one emoji with another one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another test I miss is for several emojis one after another. It creates some additional cases. I don't expect those to ever break, unless we'll start doing some weird code optimizations in diff/fastDiff, so those tests may not be essential, but it's easy to have them just in case.

tests/diff.js Outdated
const emojiDiffDelete = new Array( emojiLength ).fill( 'delete' );

it( 'should properly handle emoji insertion', () => {
expect( diff( 'abc', 'ab🙂c' ) ).to.deep.equals( [ 'equal', 'equal', ...emojiDiffInsert, 'equal' ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly speaking, this emojiDiffInsert makes these tests much harder to read than they need to be. IMO, assertions should be as "raw" as possible so you don't need to jump to e.g. a definition of a variable to understand them. Here, I needed to scroll up to those consts to understand one of the tests (I started reading them from the bottom). Plus, I needed to check whether I'm right that these consts contain 2 items.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, I can see that in the ZWJ tests we have 5 items in such consts, so it's more justified here. So I take my above comment back. It's fine the way you coded it.

@Reinmar
Copy link
Member

Reinmar commented Mar 26, 2020

The code looks good. I like the tests for their completeness. There's indeed a number of cases to cover and I think there's just one or two I missed.

@Mgsy
Copy link
Member

Mgsy commented Mar 26, 2020

We've tested multiple cases and everything seems to be fine 👌

@Reinmar Reinmar merged commit 6dc1ba6 into master Mar 27, 2020
@Reinmar Reinmar deleted the i/3147 branch March 27, 2020 17:49
Reinmar added a commit to ckeditor/ckeditor5-typing that referenced this pull request Mar 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to type in the editor Confusing characters after transform emoji using text-transformation
6 participants