Fixed fastDiff for multi-byte unicode sequences. #327
Conversation
Looking at the tests, a single emoji is treated as 2 characters? I'd prefer it to be treated as a single item, which I think is more in line with how we think about emoji. When you replace 👨👧 with 👨👧👦, you want one remove and one insert, because the whole thing was replaced. WDYT, @scofalik? Besides that, we need an integration test in ckeditor5-typing verifying that the original issue (typing) no longer occurs.
Also, I'd love to have a similar set of tests for both
Uh, I've just realised that the way the diff functions work (index-based) does not allow treating a multi-byte character as a single item. That's a bummer ;/
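To illustrate the index problem described above, here is a minimal sketch in plain JavaScript. It involves no CKEditor code and the variable names are made up for this example. It compares an offset in the original string with the offset of the same character in a code-point array produced by Array.from():

```javascript
const text = 'ab🙂c';

// Strings are indexed by UTF-16 code units, so the emoji occupies two slots.
const offsetInString = text.indexOf( 'c' );

// Array.from() iterates by code point, so the emoji becomes a single item.
const codePoints = Array.from( text );
const offsetInCodePoints = codePoints.indexOf( 'c' );

console.log( text.length, codePoints.length );
console.log( offsetInString, offsetInCodePoints );
```

A consumer that applies diff indexes back to the original string would be off by one after each such emoji if the diff were computed on code points, which is presumably why the index-based functions have to stay on code units.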
Additional tests: ckeditor/ckeditor5-typing#228
cc @ckeditor/qa-team Could you test this PR? I'm interested in emoji support, IME on various platforms and perhaps something like spellchecking too. Please make sure to test how it behaves with long and short paragraphs.
tests/diff.js
Outdated
const emojiDiffDelete = new Array( emojiLength ).fill( 'delete' );

it( 'should properly handle emoji insertion', () => {
expect( diff( 'abc', 'ab🙂c' ) ).to.deep.equals( [ 'equal', 'equal', ...emojiDiffInsert, 'equal' ] );
Use to.deep.equal to make it grammatical.
src/fastdiff.js
Outdated
@@ -101,12 +101,13 @@ export default function fastDiff( a, b, cmp, atomicChanges = false ) {
};

// Transform text or any iterable into arrays for easier, consistent processing.
// Array.from was used here but it generated incorrect results for multi-byte unicode sequences.
I'd rather explain why we use the array cloning mechanism instead of Array.from. Something like:
We convert the string to an array by using the slice() method because, unlike Array.from(), it returns single-byte array items. See ckeditor/ckeditor5#3147.
We need to make sure here that fastDiff() works identically to diff().
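A minimal sketch of the conversion this comment describes. The helper name toComparableArray is hypothetical and the exact expression used in fastdiff.js may differ. The point is only that Array.prototype.slice.call() keeps one array item per UTF-16 code unit (what the comment calls single-byte items), whereas Array.from() merges surrogate pairs into one item:

```javascript
// Hypothetical helper, not copied from fastdiff.js.
function toComparableArray( input ) {
	// Arrays are cloned; strings (and other array-likes) are split by index.
	return Array.prototype.slice.call( input );
}

const sample = 'a🙂b';
const units = toComparableArray( sample );

// Every array item lines up with the same index in the source string.
const aligned = units.every( ( unit, index ) => unit === sample[ index ] );

console.log( units.length === sample.length, aligned );
console.log( Array.from( sample ).length );
```

Because the array and the string share the same indexes, any change index computed on the array can be applied directly to the string.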
expect( diff( 'a🙂b', 'ab' ) ).to.deep.equals( [ 'equal', ...emojiDiffDelete, 'equal' ] );
} );

it( 'should properly replace emoji', () => {
I'm missing a case with replacing one emoji with another one.
Another test I miss is for several emojis one after another. It creates some additional cases. I don't expect those to ever break unless we start doing some weird code optimizations in diff/fastDiff, so those tests may not be essential, but it's easy to have them just in case.
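To show why these two cases are worth covering, here is a rough sketch of a common-prefix/common-suffix comparison on code units. The function prefixSuffixDiff and its return shape are hypothetical and are not the diff() or fastDiff() implementation. It only demonstrates that two different emoji can share their first code unit, so a code-unit diff may report a change that starts in the middle of an emoji:

```javascript
// Hypothetical illustration, not the ckeditor5-utils algorithm.
function prefixSuffixDiff( a, b ) {
	const unitsA = Array.prototype.slice.call( a );
	const unitsB = Array.prototype.slice.call( b );

	// Skip the common prefix.
	let start = 0;
	while ( start < unitsA.length && start < unitsB.length && unitsA[ start ] === unitsB[ start ] ) {
		start++;
	}

	// Skip the common suffix without crossing the prefix.
	let endA = unitsA.length;
	let endB = unitsB.length;
	while ( endA > start && endB > start && unitsA[ endA - 1 ] === unitsB[ endB - 1 ] ) {
		endA--;
		endB--;
	}

	return { index: start, howMany: endA - start, inserted: unitsB.slice( start, endB ) };
}

// U+1F642 and U+1F600 share the high surrogate \uD83D, so only the low surrogate differs.
const replaced = prefixSuffixDiff( 'a🙂b', 'a😀b' );

// Appending a third identical emoji after two existing ones.
const appended = prefixSuffixDiff( '🙂🙂', '🙂🙂🙂' );

console.log( replaced, appended );
```

In the replacement case the reported change covers a single code unit at index 2, and in the repeated-emoji case the insertion lands at index 4 with two code units. Both are easy to get wrong if the expected arrays are written by hand.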
tests/diff.js
Outdated
const emojiDiffDelete = new Array( emojiLength ).fill( 'delete' );

it( 'should properly handle emoji insertion', () => {
expect( diff( 'abc', 'ab🙂c' ) ).to.deep.equals( [ 'equal', 'equal', ...emojiDiffInsert, 'equal' ] );
Frankly speaking, this emojiDiffInsert makes these tests much harder to read than they need to be. IMO, assertions should be as "raw" as possible so you don't need to jump to e.g. a definition of a variable to understand them. Here, I needed to scroll up to those consts to understand one of the tests (I started reading them from the bottom). Plus, I needed to check whether I'm right that these consts contain 2 items.
OTOH, I can see that in the ZWJ tests we have 5 items in such consts, so it's more justified here. So I take my above comment back. It's fine the way you coded it.
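For reference, a small sketch of how such constants end up with 2 or 5 items. It assumes the length constants are derived from String.prototype.length, which the quoted hunk does not show, and the ZWJ sequence below (woman, zero-width joiner, red hair) is only an illustrative choice, not necessarily the one used in the PR's tests:

```javascript
// Assumption: the array sizes come from the string length in UTF-16 code units.
const emoji = '🙂';
const zwjEmoji = '\u{1F469}\u200D\u{1F9B0}'; // woman + zero-width joiner + red hair

const emojiDiffInsert = new Array( emoji.length ).fill( 'insert' );
const zwjDiffInsert = new Array( zwjEmoji.length ).fill( 'insert' );

// Expected result shape for inserting the ZWJ sequence between 'ab' and 'c'.
const expected = [ 'equal', 'equal', ...zwjDiffInsert, 'equal' ];

console.log( emojiDiffInsert.length, zwjDiffInsert.length, expected.length );
```

Writing five 'insert' entries by hand in every assertion would be noisy, which supports keeping the constants for the ZWJ cases.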
The code looks good. I like the tests for their completeness. There's indeed a number of cases to cover and I think there's just one or two I missed.
We've tested multiple cases and everything seems to be fine 👌
Tests: Added tests for emoji input. See ckeditor/ckeditor5-utils#327.
Suggested merge commit message (convention)
Fix: Fixed fastDiff for multi-byte unicode sequences. Closes ckeditor/ckeditor5#3147. Closes ckeditor/ckeditor5#6495.