Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Introduced fastDiff() #238

Merged
merged 7 commits into from
May 9, 2018
Merged

Introduced fastDiff() #238

merged 7 commits into from
May 9, 2018

Conversation

f1ames
Copy link
Contributor

@f1ames f1ames commented Apr 16, 2018

Suggested merge commit message (convention)

Other: Introduced fastDiff diffing function. Closes ckeditor/ckeditor5#5000.


Additional information

See ckeditor/ckeditor5#5000 for more details.

@coveralls
Copy link

coveralls commented Apr 16, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 47b2c4f on t/235 into 2974f62 on master.

@Reinmar Reinmar requested a review from scofalik April 20, 2018 11:02
src/fastdiff.js Outdated
* Finds position of the first and last change in the given strings and generates set of changes. Set of changes
* can be applied to the input text in order to transform it into the output text, for example:
*
* let input = '12abc3';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add at least two more, basic examples. One example: adding a few consecutive characters somewhere in the string. Second example: removing a few consecutive characters from the string. The example you provided is actually kind of an edge case, where characters were removed from the beginning and from the end. So, the output might actually be surprising for someone.

Other than that, I am not sure about the "usage" snippet. I have a feeling that the example is a bit uninspiring. Even though it shows how to use the output, somebody would think "why would I want to do it?". I have a feeling that we might want to drop this snippet.

I'd also shorten the examples:

* (...) for example:
*
*		fastDiff( '12a',	'12xyza' );	// [ { ... } ]
*		fastDiff( '12a',	'12aa' );	// [ { ... } ]
*		fastDiff( '12xyza',	'12a' );	// [ { ... } ]
*		fastDiff( '12aa',	'12a' );	// [ { ... } ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the output is too long for the comment in the same line, it can be above or below starting with "Following produces:" and with extra line between each example.

src/fastdiff.js Outdated
// The above indexes means that in `oldText` modified part is `1[23]4` and in the `newText` it is `1[342]4`.
// Based on such indexes, array with `insert`/`delete` operations which allows transforming
// old text to the new one could be generated.
//
Copy link
Contributor

@scofalik scofalik Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it return correct values if strings are the same? If not, then I'd add a note that it is assumed that the strings are different.

src/fastdiff.js Outdated
// If not found, it means first change is at the end of the string.
let firstIndex = oldTextLength;
for ( let i = 0; i < oldTextLength; i++ ) {
if ( i >= newTextLength || oldText[ i ] !== newText[ i ] ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i >= newTextLength should not be needed. If it is true, then newText[ i ] == undefined so surely the second condition will be met.

src/fastdiff.js Outdated
// oldText: '321ba' -> '21ba' -> 'ab12'
// newText: '31ba' -> '1ba' -> 'ab1'
// { firstIndex: 1, lastIndexOld: 2, lastIndexNew: 1 }
if ( i >= newTextReversedLength ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those additional conditions are probably not needed either.

Copy link
Contributor

@scofalik scofalik Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, unfortunately, it is needed, because otherwise, calculations mess up in the last if. There's a difference if the texts differ because of an actual different letter or because of one of them is shorter.

@scofalik
Copy link
Contributor

I've spent some time analyzing this code, cause it looked to me that the fastDiff function is too complicated. There are many variables initialized, strings are reversed and cut and you quickly stop to understand what's what and why.

So, the algorithm itself cannot be simpler. Of course, finding the first change is easy. Finding the last change is more difficult. Cutting and reversing the string leads in fact to the simpler implementation (you could do the same using proper offsets but it would be even worse).

However, I think that it may be written in a simpler way. I'll propose something.

@scofalik
Copy link
Contributor

I've pushed two commits.

In the first one, I only changed @f1ames solution a bit.

In the second one, I proposed a different solution. Since I am subjective on this matter, maybe @Reinmar could take a look and say which solution looks cleaner.

BTW. Because of a non-precise description in findChangeBoundaryIndexes I thought that may algorithm should not work. Because the original documentation says that the lastNewIndex and lastOldIndex are indexes of change, then there should be no difference between:

123 vs 123123 and 1234 vs 123123.

Because both of them differ on the third index, both of them should have lastOldIndex equal to 3. That meant that my solution needed an additional if because it produced a different lastOldIndex in both scenarios. Now imagine my shock when I saw that tests failed with the special if and that they succeed without it :D. So I looked at the docs and even in the example from the description you can see that the lastOldIndex and lastNewIndex are not indexes of change but rather an index of the last common character.

@scofalik
Copy link
Contributor

BTW. I've also added two tests - one that I thought should fail for my algorithm, and the one that is used in the docs.

@Reinmar
Copy link
Member

Reinmar commented May 4, 2018

Since I am subjective on this matter, maybe @Reinmar could take a look and say which solution looks cleaner.

I'm sure you'll be able to figure out which one you like more, guys ;) You know far more about this code than I'll be able to learn now.

src/fastdiff.js Outdated
// @param {String} newText
// @returns {Object}
// @returns {Number} return.firstIndex Index of the first change in both strings (always the same for both).
// @returns {Number} result.lastIndexOld Index of the last common character in `oldText` string looking from back.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be:

result.lastIndexOld Index of the last common character in oldText string.

or

result.lastIndexOld Index of the first common character in oldText string looking from back.

Because looking for the last common character starting from the back, means you are looking for the first common character basically.

@f1ames
Copy link
Contributor Author

f1ames commented May 9, 2018

I just added some docs improvements. TBH @scofalik, your refactor made it more readable and simplified it a little so I'm fine with it. Looks good IMHO.

Other than that, I am not sure about the "usage" snippet. I have a feeling that the example is a bit uninspiring. Even though it shows how to use the output, somebody would think "why would I want to do it?". I have a feeling that we might want to drop this snippet.

I see you left the snippet, was it on purpose? I wasn't sure either, but it gives some general picture how fastDiff output can be used so maybe we could left it there.

Merged latest changes from master too.

I think it is ready for review, from my perspective it looks good, nothing to add here.

@f1ames f1ames merged commit 81fefc9 into master May 9, 2018
@f1ames
Copy link
Contributor Author

f1ames commented May 9, 2018

As agreed with @scofalik F2F, I have merged the PR as it was ready 🎉

@f1ames f1ames deleted the t/235 branch May 9, 2018 11:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce fastDiff()
4 participants