Detecting moved sections #162

Open
briochemc opened this issue Oct 30, 2018 · 7 comments

Comments

@briochemc
Contributor

Is there a way to tell latex-diff to figure out when whole sections are moved around?

@ftilmann
Owner

I have been thinking about this for a long time, but it is quite difficult to do in a useful way. Unlike in Word, there is no access to the editing process (which can sometimes be a good thing). And if, for example, a whole paragraph was moved and then one or two words were changed, it should still appear as a moved paragraph with some edits.

So for now probably not feasible, unfortunately.

@flying-sheep

It could work on a per-paragraph basis, trying to find 1:1 mappings of the closest corresponding paragraphs and calculating the differences between them.
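To make the per-paragraph idea concrete, here is a minimal sketch in Python (not Perl, and not latexdiff code). It assumes paragraphs are separated by blank lines, scores pairs with `difflib.SequenceMatcher` on whitespace-split tokens, and pairs them greedily; the function names and the 0.6 threshold are made up for illustration.

```python
import re
from difflib import SequenceMatcher


def split_paragraphs(text):
    """Split text into paragraphs on blank lines (an assumption of this sketch)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def match_paragraphs(old_text, new_text, threshold=0.6):
    """Greedily build a 1:1 mapping between the most similar paragraphs.

    Returns a sorted list of (old_index, new_index, ratio) tuples.
    Paragraphs with no partner above the threshold stay unmatched.
    """
    old_pars = split_paragraphs(old_text)
    new_pars = split_paragraphs(new_text)

    # Score every old/new pair by token-level similarity.
    scored = []
    for i, old in enumerate(old_pars):
        for j, new in enumerate(new_pars):
            ratio = SequenceMatcher(None, old.split(), new.split()).ratio()
            if ratio >= threshold:
                scored.append((ratio, i, j))

    # Take the best-scoring pairs first; each paragraph is used at most once.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    used_old, used_new, pairs = set(), set(), []
    for ratio, i, j in scored:
        if i not in used_old and j not in used_new:
            used_old.add(i)
            used_new.add(j)
            pairs.append((i, j, ratio))
    return sorted(pairs)
```

A matched pair whose relative order differs between the two versions would be a candidate "moved paragraph", and the pair can then be diffed word by word to show the edits inside it. The greedy pairing is quadratic in the number of paragraphs, which is probably fine for a single document but is a design shortcut rather than an optimal assignment.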

@ftilmann
Owner

Thanks for the suggestion. It is still not quick to do in practice (or do you know of an algorithm implemented in Perl that does fuzzy differencing of tokenized text?).
I have another idea for how one could 'fake' such functionality: look for exact matches between added and deleted blocks of a certain length. This would probably work in many instances, but even implementing this requires changing several parts of the very core of latexdiff.
So it is not something I will undertake any time soon.
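The exact-match idea could look roughly like the following. This is a Python sketch under stated assumptions, not how latexdiff works internally: it runs an ordinary diff with `difflib.SequenceMatcher`, collects deleted and added token runs of at least `min_len` tokens, and reports a move whenever a deleted run is identical to an added run. `find_moved_blocks` and `min_len` are hypothetical names.

```python
from difflib import SequenceMatcher


def find_moved_blocks(old_tokens, new_tokens, min_len=5):
    """Pair deleted blocks with identical added blocks.

    Returns a list of (old_start, new_start, length) tuples, one per
    block that was deleted in one place and added verbatim in another.
    """
    sm = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)

    deleted, added = [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # A 'replace' opcode carries both a deleted and an added run.
        if tag in ("delete", "replace") and i2 - i1 >= min_len:
            deleted.append((i1, tuple(old_tokens[i1:i2])))
        if tag in ("insert", "replace") and j2 - j1 >= min_len:
            added.append((j1, tuple(new_tokens[j1:j2])))

    # Index added blocks by content so lookups are exact matches only.
    added_by_content = {}
    for j1, block in added:
        added_by_content.setdefault(block, []).append(j1)

    moves = []
    for i1, block in deleted:
        positions = added_by_content.get(block)
        if positions:
            # Use each added block at most once.
            moves.append((i1, positions.pop(0), len(block)))
    return moves
```

As noted above, this only catches blocks that were moved without any further edits and that the underlying diff happens to report as whole runs; a moved paragraph with one changed word would still show up as a deletion plus an addition.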

@ftilmann ftilmann added Medium and removed Low labels Jan 16, 2020
@flying-sheep

> do you know of an algorithm implemented in perl that does fuzzy differencing of tokenized text?

Sorry, I looked into Perl once in 2009 and decided to learn Python instead.

Doing what you said won’t be any more or less fake than what any diff tool does; they’re all heuristics by necessity.

@apYdr6uxv
Contributor

FWIW, I found one implemented in JS here.
Apparently the algorithm is called the Heckel method; read more here.

Or am I off-track?
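For readers unfamiliar with it, the core of Heckel's technique can be sketched in a few lines of Python. This is a simplified illustration as I understand the method, not the linked JS implementation and not latexdiff code: lines (or tokens) that occur exactly once in both versions act as anchors, and matches are then extended to identical neighbours in both directions. The function names are hypothetical.

```python
from collections import Counter


def heckel_match(old, new):
    """Return a dict mapping new index -> old index for matched lines."""
    old_count, new_count = Counter(old), Counter(new)
    old_pos = {line: i for i, line in enumerate(old)}

    # Anchors: lines that occur exactly once in each version.
    new_to_old = {}
    for j, line in enumerate(new):
        if new_count[line] == 1 and old_count[line] == 1:
            new_to_old[j] = old_pos[line]
    matched_old = set(new_to_old.values())

    def try_link(j, i):
        """Link new[j] to old[i] if both are in range, unmatched, and equal."""
        if (0 <= j < len(new) and 0 <= i < len(old)
                and j not in new_to_old and i not in matched_old
                and old[i] == new[j]):
            new_to_old[j] = i
            matched_old.add(i)

    # Extend matches forward, then backward, from already linked lines.
    for j in range(len(new)):
        if j in new_to_old:
            try_link(j + 1, new_to_old[j] + 1)
    for j in range(len(new) - 1, -1, -1):
        if j in new_to_old:
            try_link(j - 1, new_to_old[j] - 1)
    return new_to_old


def matched_blocks(new_to_old):
    """Group the mapping into runs of (old_start, new_start, length)."""
    blocks = []
    for j in sorted(new_to_old):
        i = new_to_old[j]
        if blocks:
            old_start, new_start, n = blocks[-1]
            if old_start + n == i and new_start + n == j:
                blocks[-1] = (old_start, new_start, n + 1)
                continue
        blocks.append((i, j, 1))
    return blocks
```

When the blocks are listed in new-file order and their `old_start` values are not increasing, something was moved. Because the anchors must be unique, repeated lines are only matched when they sit next to an anchor.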

@ftilmann
Owner

Thanks for leaving these hints. It looks like a promising approach but would replace the current diffing algorithm (at least optionally) and thus require quite a lot of coding to implement within the latexdiff context.

@apYdr6uxv
Contributor

Of course! I did not mean to imply it makes it any easier. 🙇
