Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: better detection of commits that were made empty #2979

Open
jyn514 opened this issue Feb 7, 2024 · 8 comments
Open

FR: better detection of commits that were made empty #2979

jyn514 opened this issue Feb 7, 2024 · 8 comments
Labels
polish🪒🐃 Make existing features more convenient and more consistent

Comments

@jyn514
Copy link
Contributor

jyn514 commented Feb 7, 2024

Is your feature request related to a problem? Please describe.
i have a local change that was merged into main as 52f4fb1. i still have that change locally, and jj does not think it is a parent of main@origin (cc #2978). if i try to rebase it with jj rebase -d main@origin unqytsx, i get a merge conflict:

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4884a7383a...0000000000 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -87,12 +87,18 @@
 
 * `jj workspace root` was aliased to `jj root`, for ease of discoverability
 
+<<<<<<<
++++++++
 * `jj diff` no longer shows the contents of binary files.
 
 * `jj git` now has an `init` command that initializes a git backed repo.
 
 * New template function `surround(prefix, suffix, content)` is added.
 
+%%%%%%%
++* `jj diff` no longer shows the contents of binary files.
++
+>>>>>>>

this is very silly; the line it's trying to add is already present in the file, and it was added before the "jj git" line (before in history, the fact it's the line above shouldn't be relevant). so jj should notice that it's empty instead of thinking it's a conflict.

Describe the solution you'd like
jj rebase should notice the change is now empty and i should end up with a commit with the same description as unqytsx but an empty diff. ideally that change itself should then be dropped (#229), but that's separate from this issue.

Describe alternatives you've considered

Additional context
i am pretty sure this "just works" with git, although i admittedly haven't tested.

@martinvonz
Copy link
Member

I think you're right that git does this better. It calculates a "patch id" (basically a hash of a context-less diff output, IIRC) for the commits to rebase and for the upstream commits. Then it skips rebasing the commits that match a patch in upstream. I have been thinking of doing something similar in the planned jj git sync. Maybe we should add an option for that to jj rebase too.

@PhilipMetzger PhilipMetzger added the polish🪒🐃 Make existing features more convenient and more consistent label Feb 7, 2024
@jyn514
Copy link
Contributor Author

jyn514 commented Feb 9, 2024

sounds great :)

I have been thinking of doing something similar in the planned jj git sync. Maybe we should add an option for that to jj rebase too.

slightly offtopic, but i'm strongly in favor of not blocking this on a new jj sync command. i've seen a couple different comments along the lines of "this looks cool, we'll put it in jj sync" and it reminds me of other projects where work has been delayed for months or years because it's part of a larger feature that never gets implemented.

that's not to say i don't think jj sync will get implemented, but i think it would benefit greatly from being scoped down, at least at first.

@ilyagr
Copy link
Contributor

ilyagr commented Feb 10, 2024

My ideal solution would be to create a conflict theory that can keep track of this, so that all conflicts are simplified better. This may be easier said than done, so jyn's remark about scoping down applies to this idea as well.

@colemickens
Copy link
Contributor

I think you're right that git does this better. It calculates a "patch id" (basically a hash of a context-less diff output, IIRC) for the commits to rebase and for the upstream commits. Then it skips rebasing the commits that match a patch in upstream. I have been thinking of doing something similar in the planned jj git sync. Maybe we should add an option for that to jj rebase too.

This also sounds like this would address behavior I'm seeing:

  1. My nixpkgs custom branch has empty commits as markers to separate chunks of commits (some are already upstream, some are personal patches meant to stay that way, some are specifically cross-build fixes, etc)
  2. Today I rebased and some of the commits in my "upstream" section turned empty as they had landed in the branch that I rebase onto.
  3. So I re-ran the rebase with --skip-empty, but of course this removed my nice spacer commits as well.

Also, I didn't see it mentioned, but if this smarter "skip commit already in rebase target (?)" logic is implemented, it would be nice if it were the default behavior.

@scott2000
Copy link
Contributor

I've been thinking about this for a while since I've run into this a few times, and I wonder if jj can use a simpler approach than Git. The algorithm I'm thinking of is this:

If commits S are being rebased onto destination D:

  1. Find commits which are present in ::D but not in ::S. These are commits that might have duplicates in S.

  2. Build a HashMap<Signature, Vec<CommitId>> based on the author signature for each commit. Since rebased commits should have the same author and author timestamp as the original commit, we should only have to compare commits with the same author signature.

  3. Whenever a commit X from S is rewritten, check whether its author signature is in the HashMap. If it isn't, X should be kept.

  4. If the author signature was present in the map, then filter the Vec<CommitId> based on the description of the commit. If no commits have the same description as X, then X should be kept. (This step may not be necessary, but I think it would make it more efficient if there are several commits with the same author timestamp for some reason. We might need to be careful about trailers being added though, so I'm not sure if it's safe to filter by description.)

  5. For each remaining commit Y with the same author signature and description as X, compute the tree which would result from rebasing X onto Y-. If the resulting tree is the same as Y, then X can be discarded since it's a duplicate of Y.

This seems simpler to me since the patch IDs wouldn't need to be computed for every commit, which could be expensive, and we wouldn't need to cache the patch IDs anywhere. It does rely on the assumption that the author and author timestamp are the same after rebase, but that feels pretty safe to me.

@yuja
Copy link
Contributor

yuja commented Jul 29, 2024

It might be simpler to just rely on patch id (or change id if the backend or forge natively supports it.) Computing hash of diffs wouldn't be expensive compared to actual rebasing. We can of course add some heuristics based on signature, though.

@scott2000
Copy link
Contributor

The main reason I was thinking that actually checking the rebased trees match could be better is that it might be correct in more scenarios. Specifically if there's a situation where A - B + A is simplified to A in a hunk of one of the files, that would cause the patch ID to be different after rebasing (if I'm understanding it correctly), so the patch ID approach would think the commits are different when they actually are the same. However if Git doesn't handle this case either, then maybe it's not common enough to matter.

@jyn514
Copy link
Contributor Author

jyn514 commented Jul 29, 2024

please do not base this off author + commit message. i would like to be able to rebase other people's branches, and i would like to edit my commit messages, and breaking a core feature like this when that happens is very unexpected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
polish🪒🐃 Make existing features more convenient and more consistent
Projects
None yet
Development

No branches or pull requests

7 participants