Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beta: cargo metadata uses inconsistent encodings of ids with branch = master dependencies #8756

Closed
Nemo157 opened this issue Oct 5, 2020 · 6 comments · Fixed by #8824
Closed
Labels
C-bug Category: bug regression-from-stable-to-beta Regression in beta that previously worked in stable.

Comments

@Nemo157
Copy link
Member

Nemo157 commented Oct 5, 2020

Problem
When encoding ids of git dependencies using branch = "master" the encoding used in the packages list and the resolve list can be inconsistent on 1.47-beta. This is a regression in this output from 1.46 causing issues in cargo-* plugins that try to use the resolve graph from cargo metadata.

Steps

> git clone https://github.com/Monadic-Cat/mbot
Cloning into 'mbot'...
[...]
> cd mbot && git checkout 0921041f531eaee1a6151253a75cf2ad32626cc0
HEAD is now at 0921041 support embedding whole current dependency list in binaries

> cargo +stable metadata --format-version 1 | jq -r '(.packages[] | select(.name == "mice") | .id), (.resolve.nodes[].deps[] | select(.name == "mice") | .pkg)'
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)

> cargo +stable --version -v
cargo 1.46.0 (149022b1d 2020-07-17)
release: 1.46.0
commit-hash: 149022b1d8f382e69c1616f6a46b69ebf59e2dea
commit-date: 2020-07-17


> cargo +beta metadata --format-version 1 | jq -r '(.packages[] | select(.name == "mice") | .id), (.resolve.nodes[].deps[] | select(.name == "mice") | .pkg)'
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice?branch=master#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)

> cargo +beta --version -v
cargo 1.47.0-beta (f3c7e066a 2020-08-28)
release: 1.47.0
commit-hash: f3c7e066ad66e05439cf8eab165a2de580b41aaf
commit-date: 2020-08-28
@Nemo157 Nemo157 added the C-bug Category: bug label Oct 5, 2020
@Eh2406 Eh2406 added the regression-from-stable-to-beta Regression in beta that previously worked in stable. label Oct 5, 2020
@Nemo157
Copy link
Member Author

Nemo157 commented Oct 5, 2020

Running cargo +beta update -p mbot among other changes results in updating the specification in Cargo.lock (for some reason cargo +beta update -p mice doesn't):

@@ -1020,7 +1026,7 @@ checksum = "3728d817d99e5ac407411fa471ff9800a778d88a24685968b36824eaf4bee400"
 [[package]]
 name = "mice"
 version = "0.9.0"
-source = "git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c"
+source = "git+https://github.com/Monadic-Cat/mice?branch=master#264dd2d713e232709588bac3ad3ffef0aeb7eb2c"
 dependencies = [
 "nom",
 "rand 0.7.3",

which then results in consistent encoding again:

> cargo +beta metadata --format-version 1 | jq -r '(.packages[] | select(.name == "mice") | .id), (.resolve.nodes[].deps[] | select(.name == "mice") | .pkg)'
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice?branch=master#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice?branch=master#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)

but running cargo +stable metadata will undo this change, gcing back to inconsistent encoding:

> cargo +stable metadata --format-version 1 | jq -r '(.packages[] | select(.name == "mice") | .id), (.resolve.nodes[].deps[] | select(.name == "mice") | .pkg)'
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)

> cargo +beta metadata --format-version 1 | jq -r '(.packages[] | select(.name == "mice") | .id), (.resolve.nodes[].deps[] | select(.name == "mice") | .pkg)'
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice?branch=master#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)
mice 0.9.0 (git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c)
@@ -1026,7 +1026,7 @@ checksum = "3728d817d99e5ac407411fa471ff9800a778d88a24685968b36824eaf4bee400"
 [[package]]
 name = "mice"
 version = "0.9.0"
-source = "git+https://github.com/Monadic-Cat/mice?branch=master#264dd2d713e232709588bac3ad3ffef0aeb7eb2c"
+source = "git+https://github.com/Monadic-Cat/mice#264dd2d713e232709588bac3ad3ffef0aeb7eb2c"
 dependencies = [
  "nom",
  "rand 0.7.3",

@ehuss
Copy link
Contributor

ehuss commented Oct 21, 2020

@alexcrichton I think this is related to #7841 and #8101. There are some oddities with how SourceMap works. In this case, the SourceId in SourceMap is initialized from the Dependency which is SourceKind::Git(GitReference::DefaultBranch). This SourceId is the id that ends up in the PackageId for packages in the PackageSet which gets displayed in the "packages" array.

However, PackageIds in the resolver have a SourceId of SourceKind::Git(GitReference::Branch("master")) because it was deserialized with that specific branch name.

I'll try to investigate some more and think if all these issues can be untangled somehow. The fact that multiple SourceId's map to the same SourceId in SourceMap seems to be the root of all of these, though I'm uncertain on the path to fix them. If you have any ideas, let me know, otherwise I'll keep digging. Is there maybe some way the SourceId can be "upgraded" to the more specific variant inside the Package struct?

@ehuss
Copy link
Contributor

ehuss commented Oct 23, 2020

Just an update on my investigation:

New lock files include ?branch=master
Starting in 1.47, if you have an explicit branch = "master" in Cargo.toml, and you generate a new lock file, the lock file will end up with ?branch=master in the source field. I'm uncertain if this was intentional. There is code to explicitly avoid that, but that code is only used for dependencies, and not for the package.source field. I feel like that seems wrong, and that should also apply to all sources.

Mixed lock/toml settings cause inconsistencies
If you have a Cargo.toml with branch = "master", and an old lock file (pre 1.47), the lock file will not have ?branch=master. This inconsistency is exposed whenever SourceIds are displayed (such as in cargo metadata). In this case, the "resolve" field will not contain ?branch=master in the URLs, but the "packages" array will. This happens because the SourceId for Packages is generated from the dependency declaration (which has branch = "master"), but the Cargo.lock file doesn't have them.

Conversely, if you have a (new) Cargo.lock that contains ?branch=master, and you decide to remove branch = "master" from Cargo.toml, you end up in the opposite situation. The resolve entries will have ?branch=master, and the packages will not.

I've been trying to think of a way to handle this, but haven't gotten too far, and it is getting late. My best idea is that cargo metadata should normalize the "resolve" entries to match whatever is in "packages". That might solve the cargo metadata situation, but I'm still concerned that resolve PackageIds might get exposed in other situations. I think all other scenarios (like JSON messages) come from Package, but there is some risk that a resolve-based SourceId might be exposed somehow.

An alternative, similar to what Alex alluded to in #8101 (comment), is to somehow "sync" the SourceIds in the Resolve after it has been loaded to match what is in the Package objects. That sounds like it would be more reliable, but I'm uncertain how feasible it is.

@alexcrichton
Copy link
Member

Sorry I haven't been able to investigate this more. It was somewhat expected that branch=master and not would be somewhat of a "soup" internally in Cargo, but the hope was that it wouldn't actually cause user-facing problems (except where warnings are emitted). Is cargo metadata the only location for issues expected? If so it seems reasonable to apply a specific fix there to just always drop branch=master for now perhaps?

@ehuss
Copy link
Contributor

ehuss commented Oct 23, 2020

I think cargo metadata is the only place where this surfaces, but I am not certain. I think it only matters for JSON interfaces, and JSON artifact messages use the Package variant of PackageId. It only matters if you want to compare the IDs between the resolver and Packages, and I think that only happens in cargo metadata. Does that sound right?

@alexcrichton
Copy link
Member

That's my suspicion yeah. I'm hoping a somewhat surgical fix to only metadata/json output is necessary, and we don't have to worry too much about the "soup" of whether lockfiles have branch=master or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug regression-from-stable-to-beta Regression in beta that previously worked in stable.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants