Expand manifest validation for all settings #4377
[sources/path] Support leading slash in glob patterns

Background: rust-lang#4268

This diff takes us to **Stage 1.1** of the migration plan by allowing glob patterns to include a leading slash, so that glob patterns can be updated, if needed, to start with a slash, closer to the future behavior with gitignore-like matching.

Why is this stage needed? It's common to have `package.include` set like this:

```toml
include = ["src/**"]
```

In the old interpretation, this would only include files under the `src` directory at the package root. With the new interpretation, this would match any path containing a directory called `src`, even if it's not directly under the package root. After this patch, package owners can start marking glob patterns with a leading slash to fix the warning thrown, if any.

One thing to note is that this introduces no extra matches, except that if a manifest already has a pattern with a leading slash, that pattern will silently start matching paths. I believe this is fine, since under the old behavior such a pattern would not have matched anything, so the entry was useless.

See also <rust-lang#4377> for the suggestion to warn on useless/invalid patterns in these fields.
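To make the two interpretations concrete, here is a minimal Rust sketch. It is not Cargo's actual matcher: it supports only `*` (within one path segment) and `**` (any number of whole segments), and the function names `matches_root_anchored` and `matches_unanchored` are hypothetical. `matches_unanchored` models the "new interpretation" exactly as described above, not full gitignore semantics. In both functions a leading `/` anchors the pattern at the package root.

```rust
/// Match one path segment against a segment pattern. In this sketch `*`
/// matches any run of characters and every other character matches itself.
fn segment_matches(pat: &[char], seg: &[char]) -> bool {
    match pat.first() {
        None => seg.is_empty(),
        Some(&'*') => {
            segment_matches(&pat[1..], seg)
                || (!seg.is_empty() && segment_matches(pat, &seg[1..]))
        }
        Some(&c) => !seg.is_empty() && seg[0] == c && segment_matches(&pat[1..], &seg[1..]),
    }
}

/// Match pattern segments against path segments; `**` matches zero or more segments.
fn segments_match(pat: &[&str], path: &[&str]) -> bool {
    match (pat.first(), path.first()) {
        (None, _) => path.is_empty(),
        (Some(&"**"), _) => {
            segments_match(&pat[1..], path)
                || (!path.is_empty() && segments_match(pat, &path[1..]))
        }
        (Some(_), None) => false,
        (Some(&p), Some(&s)) => {
            let (p, s): (Vec<char>, Vec<char>) = (p.chars().collect(), s.chars().collect());
            segment_matches(&p, &s) && segments_match(&pat[1..], &path[1..])
        }
    }
}

/// Split a `/`-separated path or pattern into non-empty segments.
fn split(path: &str) -> Vec<&str> {
    path.split('/').filter(|s| !s.is_empty()).collect()
}

/// Old interpretation (hypothetical name): the pattern is always matched from
/// the package root, so a leading `/` is simply stripped (Stage 1.1).
fn matches_root_anchored(pattern: &str, path: &str) -> bool {
    let pattern = pattern.strip_prefix('/').unwrap_or(pattern);
    segments_match(&split(pattern), &split(path))
}

/// New interpretation as described in the text (hypothetical name): without a
/// leading `/`, the pattern may start matching at any directory level.
fn matches_unanchored(pattern: &str, path: &str) -> bool {
    if let Some(rooted) = pattern.strip_prefix('/') {
        return segments_match(&split(rooted), &split(path));
    }
    let pat = split(pattern);
    let segs = split(path);
    (0..=segs.len()).any(|start| segments_match(&pat, &segs[start..]))
}

fn main() {
    let cases = [
        ("src/**", "src/lib.rs"),
        ("src/**", "vendor/src/main.rs"),
        ("/src/**", "vendor/src/main.rs"),
    ];
    for (pattern, path) in cases {
        println!(
            "{pattern:>8} vs {path:<20} old: {:<5} new: {}",
            matches_root_anchored(pattern, path),
            matches_unanchored(pattern, path)
        );
    }
}
```

With `src/**`, the path `vendor/src/main.rs` is rejected under the root-anchored reading but accepted under the unanchored one; writing `/src/**` gives the same answer under both, which is what Stage 1.1 lets package owners do ahead of the switch.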
Seems plausible to me! All this validation happens on crates.io, though, which I'd prefer not to duplicate, although we can perhaps do simple things like validate …
Some more background and my rationale for this proposal: the other day, for a new release of UNIC, I was updating a dozen manifests, mostly their metadata. Since it's not wise to package/publish with a dirty repository, I had to commit after each publish error and try publishing again. This resulted in a handful of commits that just fixed small nits reported by the publish command, like "don't use spaces in the keywords list".

Agreed that we'd better have one source of truth. However, I also believe we can benefit a lot by moving this single source of truth into a library that can be used on the client side (cargo CLI, clippy, etc.) as well as on the server side (crates.io, etc.). I call it the … What do you think?
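A minimal sketch of what one rule in such a shared library might look like, assuming a hypothetical `validate_keywords` function and `Diagnostic` type. The point is that the rule is a pure function over manifest data with no I/O, so the same code could run in the cargo CLI before packaging and on crates.io at publish time. The specific limits (at most 5 keywords, at most 20 characters, ASCII alphanumerics plus `-`, `_`, `+`, starting with an alphanumeric) are assumptions for illustration, not a copy of crates.io's rules.

```rust
/// One finding produced by a shared validation rule (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub field: &'static str,
    pub message: String,
}

// Assumed limits, for illustration only.
const MAX_KEYWORDS: usize = 5;
const MAX_KEYWORD_LEN: usize = 20;

/// Hypothetical shared rule: pure and side-effect free, so it can be called
/// from a client (before packaging) and a server (at publish time) alike.
pub fn validate_keywords(keywords: &[&str]) -> Vec<Diagnostic> {
    let mut out = Vec::new();
    if keywords.len() > MAX_KEYWORDS {
        out.push(Diagnostic {
            field: "package.keywords",
            message: format!("expected at most {MAX_KEYWORDS} keywords, found {}", keywords.len()),
        });
    }
    for kw in keywords {
        // First character must be alphanumeric; the rest may also be `-`, `_`, `+`.
        let starts_ok = kw.chars().next().map_or(false, |c| c.is_ascii_alphanumeric());
        let chars_ok = kw
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '+'));
        if !starts_ok || !chars_ok || kw.len() > MAX_KEYWORD_LEN {
            out.push(Diagnostic {
                field: "package.keywords",
                message: format!("invalid keyword {kw:?}"),
            });
        }
    }
    out
}

fn main() {
    for d in validate_keywords(&["unicode", "text processing"]) {
        println!("{}: {}", d.field, d.message);
    }
}
```

A client could report each `Diagnostic` as a warning while the server treats it as a rejection, so both sides share the rule without having to share the policy.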
rust-lang/crates.io#7250 is also related to this (see point 2).
This is a pain point I run into often. Would it be possible to implement a preflight-like endpoint in crates.io that runs the validation before publishing? I would love to work on this.
The solution should come from a discussion with the crates.io team. I don't have the answer, but Zulip …
In our previous crates.io team meeting, we raised concerns about the potential performance issues that could arise from introducing this validation to the … Thus, we're inclined to first unify the validation rules of Cargo and crates.io. This would be a good initial step, and I am currently working on it.
Even if the rules are unified, a divergence between crates.io and cargo can occur if the cargo version is outdated. I fail to see how uploading a bit of metadata via a simple API request is a problem. Admittedly, I don't know what the API looks like at all.
In going through cargo's triage (or talking to people about it), I remember seeing somewhere that some differences were intentional. An alternative route, once we have #12235, is to put the crates-io-specific validation rules in a crate and have a cargo linter expose lints for packages, published to crates-io or to "all registries", that don't conform to those rules. During times when crates-io is out of sync with someone's MSRV, they can allow certain lints.
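A rough sketch of the lint-based route, with hypothetical names throughout: the `Lint` struct, the `Level` enum, `run_lints`, and lint names such as `crates_io::keyword_format` are not an existing Cargo API, and the keyword rules are illustrative assumptions. It shows only the mechanism: each lint has a default level, and a user whose toolchain is out of sync with crates.io can set an individual lint to `Allow`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Allow,
    Warn,
    Deny,
}

/// Minimal stand-in for the parsed manifest fields these lints look at.
struct Manifest {
    keywords: Vec<String>,
}

/// A registry-specific rule with a default level (hypothetical shape).
struct Lint {
    name: &'static str,
    default_level: Level,
    check: fn(&Manifest) -> Vec<String>,
}

fn keyword_format(m: &Manifest) -> Vec<String> {
    m.keywords
        .iter()
        .filter(|k| !k.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '+')))
        .map(|k| format!("keyword {k:?} has characters crates.io may reject"))
        .collect()
}

fn keyword_count(m: &Manifest) -> Vec<String> {
    if m.keywords.len() > 5 {
        vec![format!("{} keywords; crates.io may accept at most 5", m.keywords.len())]
    } else {
        vec![]
    }
}

/// Run every lint, applying per-lint level overrides chosen by the user.
fn run_lints(lints: &[Lint], m: &Manifest, overrides: &HashMap<&str, Level>) -> Vec<(Level, String)> {
    let mut findings = Vec::new();
    for lint in lints {
        let level = overrides.get(lint.name).copied().unwrap_or(lint.default_level);
        if level == Level::Allow {
            continue; // the user opted out of this rule
        }
        for msg in (lint.check)(m) {
            findings.push((level, format!("{}: {}", lint.name, msg)));
        }
    }
    findings
}

fn main() {
    let lints = [
        Lint { name: "crates_io::keyword_format", default_level: Level::Warn, check: keyword_format },
        Lint { name: "crates_io::keyword_count", default_level: Level::Deny, check: keyword_count },
    ];
    let manifest = Manifest { keywords: vec!["text processing".to_string()] };
    for (level, msg) in run_lints(&lints, &manifest, &HashMap::new()) {
        println!("{level:?}: {msg}");
    }
    // Someone whose MSRV lags behind crates.io's rules silences one lint.
    let overrides = HashMap::from([("crates_io::keyword_format", Level::Allow)]);
    println!("with overrides: {} findings", run_lints(&lints, &manifest, &overrides).len());
}
```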
For the publish workflow, crates.io has been shifting away from "trust the metadata" to "parse … For example, …
Yeah, I see that sending just an extract might be error-prone. Also, on further reflection, I think just syncing the two checkers, or at least having a crates.io-oriented validation crate, would be enough. It will still fail for people who don't keep up with cargo's release cadence, but for everyone else this is perfect :)
One thought on how to solve this …

One question is when these lints should fire. With … In the lint, we should …
There are some practical difficulties arising from the fact that manifest files are not always validated. I want to collect such cases so we can improve the user experience, especially for newcomers, by showing warnings about bad manifest entries as early as possible.
One important point here is that we almost never want to throw an error on invalid entries, because that would hurt forward compatibility: it would make many future changes backward-incompatible.
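A minimal sketch of the "warn, don't error" policy, similar in spirit to the `unused manifest key` warning Cargo already prints. The `[package]` table is modeled as a plain map of strings, and `KNOWN_PACKAGE_KEYS`, `Parsed`, and `parse_package` are hypothetical. Missing required fields remain hard errors, while keys this version does not recognize only produce warnings, so a manifest written for a newer tool still parses.

```rust
use std::collections::BTreeMap;

/// Keys this (hypothetical) version of the tool understands.
const KNOWN_PACKAGE_KEYS: &[&str] = &["name", "version", "edition", "keywords", "categories", "readme"];

#[derive(Debug)]
struct Parsed {
    name: String,
    version: String,
    warnings: Vec<String>,
}

fn parse_package(table: &BTreeMap<&str, &str>) -> Result<Parsed, String> {
    // Missing required data is still a hard error.
    let name = table.get("name").ok_or("missing `package.name`")?.to_string();
    let version = table.get("version").ok_or("missing `package.version`")?.to_string();
    // Unknown keys only warn: a newer manifest may use keys this version doesn't know yet.
    let warnings: Vec<String> = table
        .keys()
        .filter(|k| !KNOWN_PACKAGE_KEYS.contains(*k))
        .map(|k| format!("unused manifest key: package.{k}"))
        .collect();
    Ok(Parsed { name, version, warnings })
}

fn main() {
    let table = BTreeMap::from([("name", "demo"), ("version", "0.1.0"), ("some-future-key", "true")]);
    match parse_package(&table) {
        Ok(p) => {
            for w in &p.warnings {
                eprintln!("warning: {w}");
            }
            println!("parsed {} {}", p.name, p.version);
        }
        Err(e) => eprintln!("error: {e}"),
    }
}
```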
Some Examples
I think almost every config item can have some sort of validation. This list only presents possible validation methods for different types of values.
- `package.version`: Check that the version is not already published ("Fail fast when package version is already published" #3662).
- `package.keywords`: Invalid keyword strings only surface during packaging/publishing. See "`cargo publish` should warn on invalid categories/keywords" #4300 for details.
- `package.categories`: This is a bit harder, because the list is maintained in the `crates.io` repo. But, like any other index-related data, it can be fetched, cached, and checked against.
- `package.include`, `package.exclude`, `workspace.exclude`, `workspace.members`: These settings use pattern matching, and warning on invalid patterns can help users track down packaging issues faster and make changes in those areas easier (see "Change Cargo include/exclude rules to gitignore patterns" #4268).
- `package.readme` and other file paths: Check that the linked file exists (see "fix(toml): Convert warnings that `licence` and `readme` files do not exist into errors" #13921). In addition, we can warn when the setting is missing but a best-guess file is present (similar to "Misspelling of `build.rs` doesn't get reported" #13073, since we already infer the file if a "well known" name is used).
- `package.homepage` and other URLs: Perform URL validation and check URLs against the newly implemented blacklist. (The blacklist is maintained in `crates.io`, so this would be another index-dependent check.)
- `features.<name>`: see "Cargo feature name validation inconsistent with crates.io" #5554.
- Banning of wildcard dependencies: see "`--dry-run` does not validate dependencies" #5941.
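The checks in the list above can be sketched as one pass that only produces warnings. This is a hypothetical shape: `Package`, `IndexCache`, and `check` are illustrative names, the index data (published versions, known categories) is passed in as if already fetched and cached, the URL check is a crude scheme test rather than real URL validation, and the URL blacklist is not modeled.

```rust
use std::path::Path;

/// The subset of manifest fields the checks below look at (hypothetical shape).
struct Package<'a> {
    version: &'a str,
    categories: Vec<&'a str>,
    readme: Option<&'a str>,
    homepage: Option<&'a str>,
    dependencies: Vec<(&'a str, &'a str)>, // (name, version requirement)
}

/// Index-dependent data, assumed to be fetched from the registry and cached locally.
struct IndexCache<'a> {
    published_versions: Vec<&'a str>,
    known_categories: Vec<&'a str>,
}

/// Collect warnings (never errors) for the settings in the list above.
fn check(pkg: &Package<'_>, index: &IndexCache<'_>, root: &Path) -> Vec<String> {
    let mut w = Vec::new();
    if index.published_versions.contains(&pkg.version) {
        w.push(format!("package.version: {} is already published", pkg.version));
    }
    for c in &pkg.categories {
        if !index.known_categories.contains(c) {
            w.push(format!("package.categories: unknown category {c:?}"));
        }
    }
    if let Some(readme) = pkg.readme {
        if !root.join(readme).exists() {
            w.push(format!("package.readme: file {readme:?} not found"));
        }
    }
    if let Some(url) = pkg.homepage {
        // Crude scheme check only; a real implementation would use a URL parser.
        if !(url.starts_with("https://") || url.starts_with("http://")) {
            w.push(format!("package.homepage: {url:?} does not look like an http(s) URL"));
        }
    }
    for (name, req) in &pkg.dependencies {
        if req.trim() == "*" {
            w.push(format!("dependencies.{name}: wildcard version requirement"));
        }
    }
    w
}

fn main() {
    let pkg = Package {
        version: "0.2.0",
        categories: vec!["text-processing", "unicode-stuff"],
        readme: Some("README.md"),
        homepage: Some("example.com"),
        dependencies: vec![("serde", "1"), ("rand", "*")],
    };
    let index = IndexCache {
        published_versions: vec!["0.1.0", "0.2.0"],
        known_categories: vec!["text-processing"],
    };
    for warning in check(&pkg, &index, Path::new(".")) {
        println!("warning: {warning}");
    }
}
```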