Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New rustc and Cargo options to allow path sanitisation by default #3127

Merged
merged 32 commits into from
May 13, 2023

Conversation

cbeuw
Copy link
Contributor

@cbeuw cbeuw commented May 23, 2021

@ehuss ehuss added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label May 24, 2021
@joshtriplett
Copy link
Member

joshtriplett commented May 25, 2021

This seems reasonable to me, including the step of (eventually) making this default to enabled in release builds. That'll allow people to turn it on in debug if they need to ship debug builds, or conversely, turn it off in release mode if they have a specific need for absolute paths in release builds.

@Evrey
Copy link

Evrey commented May 28, 2021

Finally, soon i can remove my 2km long --remap-path-prefix entries from $RUSTFLAGS, neat!

Some interactions with compiler-intrinstic macros need to be considered, though these are entirely down to `rustc`'s implementation of
`--remap-path-prefix`:
1. Path (of the current file) introduced by [`file!()`](https://doc.rust-lang.org/std/macro.file.html) *will* be remapped. **Things may break** if
the code interacts with its own source file at runtime by using this macro.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want an example of things breaking to link to: rust-num/num-traits#139

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just not apply remapping when building build.rs, since it'll be up to the build script to maintain privacy and reproducibility anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If file!() is relative to CARGO_MANIFEST_DIR, then build.rs should be able to fix these paths, and it's not even difficult: manifest_dir.join(relative_or_absolute_file_path) just works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix in num was to just emit a relative path, so I assume Cargo already treats relative paths as relative to CARGO_MANIFEST_DIR (though I see that isn't documented on https://doc.rust-lang.org/nightly/cargo/reference/build-scripts.html#rerun-if-changed).

E.g. `/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs` ->
`/rustc/1.52.1/library/core/src/result.rs`
2. Path to the working directory will be replaced with `.`. E.g. `/home/username/crate/src/lib.rs` -> `./src/lib.rs`.
3. Path to packages outside of the working directory will be replaced with `[package name]-[version]`. E.g. `/home/username/deps/foo/src/lib.rs` -> `foo-0.1.0/src/lib.rs`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somewhat assumes uniqueness of a crate name + version, but it's possible to have two crates with the same name and version being included in the build from different registries, or git repos, or path sources (though cargo has issues with some overlaps).

foo v0.1.0 (/tmp/tmp.NPvWf4JOsU/foo)
├── futures-core v0.3.15 (/tmp/tmp.NPvWf4JOsU/bar)
└── futures-core v0.3.15

One option would be to use the "hashed index url" as a leading directory, so in this case assuming foo is from crates.io it would be github.com-1ecc6299db9ec823/foo-0.1.0/src/lib.rs. Git dependencies similarly have a "hashed git url" that could be used, e.g. futures-rs-b0bea7d4c3745ece for https://github.com/rust-lang/futures-rs.

Path dependencies I'm not sure about, it could be possible to generate a similar hash based on the actual path, but while that would alleviate privacy issues it would still have reproducibility problems. It's (currently) not possible to have two path dependencies with overlapping name+version, so maybe just a leading segment such as path/ to distinguish them from non-path dependencies would work?

error: package collision in the lockfile: packages bar v0.1.0 (/tmp/tmp.NPvWf4JOsU/bar) and bar v0.1.0 (/tmp/tmp.NPvWf4JOsU/bar2) are different, but only one can be written to lockfile unambiguously

Alternatively, this could be declared as a non-issue since it will probably never actually occur, but that should be documented so when it does happen to someone they have a better chance of figuring out what's happening.

Copy link
Contributor Author

@cbeuw cbeuw May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The referenced issue rust-lang/rust#40552 has a more detailed remapping scheme

  • For code from crates.io or mirrors, the root name is [crates.io]. Example:
    C:\Users\username.cargo\registry\src\github.com-1ecc6299db9ec823\winapi-0.2.8\ will be mangled to [crate.io]\winapi-0.2.8\
  • For code from remote git repository, the root name is [username@git server name]. Example: https://github.com/rust-lang/rust will be manged to [rust-lang@github.com]/rust
  • For code from local filesystem, the root name is [local]. Example: D:\workspaces\foobarng\ will be mangled to [local]\foobarng\
  • For code from the crate itself, the root name is crate. Example: C:\Users\username\Documents\foobar\ will be mangled to [crate]\foobar\

We could take a similar approach here.

There were some discussion about what happens when two paths are mapped to the same thing by --remap-path-prefix: rust-lang/rust#83813 (comment). For reproducibility reasons the stable hash (used to generate the stable crate hash) of a source path uses the remapped path if available, so there is a chance of collision. There were some discussion around if we should simply error out when this happens. We probably should have this in mind here.

@BurntSushi
Copy link
Member

BurntSushi commented May 28, 2021

With regard to one of the stated drawbacks:

With trim-path enabled, if the debug option is simultaneously not false (it is turned off by default under release profile), paths in debuginfo will also be remapped. Debuggers will no longer be able to automatically discover and load source files outside of the working directory. This can be remidated by debugger features remapping the path back to a filesystem path.

This seems like a fairly concerning drawback to me. I do a lot of profiling with binaries compiled in release mode, and find it extremely useful that the source code is displayed in the profiling tools. It sounds to me like what this is saying is that workflow will break, and I will have to know to disable trim-path to get it working again. The fix itself isn't too bad. It's mildly annoying, but as long as you know what the fix is, it's probably worth the extra annoyance to gain some added privacy by default. My thinking is that discovering that one needs to disable this option to get debuggers and profiling tools to show you source code might be quite difficult though. There is no obvious point in the process of debugging or profiling that will point you towards trim-path. And in particular, it is later said that

removing absolute paths for all build would be a hassle for debugging

But debugging a release binary is not that uncommon in my experience. So it sounds to me like we are making that experience worse by default. I'm not sure that's the right trade off to make, although it is somewhat murky.

@kornelski
Copy link
Contributor

kornelski commented May 28, 2021

Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files?

I distribute executables, but keep dSYM bundles private. This way I can still symbolicate and debug crashes, but I'm not shipping bloated executables or exposing symbol information in the executable itself.

Trimming of paths in the executables, like paths from panic!/file! would be great — it'd improve privacy and reduce bloat. And keeping full absolute paths in dSYM would keep maximal compatibility with debuggers, profilers, etc.

@kornelski
Copy link
Contributor

I know some people use --release flag during development just because the debug code is too slow to run, or it doesn't make sense to profile unoptimized code. But this could be fixed by either educating users about opt-level, and/or maybe adding a convenience --debug-opt flag.

@joshtriplett
Copy link
Member

@kornelski

Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files?

That seems like a really good plan. We could start by having trim-path available, and then enable trim-path by default on platforms where we have separated debug symbols by default.

People can also still enable it themselves, and we can consider the tradeoffs for enabling it by default in release mode if not generating debug symbols.


1. Path to the source files of the standard and core library will begin with `/rustc/[rustc version]`.
E.g. `/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs` ->
`/rustc/1.52.1/library/core/src/result.rs`
Copy link
Contributor

@teor2345 teor2345 May 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"x86_64-unknown-linux-gnu" seems like useful information.

It isn't sensitive info, and it probably won't break reproducibility, because changing the target will probably change the binary in other ways.

Can we add the target to the compiler version?
Would it also help to add the rustc commit hash to the compiler version?
(We don't want the build date, because that would break reproducibility.)

text/3127-trim-path.md Outdated Show resolved Hide resolved
text/3127-trim-path.md Outdated Show resolved Hide resolved
```

With `trim-path` option enabled, the compilation process will not introduce any absolute paths into the build output. Instead, paths containing
certain prefixes will be replaced with something stable by the following rules:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules don't seem to cover modules with the #[path] attribute which might point to a file outside the working directory.

https://doc.rust-lang.org/reference/items/modules.html#the-path-attribute

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the path is still inside the current project directory (where Cargo.toml lives), it will still be caught by the "Path to packages outside of the working directory" rule. If it points to some arbitrary location on the local file system then we can't reasonably expect Cargo to automatically sanitise it.

Co-authored-by: teor <teor@riseup.net>
@joshtriplett
Copy link
Member

Something that came up in today's @rust-lang/cargo meeting: we should set a variable CARGO_TRIM_PATH or similar for build.rs scripts.

text/3127-trim-path.md Outdated Show resolved Hide resolved
text/3127-trim-path.md Outdated Show resolved Hide resolved
text/3127-trim-path.md Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member

Reading over this one idea I had is that this could be a nice opportunity to leverage community norms and such to make "linkifying" backtraces more easy. For example given a path coming out of a Rust backtrace it'd be awesome to be able to generate an HTTP URL to the exact copy of the source code (with a line number) that the code references for some paths. An RFC would be, in theory, a great place to specify that two canonical sources of Rust code have well-defined paths in "remapped debuginfo situations":

  • All rustc source code comes from /rustc/$rustc_sha/.... This is somewhat grandfathered in but we could presumably change rustc at any time too if we want.
  • Crates.io-sourced code could be /cratesio/$crate/$version/... (or something like that).

I think it would be a lot more difficult to automatically translate paths to URLs for other sources of code (e.g. general git repositories, path dependencies, etc), but if we could specify what happens to rustc and crates.io-sourced code, that'd be nice! We could then leave a different default remapping source for all other crates that don't fall in the crates.io category, such as /src/$crate/$version/... or something like that.

Also, when reading this RFC, I think that we'll also want --remap-path-prefix for OUT_DIR-included source code. This means that for packages that have a build script, and we could set the path to something like /out-dir/$crate/$version by default (or something similar).

cbeuw and others added 2 commits June 1, 2021 21:06
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
@cbeuw
Copy link
Contributor Author

cbeuw commented Jun 1, 2021

@alexcrichton One thing I'm slightly annoyed about a path-like prefix (which was proposed in the pre-RFC) is the differences between path separators on Windows vs *nix. That's purely aesthetic and we could format it differently depending on the host, or we could simply not bother and say /cratesio/$crate/$version/src\foo.rs looks fine (or take the average and use |cratesio|$crate|$version but that's an abomination)

Regarding linking sysroot and crates.io sources to a URL, I assume that's something the backtrace library can do at runtime?

I'm not quite sure how OUT_DIR works. Is the compiled build script placed under it to be run?

@cbeuw cbeuw changed the title A Cargo profile option trim-path to sanitise absolute paths A Cargo profile option trim-paths to sanitise absolute paths Jun 1, 2021
@alexcrichton
Copy link
Member

My general idea is that any project which wants to linkify backtraces in Rust code can automatically linkify Rust standard library source code and dependencies coming from crates.io since those are hosted at canonical locations. Cargo is uniquely positioned with a flag like this to set precedent for how everyone should do this, so everyone canonicalizes around the same way that filenames look.

For issues like path separators, we'd canonicalize on one style. For windows/unix, we'd still just canonicalize on one style (the files won't exist locally anyway). Some of this may require rustc support rather than "purely just a flag from Cargo", but my point is more general in that this is an opportunity ripe for the picking in an RFC like this, I think.

Note that I'm not expecting libraries like backtrace to like download sources at crash time, but rather that offline systems which symbolize crashes with Rust stack frames will better be able to link to code which is located in crates.io or Rust libstd dependencies.

For OUT_DIR, it's an environment variable set during crate compilation per-crate to a directory buried within target/.... This is only set when the crate has a build script, and the assumption is that a build script can place rust source code in OUT_DIR (set during the build script's execution) which can then be include!'d as a form of code generation of Rust code.

@cbeuw
Copy link
Contributor Author

cbeuw commented Jun 27, 2021

Coming back to this after a month of busy uni stuff -

GCC and Clang both have three separate path remapping flags:

  • -fdebug-prefix-map affects only debuginfo,
  • -fmacro-prefix-map affects only __FILE__-like macro expansions,
  • -ffile-prefix-map is the combination of both of above. This is what rustc's --remap-path-prefix does today.

Seeing the feedbacks to this RFC, I think people want granular treatments to the two different types of paths (debuginfo vs macro expansion) similar to what GCC and Clang has been doing for a while.

If rustc were to have both --remap-debuginfo-prefix and --remap-macro-prefix, trim-paths would be able to provide three-levels of behaviour

  • off: No remapping
  • macro expansions only: Remap only file!() macro
  • full: Remap file!() macro and debuginfo paths. Do what --remap-path-prefix currently does.

And the defaults under release profile would depend on the setting of split-debuginfo (whether split-debuginfo is explicitly set or implicitly set by platform-dependent defaults):

  • off: Full trim-paths because debuginfo would be embedded in the executable
  • packed ("all the debug information is packed into a separate file from the main executable"): Macro expansions-only trim-paths
  • unpacked: Full trim-paths on macOS but macro expansions-only on other Unix platforms (this option is not supported on Windows)

This way only the paths that will be contained in the binary will be affected, leaving debuginfo potentially untouched unless they are also embedded in the binary.

I think this is nice and neat and should be the end product we are looking for. But --remap-debuginfo-prefix and --remap-macro-prefix don't exist yet. So the remaining question I have is whether there is a way to implement an MVP for trim-paths without making major changes to rustc flags that will be compatible with the shiny future. Or maybe this is not worth it and we should just introduce the two flags first.

@ghost
Copy link

ghost commented Jul 9, 2021

cant tell from thread.

will be explicit opt out (always trim in --release) or opt in (always need flag)? seems most add trim-path in go anyway so maybe make sense to be default in --release and not need extra flag always. except if paths are want for some reason

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 16, 2023
Heuristically undo path prefix mappings.

Because the compiler produces better diagnostics if it can find the source of (potentially remapped) dependencies.

The new test fails without the other changes in this PR. Let me know if you have better suggestions for the test directory. I moved the existing remapping test to be in the same location as the new one.

Some more context: I'm exploring running UI tests with remapped paths by default in rust-lang#105924 and this was one of the issues discovered.

This may also be useful in the context of rust-lang/rfcs#3127 ("New rustc and Cargo options to allow path sanitisation by default").
bors added a commit to rust-lang/miri that referenced this pull request Jan 23, 2023
Heuristically undo path prefix mappings.

Because the compiler produces better diagnostics if it can find the source of (potentially remapped) dependencies.

The new test fails without the other changes in this PR. Let me know if you have better suggestions for the test directory. I moved the existing remapping test to be in the same location as the new one.

Some more context: I'm exploring running UI tests with remapped paths by default in rust-lang/rust#105924 and this was one of the issues discovered.

This may also be useful in the context of rust-lang/rfcs#3127 ("New rustc and Cargo options to allow path sanitisation by default").
@cbeuw
Copy link
Contributor Author

cbeuw commented Feb 8, 2023

@ehuss Apologies for the silence. I've been pretty busy over the past 6 months but now I'm freed up a lot more, so I'll pick this up again and respond more promptly so people don't have to build up contexts repeatedly.

Regarding the rationale on the available options, I've simply bucketed all sources of absolute paths into several composable groups. They are very granular for the purpose of use-case discovery. I cannot provide use-cases on all of the individual options at this stage because I don't know them yet. I did make it clear in the RFC that not all options are aiming to be stabilised. In fact, I fully anticipate most of the *-debuginfo options to be removed and merged into other options when we stabilise this. I could simply remove them from the RFC for the sake of making the RFC process easier, but then I'll have to go through change processes again if some users do want more granular control. The point here is that simple things are simple and complex things are possible, so the docs inevitably have to document the complex things, but the most users should not have to deal with anything other than none, object and all.

I'll continue the discussion regarding CARGO_TRIM_PATHS in the thread above.

@ehuss
Copy link
Contributor

ehuss commented Feb 8, 2023

I cannot provide use-cases on all of the individual options at this stage because I don't know them yet.

I'm not going to block on this point, but I think it is strange to propose changes that we expect to discard. For example, if there are no use cases for diagnostics, not even any we can envision, why go through the effort of implementing that, adding complexity to the compiler, just to be something that is not used?

I would ask that when possible, add a sentence as to why the option exists. Perhaps in the form "This option would be used for situations when …". It looks like macro already has an obvious explanation, but the others do not. From what's written, there is not enough clarity to me to understand why one option would be used and not another. If there is no reason, then perhaps state that ("diagnostics does not currently have a use case, and is only being included for completeness").

@cbeuw
Copy link
Contributor Author

cbeuw commented Feb 20, 2023

Nudge @Eh2406 @michaelwoerister @nikomatsakis @pnkfelix @wesleywiser for concerns or approval. The voting comment already got collapsed #3127 (comment) 😅


- `macro` - apply remappings to the expansion of `std::file!()` macro. This is where paths in embedded panic messages come from
- `diagnostics` - apply remappings to printed compiler diagnostics
- `unsplit-debuginfo` - apply remappings to debug information only when they are written to compiled executables or libraries, but not when they are in split debuginfo files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidtwco, do you know if we can actually implement this (when using the LLVM backend)? I.e. can we control what paths show up in what context? As far as I know, we produce a single LLVM metadata description and then LLVM takes care of splitting things apart, right?

@michaelwoerister
Copy link
Member

Checked my box. Thanks for keeping at this so persistently, @cbeuw!

@apiraino
Copy link

apiraino commented May 3, 2023

Nominating this RFC for T-compiler, let's reload the context here and see if it can be approved (voting comment)

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels May 3, 2023
@rfcbot
Copy link
Collaborator

rfcbot commented May 3, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels May 13, 2023
@rfcbot
Copy link
Collaborator

rfcbot commented May 13, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@ehuss ehuss merged commit cd01428 into rust-lang:master May 13, 2023
@ehuss
Copy link
Contributor

ehuss commented May 13, 2023

Huzzah! The @rust-lang/compiler and @rust-lang/cargo teams have decided to accept this RFC.

To track further discussion, subscribe to the tracking issue here:
rust-lang/rust#111540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-cargo Relevant to the Cargo team, which will review and decide on the RFC. T-compiler Relevant to the compiler team, which will review and decide on the RFC. to-announce
Projects
No open projects
Status: Implemented
Development

Successfully merging this pull request may close these issues.