From f3cb7985242e43d6fd650bbb063bb5142ba4583f Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 8 Sep 2019 11:27:07 +0200 Subject: [PATCH 01/22] Create 0000-cargo-embed-dependency-versions.md --- text/0000-cargo-embed-dependency-versions.md | 126 +++++++++++++++++++ 1 file changed, 126 insertions(+) create mode 100644 text/0000-cargo-embed-dependency-versions.md diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md new file mode 100644 index 00000000000..f547de15af4 --- /dev/null +++ b/text/0000-cargo-embed-dependency-versions.md @@ -0,0 +1,126 @@ +- Feature Name: `cargo_embed_dependency_versions` +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Embed information equivalent to the contents of Cargo.lock into compiled binaries so it could be programmatically recovered later. + +# Motivation +[motivation]: #motivation + +Rust is very promising for security-critical applications due to its safety guarantees, but there currently are gaps in the ecosystem that prevent it. One of them is the lack of any infrastructure for security updates. + +Linux distributions alert you if you're running a vulnerable software version and you can opt in to automatic security updates. Cargo not only has no automatic update infrastructure, it doesn't even know which libraries or library versions went into compiling a certain binary, so there's no way to check if your system is vulnerable or not. + + + +The primary motivation is cross-referencing versions of the dependencies against [RustSec advisory database](https://github.com/RustSec/advisory-db). 
This also enables use cases such as making a fix in a library crate and then ensuring it's been rolled out to your entire fleet, or preventing binaries with unvetted dependencies from reaching production. + +Why are we doing this? What use cases does it support? What is the expected outcome? + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Explain the proposal as if it was already included in the language and you were teaching it to another Rust programmer. That generally means: + +- Introducing new named concepts. +- Explaining the feature largely in terms of examples. +- Explaining how Rust programmers should *think* about the feature, and how it should impact the way they use Rust. It should explain the impact as concretely as possible. +- If applicable, provide sample error messages, deprecation warnings, or migration guidance. +- If applicable, describe the differences between teaching this to existing Rust programmers and new Rust programmers. + +For implementation-oriented RFCs (e.g. for compiler internals), this section should focus on how compiler contributors should think about the change, and give examples of its concrete impact. For policy RFCs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +This is the technical portion of the RFC. Explain the design in sufficient detail that: + +- Its interaction with other features is clear. +- It is reasonably clear how the feature would be implemented. +- Corner cases are dissected by example. + +The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. + +# Drawbacks +[drawbacks]: #drawbacks + +- Adds more platform-specific code to the build process which needs to be maintained. 
+- Slightly increases the size of the generated binaries. However, the increase is below 1%. A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. + +Why should we *not* do this? + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +Rationale: + +- Version information is impossible to misplace. As long as you have the binary, you can recover the info about dependency versions. The importance of this cannot be overstated. This allows auditing e.g. a Docker container that you did not build yourself, or a server that somebody's built a year ago and left no audit trail. +- A malicious actor could lie about the version information. However, doing so requires modifying the binary - and if a malicious actor can do _that,_ you are already pwned. So this does not create any additional attack vectors - other than exploiting the tool that's recovering the version information, which can be easily sandboxed. +- Any software supply chain verification that might be deployed automatically applies to the version information. There is no need to separately authenticate it. +- This enables third parties such as cloud providers to scan your binaries for you. Google Cloud [already provides such a service](https://cloud.google.com/container-registry/docs/get-image-vulnerabilities), Amazon has [an open-source project you can deploy](https://aws.amazon.com/blogs/publicsector/detect-vulnerabilities-in-the-docker-images-in-your-applications/) while Azure [integrates several partner solutions](https://docs.microsoft.com/en-us/azure/security-center/security-center-vulnerability-assessment-recommendations). + +Alternatives: + +- Do nothing. 
Identifying vulnerable binaries will remain impossible. +- Track version information separately from the binaries, recording it when running `cargo install` and surfacing it through some other Cargo subcommand. When installing not though `cargo install`, rely on Linux package managers to track version information. Identifying vulnerable binaries will remain impossible on all other platforms, as well as on Linux for code compiled with `cargo build`. Verification by third parties remains impossible. + +- Why is this design the best in the space of possible designs? +- What other designs have been considered and what is the rationale for not choosing them? +- What is the impact of not doing this? + +# Prior art +[prior-art]: #prior-art + +Discuss prior art, both the good and the bad, in relation to this proposal. +A few examples of what this can include are: + +- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? +- For community proposals: Is this done by some other community and what were their experiences with it? +- For other teams: What lessons can we learn from what other communities have done here? +- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. + +This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. +If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. + +Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. +Please also take into consideration that rust sometimes intentionally diverges from common language features. 
+ +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +- The format of Cargo.lock is not stabilized and is evolving. Should we encode Cargo.lock as-is and require tooling to track the updates, or commit to a stable subset of Cargo.lock? +- Should this also apply to shared libraries? +- Should this information be removed when stripping the binary of debug symbols? +- Are there any cases where you would _not_ want to allow whoever is running the binary to check it for vulnerabilities? + +Out of scope for now: + +- how to track and communicate versions of statically linked C libraries, such as OpenSSL? + +# Future possibilities +[future-possibilities]: #future-possibilities + +- Surface dependency information through an HTTP endpoint in a microservice environment. The [proof-of-concept](https://github.com/Shnatsel/rust-audit/issues/2) has a feature request for it. However, this does not require support from Cargo and can be implemented as a crate. +- Record and surface versions of C libraries statically linked into the Rust executable, e.g. OpenSSL. + +Think about what the natural extension and evolution of your proposal would +be and how it would affect the language and project as a whole in a holistic +way. Try to use this section as a tool to more fully consider all possible +interactions with the project and language in your proposal. +Also consider how the this all fits into the roadmap for the project +and of the relevant sub-team. + +This is also a good place to "dump ideas", if they are out of scope for the +RFC you are writing but otherwise related. + +If you have tried and cannot think of any future possibilities, +you may simply state that you cannot think of anything. + +Note that having something written down in the future-possibilities section +is not a reason to accept the current or a future RFC; such notes should be +in the section on motivation or rationale in this or subsequent RFCs. 
+The section merely provides additional information. From 6c32f8b71a3ef4248a5905b73f584de648eadef8 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 3 Nov 2019 18:10:22 +0100 Subject: [PATCH 02/22] Write more of the RFC --- text/0000-cargo-embed-dependency-versions.md | 89 +++++++++----------- 1 file changed, 38 insertions(+), 51 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index f547de15af4..e115acc56fb 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -1,7 +1,7 @@ - Feature Name: `cargo_embed_dependency_versions` - Start Date: (fill me in with today's date, YYYY-MM-DD) - RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) +- Rust Issue: None # Summary [summary]: #summary @@ -15,97 +15,84 @@ Rust is very promising for security-critical applications due to its safety guar Linux distributions alert you if you're running a vulnerable software version and you can opt in to automatic security updates. Cargo not only has no automatic update infrastructure, it doesn't even know which libraries or library versions went into compiling a certain binary, so there's no way to check if your system is vulnerable or not. - - -The primary motivation is cross-referencing versions of the dependencies against [RustSec advisory database](https://github.com/RustSec/advisory-db). This also enables use cases such as making a fix in a library crate and then ensuring it's been rolled out to your entire fleet, or preventing binaries with unvetted dependencies from reaching production. - -Why are we doing this? What use cases does it support? What is the expected outcome? 
+The primary use case for this information is cross-referencing versions of the dependencies against [RustSec advisory database](https://github.com/RustSec/advisory-db) and/or [Common Vulnerabilities and Exposures](https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures). This also enables use cases such as ensuring a fix in a dependency has been propagated across the entirety of your fleet or preventing binaries with unvetted dependencies from accidentally reaching a production environment - all with zero bookkeeping. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Explain the proposal as if it was already included in the language and you were teaching it to another Rust programmer. That generally means: +Every time an executable is compiled with Cargo, the contents of Cargo.lock are embedded in the generated binary. It can be recovered using existing tools like `readelf` or Rust-specific tooling, and then inspected manually or processed in an automated way just like the regular `Cargo.lock` file. -- Introducing new named concepts. -- Explaining the feature largely in terms of examples. -- Explaining how Rust programmers should *think* about the feature, and how it should impact the way they use Rust. It should explain the impact as concretely as possible. -- If applicable, provide sample error messages, deprecation warnings, or migration guidance. -- If applicable, describe the differences between teaching this to existing Rust programmers and new Rust programmers. - -For implementation-oriented RFCs (e.g. for compiler internals), this section should focus on how compiler contributors should think about the change, and give examples of its concrete impact. For policy RFCs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms. +WASM, asm.js and embedded platforms excempt from this mechanism since they have very strict code size requirements. 
For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -This is the technical portion of the RFC. Explain the design in sufficient detail that: +The version information is encoded in an additional arbitrary section of the executable (PE, ELF and Mach-O all allow arbitrary sections) by Cargo. Section name is subject to bikeshedding. + +For each crate in the dependency tree, including the root crate, the recorded version information contains the name, version, origin URL and checksum (equivalent to the current contents of `Cargo.lock` file). The exact format is TBD - see [unresolved questions](#unresolved-questions). + +A prototype implementation for Linux in `bash` looks like this: -- Its interaction with other features is clear. -- It is reasonably clear how the feature would be implemented. -- Corner cases are dissected by example. +```shell +# Insert Cargo.lock into a new '.dep-list' section +objcopy --add-section .dep-list=Cargo.lock --set-section-flags .dep-list=noload,readonly mybinary mybinary.withdeps -The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. +# Extract Cargo.lock +objcopy -O binary --set-section-flags .dep-list=alloc --only-section=.dep-list mybinary.withdeps Cargo.lock.extracted +``` # Drawbacks [drawbacks]: #drawbacks -- Adds more platform-specific code to the build process which needs to be maintained. - Slightly increases the size of the generated binaries. However, the increase is below 1%. A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. 
Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. - -Why should we *not* do this? +- Adds more platform-specific code to the build process, which needs to be maintained. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives Rationale: -- Version information is impossible to misplace. As long as you have the binary, you can recover the info about dependency versions. The importance of this cannot be overstated. This allows auditing e.g. a Docker container that you did not build yourself, or a server that somebody's built a year ago and left no audit trail. -- A malicious actor could lie about the version information. However, doing so requires modifying the binary - and if a malicious actor can do _that,_ you are already pwned. So this does not create any additional attack vectors - other than exploiting the tool that's recovering the version information, which can be easily sandboxed. -- Any software supply chain verification that might be deployed automatically applies to the version information. There is no need to separately authenticate it. +- This way version information is *impossible* to misplace. As long as you have the binary, you can recover the info about dependency versions. The importance of this cannot be overstated. This allows auditing e.g. a Docker container that you did not build yourself, or a server that somebody's set up a year ago and left no audit trail. +- A malicious actor could lie about the version information. However, doing so requires modifying the binary - and if a malicious actor can do _that,_ you are pwned anyway. So this does not create any additional attack vectors other than exploiting the tool that's recovering the version information, which can be easily sandboxed. +- Any binary authentication that might be deployed automatically applies to the version information. There is no need to separately authenticate it. 
+- Tooling for extracting information from binaries (such as ELF sections) is already readily available. Tooling for parsing `Cargo.lock` also exists. - This enables third parties such as cloud providers to scan your binaries for you. Google Cloud [already provides such a service](https://cloud.google.com/container-registry/docs/get-image-vulnerabilities), Amazon has [an open-source project you can deploy](https://aws.amazon.com/blogs/publicsector/detect-vulnerabilities-in-the-docker-images-in-your-applications/) while Azure [integrates several partner solutions](https://docs.microsoft.com/en-us/azure/security-center/security-center-vulnerability-assessment-recommendations). Alternatives: -- Do nothing. Identifying vulnerable binaries will remain impossible. -- Track version information separately from the binaries, recording it when running `cargo install` and surfacing it through some other Cargo subcommand. When installing not though `cargo install`, rely on Linux package managers to track version information. Identifying vulnerable binaries will remain impossible on all other platforms, as well as on Linux for code compiled with `cargo build`. Verification by third parties remains impossible. - -- Why is this design the best in the space of possible designs? -- What other designs have been considered and what is the rationale for not choosing them? -- What is the impact of not doing this? +- Do nothing. + - Identifying vulnerable binaries will remain impossible. We will see an increasing number of known vulnerabilities left unpatched in production. +- Track version information separately from the binaries, recording it when running `cargo install` and surfacing it through some other Cargo subcommand. When not installing through `cargo install`, rely on Linux package managers to track version information. + - Identifying vulnerable binaries will remain impossible on all other platforms, as well as on Linux for code compiled with `cargo build`. 
+ - Verification by third parties will remain impossible. +- Record version information in a `&'static str` in the binary instead of ELF sections, with start/stop markers to allow black-box extraction from the outside. + - This has been [prototyped](https://github.com/Shnatsel/rust-audit). It has the upside of allowing the binary itself to introspect its version info, but appears to be harder to implement and maintain. +- Provide a Cargo wrapper or plugin to implement this, but do not put it in Cargo itself. + - When people actually need this information (e.g. to check if they're impacted by a vulnerability) it is too late to reach for third-party tooling - the executables have already been built and deployed, and the information is already lost. As such, this mechanism is completely ineffective if it's not enabled by default. # Prior art [prior-art]: #prior-art -Discuss prior art, both the good and the bad, in relation to this proposal. -A few examples of what this can include are: - -- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? -- For community proposals: Is this done by some other community and what were their experiences with it? -- For other teams: What lessons can we learn from what other communities have done here? -- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. +`rustc` already embeds compiler and LLVM version in the executables built with it. You can see it by running `strings your_executable | grep 'rustc version'`. -This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. 
-If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. +The author is not aware of direct prior art in other languages. Since build system and package management system are usually decoupled, most languages did not have the opportunity to implement anything like this. -Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. -Please also take into consideration that rust sometimes intentionally diverges from common language features. +In microservice environments it is fairly typical to expose an HTTP endpoint returning the application version, see e.g. [example from Go cookbook](https://blog.kowalczyk.info/article/vEja/embedding-build-number-in-go-executable.html). However, this typically does not include versions of the dependencies. # Unresolved questions [unresolved-questions]: #unresolved-questions -- The format of Cargo.lock is not stabilized and is evolving. Should we encode Cargo.lock as-is and require tooling to track the updates, or commit to a stable subset of Cargo.lock? -- Should this also apply to shared libraries? -- Should this information be removed when stripping the binary of debug symbols? -- Are there any cases where you would _not_ want to allow whoever is running the binary to check it for vulnerabilities? - -Out of scope for now: - -- how to track and communicate versions of statically linked C libraries, such as OpenSSL? +1. The format of Cargo.lock is not stabilized and is evolving. Should we encode Cargo.lock as-is and require tooling to track the updates, commit to a stable subset of Cargo.lock or use something else altogether? +1. Should this also apply to shared libraries? +1. Should this information be removed when stripping the binary of debug symbols? +1. Are there any cases where you would _not_ want to allow whoever is running the binary to check it for known vulnerabilities? 
# Future possibilities [future-possibilities]: #future-possibilities - Surface dependency information through an HTTP endpoint in a microservice environment. The [proof-of-concept](https://github.com/Shnatsel/rust-audit/issues/2) has a feature request for it. However, this does not require support from Cargo and can be implemented as a crate. -- Record and surface versions of C libraries statically linked into the Rust executable, e.g. OpenSSL. + - Is data embedded in an ELF section accessible to the application itself at runtime? +- Record and surface versions of C libraries statically linked into the Rust executable, e.g. OpenSSL. Think about what the natural extension and evolution of your proposal would be and how it would affect the language and project as a whole in a holistic From dcd7cd71fd529b4b20f6042b25425311cf550935 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 3 Nov 2019 18:16:37 +0100 Subject: [PATCH 03/22] Finishing touches on the RFC before opening a PR --- text/0000-cargo-embed-dependency-versions.md | 30 ++++---------------- 1 file changed, 6 insertions(+), 24 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index e115acc56fb..dfdaaf6b397 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -20,16 +20,14 @@ The primary use case for this information is cross-referencing versions of the d # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Every time an executable is compiled with Cargo, the contents of Cargo.lock are embedded in the generated binary. It can be recovered using existing tools like `readelf` or Rust-specific tooling, and then inspected manually or processed in an automated way just like the regular `Cargo.lock` file. +Every time an executable is compiled with Cargo, the contents of `Cargo.lock` are embedded in the generated binary. 
It can be recovered using existing tools like `readelf` or Rust-specific tooling, and then inspected manually or processed in an automated way just like the regular `Cargo.lock` file. WASM, asm.js and embedded platforms excempt from this mechanism since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -The version information is encoded in an additional arbitrary section of the executable (PE, ELF and Mach-O all allow arbitrary sections) by Cargo. Section name is subject to bikeshedding. - -For each crate in the dependency tree, including the root crate, the recorded version information contains the name, version, origin URL and checksum (equivalent to the current contents of `Cargo.lock` file). The exact format is TBD - see [unresolved questions](#unresolved-questions). +The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). Section name is subject to bikeshedding. A prototype implementation for Linux in `bash` looks like this: @@ -41,10 +39,12 @@ objcopy --add-section .dep-list=Cargo.lock --set-section-flags .dep-list=noload, objcopy -O binary --set-section-flags .dep-list=alloc --only-section=.dep-list mybinary.withdeps Cargo.lock.extracted ``` +For each crate in the dependency tree, including the root crate, the recorded version information contains the name, version, origin URL and checksum (equivalent to the current contents of `Cargo.lock` file). The exact format is TBD - see [unresolved questions](#unresolved-questions). + # Drawbacks [drawbacks]: #drawbacks -- Slightly increases the size of the generated binaries. However, the increase is below 1%. 
A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. +- Slightly increases the size of the generated binaries. However, the increase is typically below 1%. A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. - Adds more platform-specific code to the build process, which needs to be maintained. # Rationale and alternatives @@ -73,7 +73,7 @@ Alternatives: # Prior art [prior-art]: #prior-art -`rustc` already embeds compiler and LLVM version in the executables built with it. You can see it by running `strings your_executable | grep 'rustc version'`. +The Rust compiler already embeds compiler and LLVM version in the executables built with it. You can see it by running `strings your_executable | grep 'rustc version'`. The author is not aware of direct prior art in other languages. Since build system and package management system are usually decoupled, most languages did not have the opportunity to implement anything like this. @@ -93,21 +93,3 @@ In microservice environments it is fairly typical to expose an HTTP endpoint ret - Surface dependency information through an HTTP endpoint in a microservice environment. The [proof-of-concept](https://github.com/Shnatsel/rust-audit/issues/2) has a feature request for it. However, this does not require support from Cargo and can be implemented as a crate. - Is data embedded in an ELF section accessible to the application itself at runtime? 
- Record and surface versions of C libraries statically linked into the Rust executable, e.g. OpenSSL. - -Think about what the natural extension and evolution of your proposal would -be and how it would affect the language and project as a whole in a holistic -way. Try to use this section as a tool to more fully consider all possible -interactions with the project and language in your proposal. -Also consider how the this all fits into the roadmap for the project -and of the relevant sub-team. - -This is also a good place to "dump ideas", if they are out of scope for the -RFC you are writing but otherwise related. - -If you have tried and cannot think of any future possibilities, -you may simply state that you cannot think of anything. - -Note that having something written down in the future-possibilities section -is not a reason to accept the current or a future RFC; such notes should be -in the section on motivation or rationale in this or subsequent RFCs. -The section merely provides additional information. 
From f77698a9f3a238c8c4a219bcca070b6719b9da0e Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 3 Nov 2019 18:23:52 +0100 Subject: [PATCH 04/22] Fill in date and PR # --- text/0000-cargo-embed-dependency-versions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index dfdaaf6b397..738d32b8487 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -1,6 +1,6 @@ - Feature Name: `cargo_embed_dependency_versions` -- Start Date: (fill me in with today's date, YYYY-MM-DD) -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Start Date: 2019-11-03 +- RFC PR: [rust-lang/rfcs#2801](https://github.com/rust-lang/rfcs/pull/2801) - Rust Issue: None # Summary From 53d4fbd5a1feac853cca1bb8cee22bf48b69f82e Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 3 Nov 2019 18:51:57 +0100 Subject: [PATCH 05/22] Add prior art from Go. Thanks @alex! --- text/0000-cargo-embed-dependency-versions.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 738d32b8487..625926abf84 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -22,7 +22,7 @@ The primary use case for this information is cross-referencing versions of the d Every time an executable is compiled with Cargo, the contents of `Cargo.lock` are embedded in the generated binary. It can be recovered using existing tools like `readelf` or Rust-specific tooling, and then inspected manually or processed in an automated way just like the regular `Cargo.lock` file. -WASM, asm.js and embedded platforms excempt from this mechanism since they have very strict code size requirements. 
For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. +WASM, asm.js and embedded platforms are excempt from this mechanism since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -75,7 +75,9 @@ Alternatives: The Rust compiler already embeds compiler and LLVM version in the executables built with it. You can see it by running `strings your_executable | grep 'rustc version'`. -The author is not aware of direct prior art in other languages. Since build system and package management system are usually decoupled, most languages did not have the opportunity to implement anything like this. +Go compiler embeds `go.mod` dependency information into its compiled binaries. Unlike Rust, Go does not have a machine-readable vulnerability database yet, but this information is already used by e.g. [golicense](https://github.com/mitchellh/golicense). + +Since build system and package management system are usually decoupled, most other languages did not have the opportunity to implement anything like this. In microservice environments it is fairly typical to expose an HTTP endpoint returning the application version, see e.g. [example from Go cookbook](https://blog.kowalczyk.info/article/vEja/embedding-build-number-in-go-executable.html). However, this typically does not include versions of the dependencies. 
From e63c92560467a3ac781cb7c8f48c31a4153eb755 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 3 Nov 2019 22:34:07 +0100 Subject: [PATCH 06/22] Include prior art from Ruby --- text/0000-cargo-embed-dependency-versions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 625926abf84..96ceab20365 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -77,6 +77,8 @@ The Rust compiler already embeds compiler and LLVM version in the executables bu Go compiler embeds `go.mod` dependency information into its compiled binaries. Unlike Rust, Go does not have a machine-readable vulnerability database yet, but this information is already used by e.g. [golicense](https://github.com/mitchellh/golicense). +The most common way to manage Ruby apps involves `Gemfile.lock` which can be thought of as a runtime `Cargo.lock`. Some companies have automation searching for these files in production VMs/containers and cross-referencing them against [RubySec](https://rubysec.com/). + Since build system and package management system are usually decoupled, most other languages did not have the opportunity to implement anything like this. In microservice environments it is fairly typical to expose an HTTP endpoint returning the application version, see e.g. [example from Go cookbook](https://blog.kowalczyk.info/article/vEja/embedding-build-number-in-go-executable.html). However, this typically does not include versions of the dependencies. 
From 5aed819b8a9098fa82b3374038cc165b70b47911 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 3 Nov 2019 23:15:18 +0100 Subject: [PATCH 07/22] Link to more info on binary sizes --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 96ceab20365..b2ca91569ac 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -44,7 +44,7 @@ For each crate in the dependency tree, including the root crate, the recorded ve # Drawbacks [drawbacks]: #drawbacks -- Slightly increases the size of the generated binaries. However, the increase is typically below 1%. A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. +- Slightly increases the size of the generated binaries. However, the increase is typically below 1%. A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. This [seems to hold empirically](https://github.com/rust-lang/rfcs/pull/2801#issuecomment-549184251) too. - Adds more platform-specific code to the build process, which needs to be maintained. 
# Rationale and alternatives From 41bc1dd722873b2f9b6b2e981ff5d9f77b633b29 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Thu, 3 Aug 2023 21:07:21 +0200 Subject: [PATCH 08/22] Update based on lessons learned from `cargo auditable`, describe the exact format. --- text/0000-cargo-embed-dependency-versions.md | 141 +++++++++++++++---- 1 file changed, 113 insertions(+), 28 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index b2ca91569ac..9597de87df7 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -6,7 +6,7 @@ # Summary [summary]: #summary -Embed information equivalent to the contents of Cargo.lock into compiled binaries so it could be programmatically recovered later. +Embed the crate dependency tree in a machine-readable format into compiled binaries so it could be programmatically recovered later. # Motivation [motivation]: #motivation @@ -15,36 +15,121 @@ Rust is very promising for security-critical applications due to its safety guar Linux distributions alert you if you're running a vulnerable software version and you can opt in to automatic security updates. Cargo not only has no automatic update infrastructure, it doesn't even know which libraries or library versions went into compiling a certain binary, so there's no way to check if your system is vulnerable or not. -The primary use case for this information is cross-referencing versions of the dependencies against [RustSec advisory database](https://github.com/RustSec/advisory-db) and/or [Common Vulnerabilities and Exposures](https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures). This also enables use cases such as ensuring a fix in a depencency has been propagated across the entirety of your fleet or preventing binaries with unvetted dependencies from accidentally reaching a production environment - all with zero bookkeeping. 
+The primary use case for this information is cross-referencing versions of the dependencies against [RustSec advisory database](https://github.com/RustSec/advisory-db) and/or third-party databases such as [Common Vulnerabilities and Exposures](https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures). This also enables use cases such as ensuring a fix in a depencency has been propagated across the entirety of your fleet or preventing binaries with unvetted dependencies from accidentally reaching a production environment - all with zero bookkeeping. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Every time an executable is compiled with Cargo, the contents of `Cargo.lock` are embedded in the generated binary. It can be recovered using existing tools like `readelf` or Rust-specific tooling, and then inspected manually or processed in an automated way just like the regular `Cargo.lock` file. +Every time an executable is compiled with Cargo, the dependency tree of the executable is recorded in the binary. This includes the names, versions, dependency kind (build or runtime), and origin (crates.io, git, local filesystem, custom registry). Development dependencies are not recorded, since they cannot affect the final binary. All filesystem paths and URLs are redacted to preserve privacy. The data is encoded in JSON and compressed with zlib to reduce its size. -WASM, asm.js and embedded platforms are excempt from this mechanism since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. +This data can be recovered using existing tools like `readelf` or Rust-specific tooling. It can be then used to create a Software Bill of Materials in a common format, or audit the dependency list for known vulnerabilities. 
-# Reference-level explanation -[reference-level-explanation]: #reference-level-explanation +WASM, asm.js and embedded platforms are excempt from this mechanism by default since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. -The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). Section name is subject to bikeshedding. +A configuration option can be used to opt out of this behavior if it is not desired (e.g. when building [extremely minimal binaries](https://github.com/johnthagen/min-sized-rust)). -A prototype implementation for Linux in `bash` looks like this: - -```shell -# Insert Cargo.lock into a new '.dep-list' section -objcopy --add-section .dep-list=Cargo.lock --set-section-flags .dep-list=noload,readonly mybinary mybinary.withdeps +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation -# Extract Cargo.lock -objcopy -O binary --set-section-flags .dep-list=alloc --only-section=.dep-list mybinary.withdeps Cargo.lock.extracted +The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). The section name is subject to bikeshedding. + +The data is encoded in JSON which is compressed with Zlib. All arrays a sorted not to disrupt reproducible builds. + +The JSON schema specifying the format is provided below. If you find Rust structures more readable, you can find them [here](https://github.com/rust-secure-code/cargo-auditable/blob/311f9932128667b8b18113becdea276b3d98aace/auditable-serde/src/lib.rs#L99-L172). 
In case of divergences the JSON schema provided in this RFC takes precedence. + +```json +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "type": "object", + "required": [ + "packages" + ], + "properties": { + "packages": { + "type": "array", + "items": { + "$ref": "#/definitions/Package" + } + } + }, + "definitions": { + "DependencyKind": { + "type": "string", + "enum": [ + "build", + "runtime" + ] + }, + "Package": { + "description": "A single package in the dependency tree", + "type": "object", + "required": [ + "name", + "source", + "version" + ], + "properties": { + "dependencies": { + "description": "Packages are stored in an ordered array both in the `VersionInfo` struct and in JSON. Here we refer to each package by its index in the array. May be omitted if the list is empty.", + "type": "array", + "items": { + "type": "integer", + "format": "uint", + "minimum": 0.0 + } + }, + "kind": { + "description": "\"build\" or \"runtime\". May be omitted if set to \"runtime\". If it's both a build and a runtime dependency, \"runtime\" is recorded.", + "allOf": [ + { + "$ref": "#/definitions/DependencyKind" + } + ] + }, + "name": { + "description": "Crate name specified in the `name` field in Cargo.toml file. Examples: \"libc\", \"rand\"", + "type": "string" + }, + "root": { + "description": "Whether this is the root package in the dependency tree. There should only be one root package. May be omitted if set to `false`.", + "type": "boolean" + }, + "source": { + "description": "Currently \"git\", \"local\", \"crates.io\" or \"registry\". May be extended in the future with other revision control systems, etc.", + "allOf": [ + { + "$ref": "#/definitions/Source" + } + ] + }, + "version": { + "description": "The package's version in the [semantic version](https://semver.org) format.", + "type": "string" + } + } + }, + "Source": { + "description": "Serializes to \"git\", \"local\", \"crates.io\" or \"registry\". 
May be extended in the future with other revision control systems, etc.", + "oneOf": [ + { + "type": "string", + "enum": [ + "crates.io", + "git", + "local", + "registry" + ] + }, + ] + } + } +} ``` -For each crate in the dependency tree, including the root crate, the recorded version information contains the name, version, origin URL and checksum (equivalent to the current contents of `Cargo.lock` file). The exact format is TBD - see [unresolved questions](#unresolved-questions). - # Drawbacks [drawbacks]: #drawbacks -- Slightly increases the size of the generated binaries. However, the increase is typically below 1%. A "Hello World" on x86 Linux compiles into a ~1Mb file in the best case (recent Rust without jemalloc, LTO enabled). Its Cargo.lock even with a couple of dependencies is less than 1Kb, that's under 1/1000 of the size of the binary. Since Cargo.lock grows linearly with the number of dependencies, it will keep being negligible. This [seems to hold empirically](https://github.com/rust-lang/rfcs/pull/2801#issuecomment-549184251) too. +- Slightly increases the size of the generated binaries. However, the increase is [typically below 1%](https://github.com/rust-lang/rfcs/pull/2801#issuecomment-549184251). - Adds more platform-specific code to the build process, which needs to be maintained. # Rationale and alternatives @@ -52,11 +137,11 @@ For each crate in the dependency tree, including the root crate, the recorded ve Rationale: -- This way version information is *impossible* to misplace. As long as you have the binary, you can recover the info about dependency versions. The importance of this cannot be overstated. This allows auditing e.g. a Docker container that you did not build yourself, or a server that somebody's set up a year ago and left no audit trail. +- This way version information is *impossible* to misplace. As long as you have the binary, you can recover the info about dependency versions. The importance of this is impossible to overstate. 
This allows auditing e.g. a Docker container that you did not build yourself, or a server that somebody's set up a year ago and left no audit trail. - A malicious actor could lie about the version information. However, doing so requires modifying the binary - and if a malicious actor can do _that,_ you are pwned anyway. So this does not create any additional attack vectors other than exploiting the tool that's recovering the version information, which can be easily sandboxed. - Any binary authentication that might be deployed automatically applies to the version information. There is no need to separately authenticate it. -- Tooling for extracting information from binaries (such as ELF sections) is already readily available. Tooling for parsing `Cargo.lock` also exists. -- This enables third parties such as cloud providers to scan your binaries for you. Google Cloud [already provides such a service](https://cloud.google.com/container-registry/docs/get-image-vulnerabilities), Amazon has [an open-source project you can deploy](https://aws.amazon.com/blogs/publicsector/detect-vulnerabilities-in-the-docker-images-in-your-applications/) while Azure [integrates several partner solutions](https://docs.microsoft.com/en-us/azure/security-center/security-center-vulnerability-assessment-recommendations). +- Tooling for extracting information from binaries (such as ELF sections) is already readily available, as are zlib decompressors and JSON parsers. It can be extracted and parsed [in 5 lines of Python](https://github.com/rust-secure-code/cargo-auditable/blob/master/PARSING.md), or even with a shell one-liner in a pinch. +- This enables third parties such as cloud providers to scan your binaries for you. 
Google Cloud [already provides such a service](https://cloud.google.com/container-registry/docs/get-image-vulnerabilities), Amazon has [an open-source project you can deploy](https://aws.amazon.com/blogs/publicsector/detect-vulnerabilities-in-the-docker-images-in-your-applications/) while Azure [integrates several partner solutions](https://docs.microsoft.com/en-us/azure/security-center/security-center-vulnerability-assessment-recommendations). They do not support this specific format yet, but integration into Trivy was very easy, so adding support will likely be trivial. Alternatives: @@ -66,16 +151,18 @@ Alternatives: - Identifying vulnerable binaries will remain impossible on all other platforms, as well as on Linux for code compiled with `cargo build`. - Verification by third parties will remain impossible. - Record version information in a `&'static str` in the binary instead if ELF sections, with start/stop markers to allow black-box extraction from the outside. - - This has been [prototyped](https://github.com/Shnatsel/rust-audit). It has the upside of allowing the binary itself to introspect its version info, but appears to be harder to implement and maintain. + - This has been [prototyped](https://github.com/Shnatsel/rust-audit). It has the upside of allowing the binary itself to introspect its version info with little parsing, but the extraction is less efficient, and this is harder to implement and maintain. - Provide a Cargo wrapper or plugin to implement this, but do not put it in Cargo itself. - - When people actually need this information (e.g. to check if they're impacted by a vulnerability) it is too late to reach for third-party tooling - the executables have already been built and deployed, and the information is already lost. As such, this mechanism is completely ineffective if it's not enabled by default. + - When people actually need this information (e.g. 
to check if they're impacted by a vulnerability) it is too late to reach for third-party tooling - the executables have already been built and deployed, and the information is already lost. As such, this mechanism is ineffective if it's not enabled by default. # Prior art [prior-art]: #prior-art -The Rust compiler already embeds compiler and LLVM version in the executables built with it. You can see it by running `strings your_executable | grep 'rustc version'`. +An out-of-tree implementation of this RFC exists, see [`cargo auditable`](https://github.com/rust-secure-code/cargo-auditable/), and has garnered considerable interest. NixOS and Void Linux build all their Rust packages with it today; it is also used in production at Miscrosoft. Extracting the embedded data is already supported by [`rust-audit-info`](https://crates.io/crates/rust-audit-info) and [Syft](https://github.com/anchore/syft). Auditing such binaries for known vulnerabilities is already supported by [`cargo audit`](https://crates.io/crates/cargo-audit) and [Trivy](https://github.com/aquasecurity/trivy). + +The Rust compiler already [embeds](https://github.com/rust-lang/rust/pull/97550) compiler and LLVM version in the executables built with it. -Go compiler embeds `go.mod` dependency information into its compiled binaries. Unlike Rust, Go does not have a machine-readable vulnerability database yet, but this information is already used by e.g. [golicense](https://github.com/mitchellh/golicense). +Go compiler embeds `go.mod` dependency information into its compiled binaries. Due to Go binaries generally being far larger than Rust binaries, the binary size is not a constraint, so they embed much more information - e.g. the licence for each package in the dependency tree, which is then read by the [golicense](https://github.com/mitchellh/golicense) tool. The most common way to manage Ruby apps involves `Gemfile.lock` which can be thought of as a runtime `Cargo.lock`. 
Some companies have automation searching for these files in production VMs/containers and cross-referencing them against [RubySec](https://rubysec.com/). @@ -86,10 +173,7 @@ In microservice environments it is fairly typical to expose an HTTP endpoint ret # Unresolved questions [unresolved-questions]: #unresolved-questions -1. The format of Cargo.lock is not stabilized and is evolving. Should we encode Cargo.lock as-is and require tooling to track the updates, commit to a stable subset of Cargo.lock or use something else altogether? -1. Should this also apply to shared libraries? -1. Should this information be removed when stripping the binary of debug symbols? -1. Are there any cases where you would _not_ want to allow whoever is running the binary to check it for known vulnerabilities? +- How exactly this should be enabled or disabled for a given target? Should there be a flag in the target configuration file, or some other mechanism? # Future possibilities [future-possibilities]: #future-possibilities @@ -97,3 +181,4 @@ In microservice environments it is fairly typical to expose an HTTP endpoint ret - Surface dependency information through an HTTP endpoint in a microservice environment. The [proof-of-concept](https://github.com/Shnatsel/rust-audit/issues/2) has a feature request for it. However, this does not require support from Cargo and can be implemented as a crate. - Is data embedded in an ELF section accessible to the application itself at runtime? - Record and surface versions of C libraries statically linked into the Rust executable, e.g. OpenSSL. +- Include additional information, e.g. Git revision for dependencies sourced from Git repositories. This is not part of the original RFC because new fields can be added in a backwards-compatible way. 
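
Reviewer note on the format introduced by this patch: the schema stores `dependencies` as indices into the `packages` array and allows `dependencies`, `kind` and `root` to be omitted. The following is a minimal sketch of how a consumer might decode the payload, assuming the section bytes have already been extracted from the binary. The function names and the sample package list are hypothetical and are not taken from `cargo auditable` or any other tool.

```python
import json
import zlib


def decode_dependency_data(section_bytes):
    """Decompress the zlib stream and parse the JSON document inside it."""
    return json.loads(zlib.decompress(section_bytes))


def describe_packages(data):
    """Render one line per package, resolving dependency indices to names."""
    packages = data["packages"]
    lines = []
    for pkg in packages:
        # Per the schema, these three fields may be omitted when they hold
        # their default values (empty list, "runtime", false).
        deps = [packages[i]["name"] for i in pkg.get("dependencies", [])]
        kind = pkg.get("kind", "runtime")
        marker = " (root)" if pkg.get("root", False) else ""
        lines.append(
            f'{pkg["name"]} {pkg["version"]} [{kind}, {pkg["source"]}]{marker} -> {deps}'
        )
    return lines


# Hypothetical sample data, compressed here only to simulate section contents.
sample = {
    "packages": [
        {"name": "cc", "version": "1.0.79", "source": "crates.io", "kind": "build"},
        {"name": "libc", "version": "0.2.147", "source": "crates.io"},
        {
            "name": "myapp",
            "version": "0.1.0",
            "source": "local",
            "root": True,
            "dependencies": [0, 1],
        },
    ]
}
section_bytes = zlib.compress(json.dumps(sample).encode("utf-8"))

decoded = decode_dependency_data(section_bytes)
report = describe_packages(decoded)
for line in report:
    print(line)
```

Referring to packages by index keeps the payload small, but it means a consumer has to treat the array order as significant and must not reorder `packages` before resolving `dependencies`.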
From a4767b743213827e15ce49eaf2b2056a48e7ac49 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Thu, 3 Aug 2023 23:29:23 +0200 Subject: [PATCH 09/22] Address some of the feedback by Ed Page --- text/0000-cargo-embed-dependency-versions.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 9597de87df7..9b03ce1b5fe 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -20,7 +20,7 @@ The primary use case for this information is cross-referencing versions of the d # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Every time an executable is compiled with Cargo, the dependency tree of the executable is recorded in the binary. This includes the names, versions, dependency kind (build or runtime), and origin (crates.io, git, local filesystem, custom registry). Development dependencies are not recorded, since they cannot affect the final binary. All filesystem paths and URLs are redacted to preserve privacy. The data is encoded in JSON and compressed with zlib to reduce its size. +Every time an executable is compiled with Cargo, the dependency tree of the executable is recorded in the binary. This includes the names, versions, dependency kind (build or runtime), and origin kind (crates.io, git, local filesystem, custom registry). Development dependencies are not recorded, since they cannot affect the final binary. All filesystem paths and URLs are redacted to preserve privacy. The data is encoded in JSON and compressed with zlib to reduce its size. This data can be recovered using existing tools like `readelf` or Rust-specific tooling. It can be then used to create a Software Bill of Materials in a common format, or audit the dependency list for known vulnerabilities. 
@@ -152,7 +152,12 @@ Alternatives: - Verification by third parties will remain impossible. - Record version information in a `&'static str` in the binary instead if ELF sections, with start/stop markers to allow black-box extraction from the outside. - This has been [prototyped](https://github.com/Shnatsel/rust-audit). It has the upside of allowing the binary itself to introspect its version info with little parsing, but the extraction is less efficient, and this is harder to implement and maintain. +- Record version information in an industry standard SBOM format instead of a custom format. + - This has been prototyped, and we've found the existing formats unsuitable. The primary reasons are a significant binary size increase (the existing formats are quite verbose, not designed for this use case) and issues with reproducible builds (they require timestamps). + - "SPDX in Zlib in a linker section" is not really an industry-standard format. Adding support for the custom format to [Syft](https://github.com/anchore/syft) was trivial, since it's nearly isomorphic to other SBOM formats, so the custom JSON encoding does not seem to add a lot of overhead to consuming this data. + - For compatibility with systems that cannot consume this data directly, external tools can be used to convert to industry standard SBOMs. [Syft](https://github.com/anchore/syft) can already do this today. - Provide a Cargo wrapper or plugin to implement this, but do not put it in Cargo itself. + - Third-party implementations cannot be perfectly reliable because Cargo does not expose sufficient information for a perfectly robust system. For example, custom target specifications are impossible to support. There are also [other corner cases](https://github.com/rust-secure-code/cargo-auditable/issues/124) that appear to be impossible to resolve based on the information from `cargo metadata` alone. - When people actually need this information (e.g. 
to check if they're impacted by a vulnerability) it is too late to reach for third-party tooling - the executables have already been built and deployed, and the information is already lost. As such, this mechanism is ineffective if it's not enabled by default. # Prior art @@ -174,6 +179,8 @@ In microservice environments it is fairly typical to expose an HTTP endpoint ret [unresolved-questions]: #unresolved-questions - How exactly this should be enabled or disabled for a given target? Should there be a flag in the target configuration file, or some other mechanism? +- How exactly should opt-in or opt-out from embedding this data be toggled? Should it be per-profile, like `strip = true`, or a global configuration option? +- How exactly the initial roll-out should be handled? Following the sparse index example (opt-in on nightly -> default on nightly -> opt-in on stable -> default on stable) sounds like a good idea, but sparse index is target-independent, while this feature is not. So it makes sense to enable it for Tier 1 targets first, and have it gradually expanded to Tier 2, like it was done for LLVM coverage profiling. Does it make sense to have a "stable but opt-in" period in this case? 
# Future possibilities [future-possibilities]: #future-possibilities From 14c832b4d7a66eac5464d181539b0acde8920502 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Thu, 3 Aug 2023 23:04:14 +0000 Subject: [PATCH 10/22] Specify section name currently in use --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 9b03ce1b5fe..927649121bb 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -31,7 +31,7 @@ A configuration option can be used to opt out of this behavior if it is not desi # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). The section name is subject to bikeshedding. +The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). The section name is `.dep-v0` across all platforms. The section name must be changed if breaking changes are made to the format. The data is encoded in JSON which is compressed with Zlib. All arrays a sorted not to disrupt reproducible builds. 
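
Reviewer note on the sorted-arrays requirement in the context line above: because dependencies are stored as array indices, sorting `packages` also requires remapping every index. The following is a rough sketch of one way a producer could get byte-identical output regardless of the order in which the resolver yields packages. The sort key `(name, version, source)`, the compact JSON separators and the function name are assumptions made for illustration; the patch itself only states that arrays are sorted.

```python
import json
import zlib


def encode_dependency_data(packages):
    """Sort packages, remap dependency indices, and zlib-compress the JSON."""
    # Assumed sort key; the RFC text does not specify one.
    order = sorted(
        range(len(packages)),
        key=lambda i: (
            packages[i]["name"],
            packages[i]["version"],
            packages[i]["source"],
        ),
    )
    # Map each old position to its position after sorting.
    new_index = {old: new for new, old in enumerate(order)}

    result = []
    for old in order:
        pkg = dict(packages[old])  # shallow copy so the input is not mutated
        if "dependencies" in pkg:
            pkg["dependencies"] = sorted(new_index[d] for d in pkg["dependencies"])
        result.append(pkg)

    # sort_keys makes the key order inside each object deterministic too.
    text = json.dumps({"packages": result}, sort_keys=True, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"), 9)


# The same hypothetical dependency graph, listed in two different orders.
order_a = [
    {"name": "cc", "version": "1.0.79", "source": "crates.io", "kind": "build"},
    {"name": "libc", "version": "0.2.147", "source": "crates.io"},
    {"name": "myapp", "version": "0.1.0", "source": "local", "root": True,
     "dependencies": [0, 1]},
]
order_b = [
    {"name": "myapp", "version": "0.1.0", "source": "local", "root": True,
     "dependencies": [2, 1]},
    {"name": "libc", "version": "0.2.147", "source": "crates.io"},
    {"name": "cc", "version": "1.0.79", "source": "crates.io", "kind": "build"},
]

blob_a = encode_dependency_data(order_a)
blob_b = encode_dependency_data(order_b)
print(blob_a == blob_b)
```

Identical bytes here rely on both runs using the same zlib implementation and compression level; different zlib versions are not guaranteed to produce the same compressed stream for the same input.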
From 9b1122a527541fd10c56237d7c4f2be71589d759 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Thu, 3 Aug 2023 23:21:01 +0000 Subject: [PATCH 11/22] fix typo Co-authored-by: Daniel Paoliello --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 927649121bb..14d4a62824f 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -24,7 +24,7 @@ Every time an executable is compiled with Cargo, the dependency tree of the exec This data can be recovered using existing tools like `readelf` or Rust-specific tooling. It can be then used to create a Software Bill of Materials in a common format, or audit the dependency list for known vulnerabilities. -WASM, asm.js and embedded platforms are excempt from this mechanism by default since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. +WASM, asm.js and embedded platforms are exempt from this mechanism by default since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. A configuration option can be used to opt out of this behavior if it is not desired (e.g. when building [extremely minimal binaries](https://github.com/johnthagen/min-sized-rust)). 
From 2f33efb65d8fcf77a6ebe396b335e9763b35d5f8 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Thu, 3 Aug 2023 23:21:16 +0000 Subject: [PATCH 12/22] fix typo Co-authored-by: Daniel Paoliello --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 14d4a62824f..fe66b919f71 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -163,7 +163,7 @@ Alternatives: # Prior art [prior-art]: #prior-art -An out-of-tree implementation of this RFC exists, see [`cargo auditable`](https://github.com/rust-secure-code/cargo-auditable/), and has garnered considerable interest. NixOS and Void Linux build all their Rust packages with it today; it is also used in production at Miscrosoft. Extracting the embedded data is already supported by [`rust-audit-info`](https://crates.io/crates/rust-audit-info) and [Syft](https://github.com/anchore/syft). Auditing such binaries for known vulnerabilities is already supported by [`cargo audit`](https://crates.io/crates/cargo-audit) and [Trivy](https://github.com/aquasecurity/trivy). +An out-of-tree implementation of this RFC exists, see [`cargo auditable`](https://github.com/rust-secure-code/cargo-auditable/), and has garnered considerable interest. NixOS and Void Linux build all their Rust packages with it today; it is also used in production at Microsoft. Extracting the embedded data is already supported by [`rust-audit-info`](https://crates.io/crates/rust-audit-info) and [Syft](https://github.com/anchore/syft). Auditing such binaries for known vulnerabilities is already supported by [`cargo audit`](https://crates.io/crates/cargo-audit) and [Trivy](https://github.com/aquasecurity/trivy). 
The Rust compiler already [embeds](https://github.com/rust-lang/rust/pull/97550) compiler and LLVM version in the executables built with it. From e85d77ce0065d0b476b8724d0ce3c6b13c5325d6 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Fri, 4 Aug 2023 01:49:26 +0200 Subject: [PATCH 13/22] Add a note on debug symbols in alternatives --- text/0000-cargo-embed-dependency-versions.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index fe66b919f71..4d5a52cd9e9 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -156,6 +156,9 @@ Alternatives: - This has been prototyped, and we've found the existing formats unsuitable. The primary reasons are a significant binary size increase (the existing formats are quite verbose, not designed for this use case) and issues with reproducible builds (they require timestamps). - "SPDX in Zlib in a linker section" is not really an industry-standard format. Adding support for the custom format to [Syft](https://github.com/anchore/syft) was trivial, since it's nearly isomorphic to other SBOM formats, so the custom JSON encoding does not seem to add a lot of overhead to consuming this data. - For compatibility with systems that cannot consume this data directly, external tools can be used to convert to industry standard SBOMs. [Syft](https://github.com/anchore/syft) can already do this today. +- Record version information in debug symbols instead of binary sections. + - Debug information formats are highly platform-specific, complex, and poorly documented. For example, Microsoft provides no documentation for Windows PDB. Extracting it would be considerably more difficult. Parsing debug information would be a major source of complexity and bugs. 
+ - Some Linux distributions, such as Debian, ship debug symbols separately from the binaries, and do not install the debug symbols by default. We need this information to be included in the binaries, not the debug symbols. - Provide a Cargo wrapper or plugin to implement this, but do not put it in Cargo itself. - Third-party implementations cannot be perfectly reliable because Cargo does not expose sufficient information for a perfectly robust system. For example, custom target specifications are impossible to support. There are also [other corner cases](https://github.com/rust-secure-code/cargo-auditable/issues/124) that appear to be impossible to resolve based on the information from `cargo metadata` alone. - When people actually need this information (e.g. to check if they're impacted by a vulnerability) it is too late to reach for third-party tooling - the executables have already been built and deployed, and the information is already lost. As such, this mechanism is ineffective if it's not enabled by default. From 3c49d7277c6dfd7ce7397a6b81c0373778582239 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Fri, 4 Aug 2023 11:18:24 +0200 Subject: [PATCH 14/22] Cover introspection in future possibilities --- text/0000-cargo-embed-dependency-versions.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 4d5a52cd9e9..c312784e3bd 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -188,7 +188,8 @@ In microservice environments it is fairly typical to expose an HTTP endpoint ret # Future possibilities [future-possibilities]: #future-possibilities -- Surface dependency information through an HTTP endpoint in a microservice environment. The [proof-of-concept](https://github.com/Shnatsel/rust-audit/issues/2) has a feature request for it. 
However, this does not require support from Cargo and can be implemented as a crate. - - Is data embedded in an ELF section accessible to the application itself at runtime? +- Let the binary itself access this data at runtime. + - This can be achieved today by running the extraction pipeline on `std::env::current_exe`, but that requires a minimal binary format parser, and access to `/proc` on Unix. + - The linker section is already given a symbol in the out-of-tree implementation, named `_AUDITABLE_VERSION_INFO`. It is possible to refer to it and access it. This has downsides such as confusing linker errors when embedding the audit data is disabled, and is out of scope of this initial RFC. - Record and surface versions of C libraries statically linked into the Rust executable, e.g. OpenSSL. - Include additional information, e.g. Git revision for dependencies sourced from Git repositories. This is not part of the original RFC because new fields can be added in a backwards-compatible way. From caef7576ebc820b9888bf11d48b61c07ea086217 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 6 Aug 2023 17:57:18 +0200 Subject: [PATCH 15/22] Specify how to configure this feature --- text/0000-cargo-embed-dependency-versions.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index c312784e3bd..a8fc54b3201 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -26,7 +26,7 @@ This data can be recovered using existing tools like `readelf` or Rust-specific WASM, asm.js and embedded platforms are exempt from this mechanism by default since they have very strict code size requirements. For those platforms we encourage you to use tooling that record the hash of every executable in a database and associates the hash with its Cargo.lock, compiler and LLVM version used for the build. 
-A configuration option can be used to opt out of this behavior if it is not desired (e.g. when building [extremely minimal binaries](https://github.com/johnthagen/min-sized-rust)). +A per-profile configuration option in `Cargo.toml` can be used to opt out of this behavior if it is not desired (e.g. when building [extremely minimal binaries](https://github.com/johnthagen/min-sized-rust)). The exact name of this option is subject to bikeshedding. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -126,11 +126,15 @@ The JSON schema specifying the format is provided below. If you find Rust struct } ``` +Not all compilations targets will embed this data. Some may support it but disable it by default (e.g. WebAssembly) while others may not support it at all. Whether the target support it, and whether embedding this data is enabled by default for a given target is recorded in the [target specification JSON](https://doc.rust-lang.org/rustc/targets/custom.html). The exact name of the configuration option is subject to bikeshedding. + # Drawbacks [drawbacks]: #drawbacks - Slightly increases the size of the generated binaries. However, the increase is [typically below 1%](https://github.com/rust-lang/rfcs/pull/2801#issuecomment-549184251). - Adds more platform-specific code to the build process, which needs to be maintained. +- Slightly more work needs to be performed at compile time. This implies slightly slower compilation. + - If the compilation time impact is deemed to be significant, collecting and embedding this data will be disabled by default in the debug profile before stabilization. It will be possible to override this default using the per-profile configuration option.
# Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -181,8 +185,6 @@ In microservice environments it is fairly typical to expose an HTTP endpoint ret # Unresolved questions [unresolved-questions]: #unresolved-questions -- How exactly this should be enabled or disabled for a given target? Should there be a flag in the target configuration file, or some other mechanism? -- How exactly should opt-in or opt-out from embedding this data be toggled? Should it be per-profile, like `strip = true`, or a global configuration option? - How exactly the initial roll-out should be handled? Following the sparse index example (opt-in on nightly -> default on nightly -> opt-in on stable -> default on stable) sounds like a good idea, but sparse index is target-independent, while this feature is not. So it makes sense to enable it for Tier 1 targets first, and have it gradually expanded to Tier 2, like it was done for LLVM coverage profiling. Does it make sense to have a "stable but opt-in" period in this case? # Future possibilities From 8607f4b98e8565baf2b66695f7e874787d6c3c39 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sun, 6 Aug 2023 18:07:36 +0200 Subject: [PATCH 16/22] Add a section name bikeshedding caveat --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index a8fc54b3201..091b30e45ad 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -31,7 +31,7 @@ A per-profile configuration option in `Cargo.toml` can be used to opt out of thi # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). 
The section name is `.dep-v0` across all platforms. The section name must be changed if breaking changes are made to the format. +The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). The section name is `.dep-v0` across all platforms (subject to bikeshedding, but [within 8 bytes](https://github.com/rust-lang/rust/blob/4f7bb9890c0402cd145556ac1929d13d7524959e/compiler/rustc_codegen_ssa/src/back/metadata.rs#L462-L475)). The section name must be changed if breaking changes are made to the format. The data is encoded in JSON which is compressed with Zlib. All arrays a sorted not to disrupt reproducible builds. From d624a74dadbbdfb07ece3b8eb34bb8af10253d8a Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Mon, 7 Aug 2023 13:00:00 +0200 Subject: [PATCH 17/22] Rename 'runtime' dependencies to 'normal' for consistency with Cargo help text. These strings don't actually appear in the format itself because they are the default case and get omitted to save space, but sure, why not. --- text/0000-cargo-embed-dependency-versions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 091b30e45ad..ee567cc32bc 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -20,7 +20,7 @@ The primary use case for this information is cross-referencing versions of the d # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Every time an executable is compiled with Cargo, the dependency tree of the executable is recorded in the binary. This includes the names, versions, dependency kind (build or runtime), and origin kind (crates.io, git, local filesystem, custom registry). 
Development dependencies are not recorded, since they cannot affect the final binary. All filesystem paths and URLs are redacted to preserve privacy. The data is encoded in JSON and compressed with zlib to reduce its size. +Every time an executable is compiled with Cargo, the dependency tree of the executable is recorded in the binary. This includes the names, versions, dependency kind (build or normal), and origin kind (crates.io, git, local filesystem, custom registry). Development dependencies are not recorded, since they cannot affect the final binary. All filesystem paths and URLs are redacted to preserve privacy. The data is encoded in JSON and compressed with zlib to reduce its size. This data can be recovered using existing tools like `readelf` or Rust-specific tooling. It can be then used to create a Software Bill of Materials in a common format, or audit the dependency list for known vulnerabilities. @@ -57,7 +57,7 @@ The JSON schema specifying the format is provided below. If you find Rust struct "type": "string", "enum": [ "build", - "runtime" + "normal" ] }, "Package": { @@ -79,7 +79,7 @@ The JSON schema specifying the format is provided below. If you find Rust struct } }, "kind": { - "description": "\"build\" or \"runtime\". May be omitted if set to \"runtime\". If it's both a build and a runtime dependency, \"runtime\" is recorded.", + "description": "\"build\" or \"normal\". May be omitted if set to \"normal\". 
If it's both a build and a normal dependency, \"normal\" is recorded.", "allOf": [ { "$ref": "#/definitions/DependencyKind" From 13ab32121ffc3e6ebed1f4cc45298f40806fc11b Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Fri, 11 Aug 2023 18:15:02 +0000 Subject: [PATCH 18/22] Apply suggestion Co-authored-by: Josh Triplett --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index ee567cc32bc..622e1a4adac 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -33,7 +33,7 @@ A per-profile configuration option in `Cargo.toml` can be used to opt out of thi The version information is encoded in an additional arbitrary section of the executable by Cargo. The exact mechanism varies depending on the executable format (ELF, Mach-O, PE, etc.). The section name is `.dep-v0` across all platforms (subject to bikeshedding, but [within 8 bytes](https://github.com/rust-lang/rust/blob/4f7bb9890c0402cd145556ac1929d13d7524959e/compiler/rustc_codegen_ssa/src/back/metadata.rs#L462-L475)). The section name must be changed if breaking changes are made to the format. -The data is encoded in JSON which is compressed with Zlib. All arrays a sorted not to disrupt reproducible builds. +The data is encoded in JSON which is compressed with Zlib. All arrays are sorted to not disrupt reproducible builds. The JSON schema specifying the format is provided below. If you find Rust structures more readable, you can find them [here](https://github.com/rust-secure-code/cargo-auditable/blob/311f9932128667b8b18113becdea276b3d98aace/auditable-serde/src/lib.rs#L99-L172). In case of divergences the JSON schema provided in this RFC takes precedence. 
From 0c832f3b65a03ff0d712f3a212121d80d082bc95 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Fri, 11 Aug 2023 18:16:49 +0000 Subject: [PATCH 19/22] Apply suggestion Co-authored-by: Josh Triplett --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 622e1a4adac..fa886a89ec1 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -126,7 +126,7 @@ The JSON schema specifying the format is provided below. If you find Rust struct } ``` -Not all compilations targets will embed this data. Some may support it but disable it by default (e.g. WebAssembly) while others may not support it at all. Whether the target support it, and whether embedding this data is enabled by default for a given target is recorded in the [target specification JSON](https://doc.rust-lang.org/rustc/targets/custom.html). The exact name of the configuration option is subject to bikeshedding. +Not all compilations targets support embedding this data. Whether the target supports it is recorded in the [target specification JSON](https://doc.rust-lang.org/rustc/targets/custom.html). The exact name of the configuration option is subject to bikeshedding. 
# Drawbacks [drawbacks]: #drawbacks From 8a08b7ccc0470b914d17d2742751facb842020c6 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Fri, 11 Aug 2023 23:12:45 +0200 Subject: [PATCH 20/22] Describe the prior art from other languages in more detail --- text/0000-cargo-embed-dependency-versions.md | 41 +++++++++++++++++--- 1 file changed, 36 insertions(+), 5 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index fa886a89ec1..9b6ca3439d7 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -170,17 +170,48 @@ Alternatives: # Prior art [prior-art]: #prior-art -An out-of-tree implementation of this RFC exists, see [`cargo auditable`](https://github.com/rust-secure-code/cargo-auditable/), and has garnered considerable interest. NixOS and Void Linux build all their Rust packages with it today; it is also used in production at Microsoft. Extracting the embedded data is already supported by [`rust-audit-info`](https://crates.io/crates/rust-audit-info) and [Syft](https://github.com/anchore/syft). Auditing such binaries for known vulnerabilities is already supported by [`cargo audit`](https://crates.io/crates/cargo-audit) and [Trivy](https://github.com/aquasecurity/trivy). +## In Rust The Rust compiler already [embeds](https://github.com/rust-lang/rust/pull/97550) compiler and LLVM version in the executables built with it. -Go compiler embeds `go.mod` dependency information into its compiled binaries. Due to Go binaries generally being far larger than Rust binaries, the binary size is not a constraint, so they embed much more information - e.g. the licence for each package in the dependency tree, which is then read by the [golicense](https://github.com/mitchellh/golicense) tool. 
+An out-of-tree implementation of this RFC exists, see [`cargo auditable`](https://github.com/rust-secure-code/cargo-auditable/), and has garnered considerable interest. NixOS and Void Linux build all their Rust packages with it today; it is also used in production at Microsoft. Extracting the embedded data is already supported by [`rust-audit-info`](https://crates.io/crates/rust-audit-info) and [Syft](https://github.com/anchore/syft). Auditing such binaries for known vulnerabilities is already supported by [`cargo audit`](https://crates.io/crates/cargo-audit) and [Trivy](https://github.com/aquasecurity/trivy). + +## In other languages + +Suppose a vulnerability that is exploitable over the network is discovered in a popular networking library. We will look at several languages and see how each of them handles it. + +### C + +Linux distributions maintain a strict policy of using shared libraries (dynamic linking) and only having a single copy of the library in the system. The library is also tracked in the package manager. + +When the vulnerability is disclosed, an update to the single copy of the library in the system is issued by the distribution. To check if a given system is affected, you only need to look at the version of the distribution package. Tools to scan systems for vulnerable package versions are also available, including as a service. + +To mitigate the issue you simply need to run the regular security update command provided by your distribution and reboot. The single copy of the library is replaced and the system is secured. + +Sadly this does not apply to packages installed from outside the distribution, or software that is not packaged at all. + +### Ruby + +Ruby code typically has a `Gemfile.lock` which is analogous to `Cargo.lock`. Crucially, Ruby is never compiled into a binary, and this file is present at runtime in the deployed code. So you can tell exactly what versions of which libraries the code you run actually uses.
+Thanks to `Gemfile.lock` the code can be automatically checked for vulnerable package versions against databases such as [RubySec](https://rubysec.com/). + +Unlike in the Linux distribution scenario, the dependencies of every program have to be updated individually. + +### Go + +Go statically links all its code, so the C approach of updating a single instance of a shared library is impossible. Every single Go binary contains a copy of the vulnerable code, and all of them have to be rebuilt. + +To make this problem tractable, Go embeds detailed information about the dependency tree in each compiled binary. It is similar to `Cargo.lock` but much more detailed, also including the licenses and other metadata about the dependencies. + +Automated tools exist to detect vulnerable dependencies in Go binaries via a database of vulnerable versions. Since this is enabled by default for all builds, it does not matter how the binary was installed (distribution package, binary from GitHub releases, built from source, etc) - all vulnerable binaries can be identified. + +### Rust (prior to this RFC) -The most common way to manage Ruby apps involves `Gemfile.lock` which can be thought of as a runtime `Cargo.lock`. Some companies have automation searching for these files in production VMs/containers and cross-referencing them against [RubySec](https://rubysec.com/). +Like Go, Rust does not support dynamic linking. A copy of each library is included in each compiled binary. -Since build system and package management system are usually decoupled, most other languages did not have the opportunity to implement anything like this. +Unlike Go, Rust does not provide any visibility into what libraries went into compiling a given program. It is extremely challenging to hunt down every vulnerable binary in the system. -In microservice environments it is fairly typical to expose an HTTP endpoint returning the application version, see e.g.
[example from Go cookbook](https://blog.kowalczyk.info/article/vEja/embedding-build-number-in-go-executable.html). However, this typically does not include versions of the dependencies. # Unresolved questions [unresolved-questions]: #unresolved-questions From 6db1964c78aea8ee2625f487cbbfcdd9fa29a2c7 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Fri, 11 Aug 2023 23:14:29 +0200 Subject: [PATCH 21/22] Add 'Why enable this by default?' section --- text/0000-cargo-embed-dependency-versions.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 9b6ca3439d7..38f2ca536be 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -139,7 +139,7 @@ Not all compilations targets support embedding this data. Whether the target sup # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -Rationale: +## Rationale - This way version information is *impossible* to misplace. As long as you have the binary, you can recover the info about dependency versions. The importance of this is impossible to overstate. This allows auditing e.g. a Docker container that you did not build yourself, or a server that somebody's set up a year ago and left no audit trail. - A malicious actor could lie about the version information. However, doing so requires modifying the binary - and if a malicious actor can do _that,_ you are pwned anyway. So this does not create any additional attack vectors other than exploiting the tool that's recovering the version information, which can be easily sandboxed. @@ -147,7 +147,17 @@ Rationale: - Tooling for extracting information from binaries (such as ELF sections) is already readily available, as are zlib decompressors and JSON parsers. 
It can be extracted and parsed [in 5 lines of Python](https://github.com/rust-secure-code/cargo-auditable/blob/master/PARSING.md), or even with a shell one-liner in a pinch. - This enables third parties such as cloud providers to scan your binaries for you. Google Cloud [already provides such a service](https://cloud.google.com/container-registry/docs/get-image-vulnerabilities), Amazon has [an open-source project you can deploy](https://aws.amazon.com/blogs/publicsector/detect-vulnerabilities-in-the-docker-images-in-your-applications/) while Azure [integrates several partner solutions](https://docs.microsoft.com/en-us/azure/security-center/security-center-vulnerability-assessment-recommendations). They do not support this specific format yet, but integration into Trivy was very easy, so adding support will likely be trivial. -Alternatives: +## Why enable this by default? + +If you have 10 programs on your computer that use a library exploitable over the network, your computer and all your data are vulnerable as long as even **a single program** remains vulnerable. + +It does not matter if you have discovered and patched 9 out of 10. As long as a single unpatched binary is running, the system is vulnerable. + +An attacker needs to only find a single vulnerable program; defense is all-or-nothing. **Every single** vulnerable binary needs to be found and fixed. + +If the mechanism proposed in this RFC is not enabled by default, it will not let you discover all vulnerable binaries in the system, and at that point it might as well not exist. + +## Alternatives - Do nothing. - Identifying vulnerable binaries will remain impossible. We will see increasing number of known vulnerabilities unpatched in production. 
From 6f64c5c275681cdd8b02a2a5c0e7bdb4da324bc2 Mon Sep 17 00:00:00 2001 From: "Sergey \"Shnatsel\" Davidoff" Date: Sat, 12 Aug 2023 11:04:20 +0200 Subject: [PATCH 22/22] More objective wording --- text/0000-cargo-embed-dependency-versions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-cargo-embed-dependency-versions.md b/text/0000-cargo-embed-dependency-versions.md index 38f2ca536be..e6cd4af3095 100644 --- a/text/0000-cargo-embed-dependency-versions.md +++ b/text/0000-cargo-embed-dependency-versions.md @@ -155,7 +155,7 @@ It does not matter if you have discovered and patched 9 out of 10. As long as a An attacker needs to only find a single vulnerable program; defense is all-or-nothing. **Every single** vulnerable binary needs to be found and fixed. -If the mechanism proposed in this RFC is not enabled by default, it will not let you discover all vulnerable binaries in the system, and at that point it might as well not exist. +If the mechanism proposed in this RFC is not enabled by default, it will not let you discover all vulnerable binaries in the system, and will not be effective at preventing vulnerability exploitation. ## Alternatives