Generate our own impl block names for consistency between ghost/erased runs of rustc #709

Draft: wants to merge 5 commits into base: main

Chris-Hawblitzel
Collaborator

Previously, our VIR paths contained rustc-generated segments like "impl%0" and "impl%1", using rustc's numbering for the "impl%n" names. Unfortunately, this numbering is not necessarily consistent between the keep-ghost-code run of rustc and the erased-ghost-code run of rustc. Therefore, this pull request generates our own unique names for impl blocks independent of rustc's numbering.

The generated unique names include abbreviations of the datatype and (optionally) trait name for each impl block, so this pull request also has the side effect of making the VIR names more readable.

@Chris-Hawblitzel
Collaborator Author

Example:

struct Cat;

impl Cat {
    spec fn f() -> int { 5 }
}

old SMT:

;; Function-Decl test::Cat::f
(declare-fun test!impl&%0.f.? (Poly) Int)

new SMT:

;; Function-Decl test::Cat::f
(declare-fun test!&%Cat.f.? (Poly) Int)
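As a rough illustration of the idea (not the code in this pull request), the sketch below derives an impl name from the datatype and optional trait name rather than from rustc's numbering. `ImplKey`, `ImplNamer`, `fresh_name`, and the `impl_...` name format are all hypothetical. The per-key counter is only stable if both runs visit impls with the same key in the same order, which is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical key: the parts of an impl block that do not depend on
/// rustc's "impl%n" numbering.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ImplKey {
    datatype: String,
    trait_name: Option<String>,
}

/// Hands out names; a second impl with the same key gets a numeric suffix.
#[derive(Default)]
struct ImplNamer {
    counts: HashMap<ImplKey, u32>,
}

impl ImplNamer {
    fn fresh_name(&mut self, key: &ImplKey) -> String {
        let n = self.counts.entry(key.clone()).or_insert(0);
        let base = match &key.trait_name {
            Some(t) => format!("impl_{}_{}", t, key.datatype),
            None => format!("impl_{}", key.datatype),
        };
        // Only add a suffix when the same key has been seen before.
        let name = if *n == 0 { base } else { format!("{}_{}", base, n) };
        *n += 1;
        name
    }
}

fn main() {
    let mut namer = ImplNamer::default();
    let cat = ImplKey { datatype: "Cat".to_string(), trait_name: None };
    let cat_clone = ImplKey {
        datatype: "Cat".to_string(),
        trait_name: Some("Clone".to_string()),
    };
    println!("{}", namer.fresh_name(&cat));
    println!("{}", namer.fresh_name(&cat));
    println!("{}", namer.fresh_name(&cat_clone));
}
```

Because the name depends only on the key and on how many impls with that key came earlier, an unrelated impl that exists only in the ghost run does not shift it.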

Collaborator

@tjhance left a comment

Unfortunately, this numbering is not necessarily consistent between the keep-ghost-code run of rustc and the erased-ghost-code run of rustc.

Why is this important?

TyKind::Bound(..) => TypTree::String(ty.to_string()),

TyKind::FnDef(..) => TypTree::String("fndef".to_string()),
TyKind::FnPtr(..) => TypTree::String("fnptr".to_string()),
Collaborator

why do FnDef and FnPtr not include type arguments?

Collaborator Author

These should probably be unsupported_err!, as in mid_ty_to_vir_ghost. I just have to set up the spans and Result return types so that these errors would get reported properly.

Collaborator Author

Ok, I updated the pull request to use unsupported_err!
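The change discussed here, reporting an error for unsupported type kinds instead of emitting a placeholder string, could look roughly like the sketch below. It uses a toy `Ty` enum and a plain `Result<String, String>`. It does not use rustc's `TyKind` or the real `unsupported_err!` macro, and `ty_to_name` is a hypothetical name.

```rust
/// Toy stand-in for the type kinds discussed above (not rustc's TyKind).
#[derive(Debug, PartialEq)]
enum Ty {
    Int,
    Adt(String, Vec<Ty>),
    FnDef,
    FnPtr,
}

/// Turn a type into a name fragment, failing on kinds we do not handle
/// rather than silently dropping their type arguments.
fn ty_to_name(ty: &Ty) -> Result<String, String> {
    match ty {
        Ty::Int => Ok("int".to_string()),
        Ty::Adt(name, args) => {
            // Propagate the first error found in any type argument.
            let args: Result<Vec<String>, String> =
                args.iter().map(ty_to_name).collect();
            Ok(format!("{}<{}>", name, args?.join(",")))
        }
        Ty::FnDef | Ty::FnPtr => Err(format!("unsupported type: {:?}", ty)),
    }
}

fn main() {
    let ok = Ty::Adt("Vec".to_string(), vec![Ty::Int]);
    let bad = Ty::Adt("Vec".to_string(), vec![Ty::FnPtr]);
    println!("{:?}", ty_to_name(&ok));
    println!("{:?}", ty_to_name(&bad));
    println!("{:?}", ty_to_name(&Ty::FnDef));
}
```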

@Chris-Hawblitzel
Collaborator Author

Why is this important?

If this happens while compiling vstd, the compiled Rust library will have different names from the serialized VIR (in the persisted .vir file), and Verus then fails to find the expected names in the .vir file when verifying code that uses vstd. We never saw this before because the names happened to match, but it has actually occurred in recent experiments on the https://github.com/verus-lang/verus/tree/rustc_build branch (before applying the fix in this PR).

@@ -66,6 +66,7 @@ const SUBST_RENAME_SEPARATOR: &str = "$$";
 const KRATE_SEPARATOR: &str = "!";
 const PATH_SEPARATOR: &str = ".";
 const PATHS_SEPARATOR: &str = "/";
+const IMPL_SEPARATOR: &str = "/";
Collaborator

Is it OK for the IMPL_SEPARATOR to be the same as the PATHS_SEPARATOR? The impl name can be part of a path, right?

Collaborator Author

Good point. I changed IMPL_SEPARATOR to "&/".
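To make the concern concrete, here is a small sketch of how a shared separator can make two different inputs encode to the same string. The `encode` function and the way impl-name parts and path lists are flattened are hypothetical and only model the collision. They are not taken from the Verus source.

```rust
/// Flatten a list of paths into one string: the parts inside each path are
/// joined with `impl_sep`, and the paths themselves with `paths_sep`.
fn encode(paths: &[Vec<&str>], impl_sep: &str, paths_sep: &str) -> String {
    paths
        .iter()
        .map(|parts| parts.join(impl_sep))
        .collect::<Vec<String>>()
        .join(paths_sep)
}

fn main() {
    // One path whose impl name has two parts, versus two separate paths.
    let one_path = vec![vec!["impl", "Cat"]];
    let two_paths = vec![vec!["impl"], vec!["Cat"]];

    // With the same separator for both roles, the encodings collide.
    assert_eq!(encode(&one_path, "/", "/"), encode(&two_paths, "/", "/"));

    // With a distinct impl separator, they stay apart.
    assert_ne!(encode(&one_path, "&/", "/"), encode(&two_paths, "&/", "/"));

    println!("{}", encode(&one_path, "&/", "/"));
    println!("{}", encode(&two_paths, "&/", "/"));
}
```

In this toy model, the distinction only holds as long as `&` cannot appear inside an ordinary path segment.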

@tjhance
Collaborator

tjhance commented Jul 27, 2023

I don't understand the function of ImplNameId. As I understand it, the ImplFingerprint is the thing that remains constant across the ghost/erased runs. The ImplNameState maps those to Idents, which I presume are the idents that end up in the VIR Path names. So what does ImplNameId do?

@Chris-Hawblitzel
Collaborator Author

what does ImplNameId do?

I renamed this to RustcLocalImplId and added comments:

// rustc's internal name in a particular run of rustc.
// It consists of TypPath data, which is stable across runs of rustc,
// and numeric disambiguators, which can vary between runs of rustc.
// Our goal is to remap RustcLocalImplId to completely stable names.
pub(crate) type RustcLocalImplId = (TypPath, Vec<u32>);

@tjhance
Collaborator

tjhance commented Jul 27, 2023

So it's just a stringified version of the DefPath? Is it not possible to use the DefPath or DefId as the key?

@Chris-Hawblitzel
Collaborator Author

So it's just a stringified version of the DefPath? Is it not possible to use the DefPath or DefId as the key?

I took a closer look at this. RustcLocalImplId = (TypPath, Vec<u32>) is a stringified, hashable, serializable version of a prefix of a DefPath. I tried using RustcLocalImplId = DefPath or RustcLocalImplId = (CrateNum, Vec<DisambiguatedDefPathData>). However, these didn't implement Serialize and Deserialize.

Also, the code for def_path_to_rustc_id has a second purpose. It's used for def_path_to_path, which strips off the disambiguators to get a path that can be used in the fingerprints.

@tjhance
Collaborator

tjhance commented Jul 28, 2023

Hm, ok, there's still something I'm missing here. Why does RustcLocalImplId need to be serializable? If the whole point is that it's not stable, what good does that do?

@Chris-Hawblitzel
Collaborator Author

Hm, ok, there's still something I'm missing here. Why does RustcLocalImplId need to be serializable? If the whole point is that it's not stable, what good does that do?

This was also confusing for me while I was writing the code, so it's probably good to document the approach better. I added more comments describing a typical scenario:

    // Example:
    // - when compiling vstd (both with erase_ghost and !erase_ghost):
    //   - rustc creates its own internal names for impls in vstd
    //     - these names may be different in erase_ghost and !erase_ghost
    //     - example:
    //       - in erase_ghost: impl%2
    //       - in !erase_ghost: impl%3
    //   - we represent these rustc internal names with RustcLocalImplId
    //     - example:
    //       - in erase_ghost: ("impl", [2])
    //       - in !erase_ghost: ("impl", [3])
    //   - we map the RustcLocalImplId to our own Ident in map_to_stable_name
    //     - we create a mapping for erase_ghost and another mapping for !erase_ghost
    //     - the Ident values are the same in erase_ghost and !erase_ghost
    //     - example:
    //       - in erase_ghost: ("impl", [2]) -> impl_Vec7
    //       - in !erase_ghost: ("impl", [3]) -> impl_Vec7
    //   - in our own VIR AST for the vstd library (created with !erase_ghost),
    //     we use the stable Ident values
    //     - impl_Vec7
    //   - rustc emits a library file on disk using its erase_ghost internal names
    //       - in erase_ghost: impl%2
    //   - we serialize and emit our VIR AST in CrateWithMetadata
    //     - the serialized VIR AST contains impl_Vec7
    //   - we also serialize the erase_ghost mapping in CrateWithMetadata
    //     - ("impl", [2]) -> impl_Vec7
    //     - see the export_impls function below and see import_export.rs
    // - when a client application imports vstd:
    //   - rustc reads in the erase_ghost-compiled vstd library from disk
    //     - this file contains the *same* erase_ghost rustc internal names
    //       created while compiling vstd
    //       - in erase_ghost: impl%2
    //     - we again represent these internal names with the same RustcLocalImplId
    //       as we did while compiling vstd
    //       - in erase_ghost: ("impl", [2])
    //   - we read in the serialized mapping from CrateWithMetadata created when compiling vstd
    //     - in erase_ghost: ("impl", [2]) -> impl_Vec7
    //   - we use the RustcLocalImplId and deserialized map_to_stable_name to recover the same
    //     Ident values that were used while compiling vstd
    //     - impl_Vec7
    //   - the key result is that these Ident values will match the !erase_ghost Ident values
    //     that are present in the rest of the deserialized VIR AST for the vstd library
    //     (whereas if we hadn't done the remapping, we would have seen inconsistent names,
    //     impl%2 from the erase_ghost rustc-emitted library, and impl%3 in our deserialized
    //     !erase_ghost VIR AST)
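The scenario in the comment above can be traced with a small self-contained model. This is a sketch under stated assumptions: `LocalImplId` stands in for `RustcLocalImplId`, `Fingerprint` is a hypothetical (self type, trait) pair assumed to be unique per impl, the `impl_...` name format is made up, and a `clone()` stands in for serializing the map into `CrateWithMetadata`.

```rust
use std::collections::HashMap;

/// Stand-in for RustcLocalImplId: a path plus run-specific disambiguators.
type LocalImplId = (Vec<String>, Vec<u32>);
/// Hypothetical fingerprint: (self type, optional trait), same in both runs.
type Fingerprint = (String, Option<String>);

/// Build one run's map from rustc-local ids to stable names.
/// The name depends only on the fingerprint, never on the local id.
fn map_to_stable_name(
    impls: &[(LocalImplId, Fingerprint)],
) -> HashMap<LocalImplId, String> {
    impls
        .iter()
        .map(|(id, (ty, tr))| {
            let name = match tr {
                Some(t) => format!("impl_{}_{}", t, ty),
                None => format!("impl_{}", ty),
            };
            (id.clone(), name)
        })
        .collect()
}

fn main() {
    let id = |n: u32| -> LocalImplId { (vec!["impl".to_string()], vec![n]) };
    let fp: Fingerprint = ("Vec".to_string(), None);

    // Compiling the library: same impl, different rustc numbering per run.
    let erased = map_to_stable_name(&[(id(2), fp.clone())]);
    let ghost = map_to_stable_name(&[(id(3), fp.clone())]);
    assert_eq!(erased[&id(2)], ghost[&id(3)]);

    // Only the erase_ghost map is exported alongside the VIR.
    let exported = erased.clone();

    // Client: rustc reports the erase_ghost id read from the compiled
    // library; the imported map recovers the name used in the VIR.
    let name_in_vir = &ghost[&id(3)];
    assert_eq!(exported.get(&id(2)), Some(name_in_vir));
    println!("{:?} -> {}", id(2), exported[&id(2)]);
}
```

The fingerprint is only needed while building the maps; the client side uses nothing but the exported erase_ghost map.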

@tjhance
Collaborator

tjhance commented Jul 28, 2023

OK, I think I get it now. Just to check, we use the fingerprinting in order to make sure these two things map to the same Ident:

    //       - in erase_ghost: ("impl", [2]) -> impl_Vec7
    //       - in !erase_ghost: ("impl", [3]) -> impl_Vec7

But the fingerprints don't get serialized, the thing we serialize is the erase_ghost mapping, which matches the rlib file?

}

fn def_path_to_path<'tcx>(tcx: TyCtxt<'tcx>, def_path: DefPath) -> TypPath {
    def_path_to_rustc_id(tcx, def_path, usize::MAX).0
Collaborator

Can we add an assert here that the disambiguator list we're ignoring is empty?

Collaborator Author

Can we add an assert here that the disambiguator list we're ignoring is empty?

I just tried this:

fn def_path_to_path<'tcx>(tcx: TyCtxt<'tcx>, def_path: DefPath) -> TypPath {
    let (path, disambiguators) = def_path_to_rustc_id(tcx, def_path, usize::MAX);
    assert!(disambiguators.len() == 0);
    path
}

However, this assertion fails for nested impls (which, admittedly, we don't support yet, but we could in the future, and names.rs is intended to account for this):

struct S;
impl S {
    fn f() {
        struct Q;
        impl Q {
            fn g() {}
        }
    }
}
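The failing case can be reproduced on a toy model of a def path. The `Seg` enum below is a simplified stand-in loosely modeled on rustc's `DefPathData` variants, and `split_def_path` is a hypothetical approximation of what `def_path_to_rustc_id` returns. It assumes that disambiguators are collected only from impl segments.

```rust
/// Simplified def-path segment (a toy stand-in, not rustc's type).
#[derive(Clone, Debug)]
enum Seg {
    TypeNs(&'static str),
    ValueNs(&'static str),
    Impl(u32), // rustc's run-specific impl number
}

/// Split a def path into stable names and the impl disambiguators.
fn split_def_path(def_path: &[Seg]) -> (Vec<String>, Vec<u32>) {
    let mut path = Vec::new();
    let mut disambiguators = Vec::new();
    for seg in def_path {
        match seg {
            Seg::TypeNs(name) | Seg::ValueNs(name) => path.push(name.to_string()),
            Seg::Impl(n) => {
                path.push("impl".to_string());
                disambiguators.push(*n);
            }
        }
    }
    (path, disambiguators)
}

fn main() {
    // Top-level struct S: no impl segment on the way to the type.
    let s = [Seg::TypeNs("S")];
    // Struct Q declared inside S::f: its path passes through `impl S`.
    let q = [Seg::Impl(0), Seg::ValueNs("f"), Seg::TypeNs("Q")];

    assert!(split_def_path(&s).1.is_empty());
    // This is the case where an "empty disambiguators" assertion would fail.
    assert_eq!(split_def_path(&q).1, vec![0]);
    println!("{:?}", split_def_path(&q));
}
```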

Collaborator

Oh, I... had no idea nested impls were a thing.

Can you give a rough argument why the fingerprinting is sound even when disambiguators are thrown away?

@Chris-Hawblitzel
Collaborator Author

we use the fingerprinting in order to make sure these two things map to the same Ident:

yes

@Chris-Hawblitzel
Collaborator Author

the thing we serialize is the erase_ghost mapping, which matches the rlib file?

yes

@Chris-Hawblitzel
Collaborator Author

But the fingerprints don't get serialized

Good point. The fingerprints had been unnecessarily marked Serialize, Deserialize. I just removed that.
