Skip to content

Commit

Permalink
Paper Submission (OSDI 25) Sprint Improvements (#170)
Browse files Browse the repository at this point in the history
## What Changed?

This PR contains a plethora of improvements and fixes performed for the
OSDI 25 submission. They should be separate PRs but I am too lazy to
make so many.

## Why Does It Need To?

This PR fixes the following problems:

- `async_trait` recognition now accommodates newer versions which
sometimes have more than 2 predicates in the `dyn`
- Writing cross-crate MIR bodies now happens after the analysis and can
be skipped. The analysis will lazily fetch and create typecheck facts
for local bodies.
- Bodies are now written in a per-crate artifact to reduce size.
Filenames contained in those artifacts are deduplicated.
- End-to-end timing (including for cross crate MIR dumping) is now
recorded as well as and new timers for dumping of MIR have been added
- For efficiency we no longer construct the polonius facts but use just
the outlives constraints directly which turn out to be sufficient
- Shim calls (for function pointers and closures) are now handled
correctly
- Fixed control flow across function boundaries. It is now emitted
correctly.
- Control flow edges from the `await` state machine are now omitted from
the PDG.
- Artifacts are written (vastly) more efficiently using buffered
readers/writers
- External marker files can now assign multiple markers to an entity in
a single entry or to multiple refinements.
- PDGs are now always constructed for bodies that handle marked types
- Emitting the spans of analyzed bodies has been reinstated
- `Context` has been renamed to `RootContext` and a new `Context` trait
has been introduced that contains some of the simple graph queries.

## Checklist

- [x] Above description has been filled out so that upon quash merge we
have a
  good record of what changed.
- [x] New functions, methods, types are documented. Old documentation is
updated
  if necessary
- [ ] Documentation in Notion has been updated
- [ ] Tests for new behaviors are provided
  - [ ] New test suites (if any) ave been added to the CI tests (in
`.github/workflows/rust.yml`) either as compiler test or integration
test.
*Or* justification for their omission from CI has been provided in this
PR
    description.
  • Loading branch information
JustusAdam authored Dec 19, 2024
1 parent 52a64bd commit 0a5c628
Show file tree
Hide file tree
Showing 59 changed files with 2,013 additions and 576 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ target/

/rust-dfpp-patch
flow-graph.json
flow-graph.stat.json

# Local cargo configuration
.cargo/config.toml
Expand Down
4 changes: 3 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ rustc_plugin = "=0.7.4-nightly-2023-08-25"
[workspace.dependencies.flowistry]
# path = "../flowistry/crates/flowistry"
git = "https://github.com/brownsys/flowistry"
rev = "08c4ad9587b3251a8f7c64aa60be31404e6e04c0"
rev = "57627ed24802cb76c745118bbc46f83f402a1e88"
default-features = false


Expand Down
6 changes: 3 additions & 3 deletions Makefile.toml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
# - `fix-all` tries to fix all the formatting and code style issues. This is
# most likely the command you want to run.
# - `check-all` runs all the formatting and code-style checks we also run in CI
# This performs all teh same operations as `fix-ll` but complains instead of
# This performs all the same operations as `fix-ll` but complains instead of
# solving them.
# - `pdg-tests` which runs our current complete test suite
#
Expand All @@ -20,7 +20,7 @@ default_to_workspace = false

[tasks.install]
command = "cargo"
args = ["install", "--locked", "--path", "crates/paralegal-flow"]
args = ["install", "--locked", "-f", "--path", "crates/paralegal-flow"]

[tasks.ci-tests]
description = "The suite of tests we run in CI"
Expand All @@ -35,7 +35,7 @@ description = """\
Attempt to perform any formatting and code structure related fixes.
Note: This uses the clippy-fix command under the hood. This can introduce
breaking changes and so this tscript only passes `--allow-staged` to the
breaking changes and so this script only passes `--allow-staged` to the
fix command. What this means for you is that you need to stage (or commit)
your changes in git for the fix command to work. This is a precaution that
allows you to inspect, test and potentially reject clippy's changes before
Expand Down
3 changes: 0 additions & 3 deletions crates/flowistry_pdg_construction/src/async_support.rs
Original file line number Diff line number Diff line change
Expand Up @@ -138,9 +138,6 @@ fn match_pin_box_dyn_ty(lang_items: &rustc_hir::LanguageItems, t: ty::Ty) -> boo
let ty::TyKind::Dynamic(pred, _, ty::DynKind::Dyn) = t_a.boxed_ty().kind() else {
return false;
};
if pred.len() != 2 {
return false;
}
pred.iter().any(|p| {
let ty::ExistentialPredicate::Trait(t) = p.skip_binder() else {
return false;
Expand Down
119 changes: 79 additions & 40 deletions crates/flowistry_pdg_construction/src/body_cache.rs
Original file line number Diff line number Diff line change
@@ -1,11 +1,16 @@
use std::path::PathBuf;
use std::{
cell::RefCell,
path::PathBuf,
time::{Duration, Instant},
};

use flowistry::mir::FlowistryInput;

use polonius_engine::FactTypes;
use rustc_borrowck::consumers::{ConsumerOptions, RustcFacts};
use rustc_hash::FxHashMap;
use rustc_hir::{
def_id::{CrateNum, DefId, LocalDefId, LOCAL_CRATE},
def_id::{CrateNum, DefId, DefIndex, LocalDefId, LOCAL_CRATE},
intravisit::{self},
};
use rustc_macros::{Decodable, Encodable, TyDecodable, TyEncodable};
Expand Down Expand Up @@ -36,33 +41,40 @@ impl<'tcx> CachedBody<'tcx> {
let mut body_with_facts = rustc_borrowck::consumers::get_body_with_borrowck_facts(
tcx,
local_def_id,
ConsumerOptions::PoloniusInputFacts,
ConsumerOptions::RegionInferenceContext,
);

clean_undecodable_data_from_body(&mut body_with_facts.body);

Self {
body: body_with_facts.body,
input_facts: FlowistryFacts {
subset_base: body_with_facts.input_facts.unwrap().subset_base,
subset_base: body_with_facts
.region_inference_context
.outlives_constraints()
.map(|constraint| (constraint.sup, constraint.sub))
.collect(),
},
}
}
}

impl<'tcx> FlowistryInput<'tcx> for &'tcx CachedBody<'tcx> {
impl<'tcx> FlowistryInput<'tcx, 'tcx> for &'tcx CachedBody<'tcx> {
fn body(self) -> &'tcx Body<'tcx> {
&self.body
}

fn input_facts_subset_base(
self,
) -> &'tcx [(
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Point,
)] {
&self.input_facts.subset_base
) -> Box<
dyn Iterator<
Item = (
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Origin,
),
> + 'tcx,
> {
Box::new(self.input_facts.subset_base.iter().copied())
}
}

Expand All @@ -73,48 +85,70 @@ pub struct FlowistryFacts {
pub subset_base: Vec<(
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Origin,
<RustcFacts as FactTypes>::Point,
)>,
}

pub type LocationIndex = <RustcFacts as FactTypes>::Point;

type BodyMap<'tcx> = FxHashMap<DefIndex, CachedBody<'tcx>>;

/// Allows loading bodies from previosly written artifacts.
///
/// Ensure this cache outlives any flowistry analysis that is performed on the
/// bodies it returns or risk UB.
pub struct BodyCache<'tcx> {
tcx: TyCtxt<'tcx>,
cache: Cache<DefId, CachedBody<'tcx>>,
cache: Cache<CrateNum, BodyMap<'tcx>>,
local_cache: Cache<DefIndex, CachedBody<'tcx>>,
timer: RefCell<Duration>,
}

impl<'tcx> BodyCache<'tcx> {
pub fn new(tcx: TyCtxt<'tcx>) -> Self {
Self {
tcx,
cache: Default::default(),
local_cache: Default::default(),
timer: RefCell::new(Duration::ZERO),
}
}

pub fn timer(&self) -> Duration {
*self.timer.borrow()
}

/// Serve the body from the cache or read it from the disk.
///
/// Returns `None` if the policy forbids loading from this crate.
pub fn get(&self, key: DefId) -> &'tcx CachedBody<'tcx> {
let cbody = self.cache.get(key, |_| load_body_and_facts(self.tcx, key));
let body = if let Some(local) = key.as_local() {
self.local_cache.get(local.local_def_index, |_| {
let start = Instant::now();
let res = CachedBody::retrieve(self.tcx, local);
*self.timer.borrow_mut() += start.elapsed();
res
})
} else {
self.cache
.get(key.krate, |_| load_body_and_facts(self.tcx, key.krate))
.get(&key.index)
.expect("Invariant broken, body for this is should exist")
};

// SAFETY: Theoretically this struct may not outlive the body, but
// to simplify lifetimes flowistry uses 'tcx anywhere. But if we
// actually try to provide that we're risking race conditions
// (because it needs global variables like MIR_BODIES).
//
// So until we fix flowistry's lifetimes this is good enough.
unsafe { std::mem::transmute(cbody) }
unsafe { std::mem::transmute(body) }
}
}

/// A visitor to collect all bodies in the crate and write them to disk.
struct DumpingVisitor<'tcx> {
tcx: TyCtxt<'tcx>,
target_dir: PathBuf,
targets: Vec<LocalDefId>,
}

/// Some data in a [Body] is not cross-crate compatible. Usually because it
Expand Down Expand Up @@ -150,20 +184,7 @@ impl<'tcx> intravisit::Visitor<'tcx> for DumpingVisitor<'tcx> {
_: rustc_span::Span,
local_def_id: rustc_hir::def_id::LocalDefId,
) {
let to_write = CachedBody::retrieve(self.tcx, local_def_id);

let dir = &self.target_dir;
let path = dir.join(
self.tcx
.def_path(local_def_id.to_def_id())
.to_filename_friendly_no_crate(),
);

if !dir.exists() {
std::fs::create_dir(dir).unwrap();
}

encode_to_file(self.tcx, path, &to_write);
self.targets.push(local_def_id);

intravisit::walk_fn(
self,
Expand All @@ -179,14 +200,30 @@ impl<'tcx> intravisit::Visitor<'tcx> for DumpingVisitor<'tcx> {
/// calculating the necessary borrowcheck facts to store for later points-to
/// analysis.
///
/// Ensure this gets called early in the compiler before the unoptimmized mir
/// Ensure this gets called early in the compiler before the unoptimized mir
/// bodies are stolen.
pub fn dump_mir_and_borrowck_facts(tcx: TyCtxt) {
pub fn dump_mir_and_borrowck_facts<'tcx>(tcx: TyCtxt<'tcx>) -> (Duration, Duration) {
let mut vis = DumpingVisitor {
tcx,
target_dir: intermediate_out_dir(tcx, INTERMEDIATE_ARTIFACT_EXT),
targets: vec![],
};
tcx.hir().visit_all_item_likes_in_crate(&mut vis);

let tc_start = Instant::now();
let bodies: BodyMap<'tcx> = vis
.targets
.iter()
.map(|local_def_id| {
let to_write = CachedBody::retrieve(tcx, *local_def_id);

(local_def_id.local_def_index, to_write)
})
.collect();
let tc_time = tc_start.elapsed();
let dump_time = Instant::now();
let path = intermediate_out_dir(tcx, INTERMEDIATE_ARTIFACT_EXT);
encode_to_file(tcx, path, &bodies);
(tc_time, dump_time.elapsed())
}

const INTERMEDIATE_ARTIFACT_EXT: &str = "bwbf";
Expand All @@ -206,16 +243,18 @@ pub fn local_or_remote_paths(krate: CrateNum, tcx: TyCtxt, ext: &str) -> Vec<Pat
}

/// Try to load a [`CachedBody`] for this id.
fn load_body_and_facts(tcx: TyCtxt<'_>, def_id: DefId) -> CachedBody<'_> {
let paths = local_or_remote_paths(def_id.krate, tcx, INTERMEDIATE_ARTIFACT_EXT);
fn load_body_and_facts(tcx: TyCtxt<'_>, krate: CrateNum) -> BodyMap<'_> {
let paths = local_or_remote_paths(krate, tcx, INTERMEDIATE_ARTIFACT_EXT);
for path in &paths {
let path = path.join(tcx.def_path(def_id).to_filename_friendly_no_crate());
if let Ok(data) = decode_from_file(tcx, path) {
return data;
};
if !path.exists() {
continue;
}

let data = decode_from_file(tcx, path).unwrap();
return data;
}

panic!("No facts for {def_id:?} found at any path tried: {paths:?}");
panic!("No facts for {krate:?} found at any path tried: {paths:?}");
}

/// Create the name of the file in which to store intermediate artifacts.
Expand Down
31 changes: 23 additions & 8 deletions crates/flowistry_pdg_construction/src/calling_convention.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,11 @@ use rustc_middle::{
ty::TyCtxt,
};

use crate::{async_support::AsyncInfo, local_analysis::CallKind, utils};
use crate::{
async_support::AsyncInfo,
local_analysis::CallKind,
utils::{self, ShimType},
};

/// Describes how the formal parameters of a given function call relate to the
/// actual parameters.
Expand All @@ -21,7 +25,7 @@ pub enum CallingConvention<'tcx> {
/// tuple that contains the actual argument to the call of the closure
/// function.
Indirect {
once_shim: bool,
shim: Option<ShimType>,
closure_arg: Operand<'tcx>,
tupled_arguments: Operand<'tcx>,
},
Expand Down Expand Up @@ -72,8 +76,8 @@ impl<'tcx> CallingConvention<'tcx> {
match kind {
CallKind::AsyncPoll(poll) => CallingConvention::Async(poll.generator_data),
CallKind::Direct => CallingConvention::Direct(args.into()),
CallKind::Indirect { once_shim } => CallingConvention::Indirect {
once_shim: *once_shim,
CallKind::Indirect { shim: once_shim } => CallingConvention::Indirect {
shim: *once_shim,
closure_arg: args[0].clone(),
tupled_arguments: args[1].clone(),
},
Expand Down Expand Up @@ -184,13 +188,24 @@ impl<'a, 'tcx> PlaceTranslator<'a, 'tcx> {
// Map closure captures to the first argument.
// Map formal parameters to the second argument.
CallingConvention::Indirect {
once_shim,
shim,
closure_arg,
tupled_arguments,
} => {
if child.local.as_usize() == 1 {
// Accounting fot FnPtrShim
//
// The shim gets an extra first argument (the function pointer)
// but we replace it with the function iself which doesn't have
// that argument, so we need to adjust the indices
let local = if matches!(shim, Some(ShimType::FnPtr)) && child.local != RETURN_PLACE
{
child.local + 1
} else {
child.local
};
if local.as_usize() == 1 {
// Accounting for shims
let next_idx = if *once_shim {
let next_idx = if matches!(shim, Some(ShimType::Once)) {
// If this is a once shim then the signature of the
// function and its call don't match fully. (We are
// calling a closure that takes it's `self` by reference
Expand All @@ -214,7 +229,7 @@ impl<'a, 'tcx> PlaceTranslator<'a, 'tcx> {
} else {
let tuple_arg = tupled_arguments.place()?;
let _projection = child.projection.to_vec();
let field = FieldIdx::from_usize(child.local.as_usize() - 2);
let field = FieldIdx::from_usize(local.as_usize() - 2);
let field_ty = tuple_arg
.ty(self.parent_body, self.tcx)
.field_ty(self.tcx, field);
Expand Down
Loading

0 comments on commit 0a5c628

Please sign in to comment.