Skip to content

Commit

Permalink
Handling of foreign functions that may execute local code (#164)
Browse files Browse the repository at this point in the history
## What Changed?

- Adds checks which ensure that if foreign functions may call local
code, it is sound, e.g. the code they can call cannot attach markers.
- Adds a basic flow-model whitelist, which can be used to replace
foreign calls with specific methods of the traits they are parameterized
over.

## Why Does It Need To?

The main use case here is if the local code calls a higher-order
function, then we should ensure that we don't miss any markers as a
result. Here is a simple example from out test cases.

```rs
#[paralegal::marker(attached, return)]
fn attach() -> usize {
    0
}

#[paralegal::analyze]
fn main() {
    std::thread::spawn(|| attach());
}
```

Our analysis would miss the `attached` marker, because it doesn't know
the body of `thread::spawn` (and even if it did, it wouldn't help but
more on that later). But there is a more general pattern here, which is
if any foreign function `foo` is parameterized by a trait `T` which we
locally implement, then a method on that trait might attach markers.

So step one is to reject such calls to foreign function. If we did that
for all of them however that would cause a lot of false rejection.
Instead we should only reject this if the trait implementation that it
might target is known to attach a marker and therefore cause an
unsoundness issue.

But, as mentioned before we wouldn't be able to actually analyze
`thread::spawn`, even if we had access to be body. Underneath it does
some low-level stuff that Paralegal can't actually handle. However
semantically it's quite simple we could basically replace
`thread::spawn(foo)` with `foo()` (and `join` with `identity`).

So step two is to add an interface to whitelist a set of commonly used
higher order function and define some simple rewrite rules, e.g.
`thread::spawn(foo) => foo()`.

## Checklist

- [x] Above description has been filled out so that upon quash merge we
have a
  good record of what changed.
- [x] New functions, methods, types are documented. Old documentation is
updated
  if necessary
- [ ] Documentation in Notion has been updated
- [x] Tests for new behaviors are provided
  - [x] New test suites (if any) ave been added to the CI tests (in
`.github/workflows/rust.yml`) either as compiler test or integration
test.
*Or* justification for their omission from CI has been provided in this
PR
    description.
  • Loading branch information
JustusAdam authored Aug 27, 2024
1 parent a4ca7a0 commit df19b1e
Show file tree
Hide file tree
Showing 27 changed files with 1,063 additions and 264 deletions.
8 changes: 4 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,9 @@ debug = true
[replace."rustc_utils:0.7.4-nightly-2023-08-25"]
# path = "../rustc_plugin/crates/rustc_utils"
git = "https://github.com/JustusAdam/rustc_plugin"
rev = "beca0fdda699591052445a1da840c072fffa1f21"
rev = "d7bde3ae16a7137de594c2dd811030cd761a8993"

[replace."rustc_plugin:0.7.4-nightly-2023-08-25"]
# path = "../rustc_plugin/crates/rustc_plugin"
git = "https://github.com/JustusAdam/rustc_plugin"
rev = "beca0fdda699591052445a1da840c072fffa1f21"
rev = "d7bde3ae16a7137de594c2dd811030cd761a8993"
2 changes: 1 addition & 1 deletion crates/flowistry_pdg_construction/src/body_cache.rs
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ impl<'tcx> BodyCache<'tcx> {
/// Serve the body from the cache or read it from the disk.
///
/// Returns `None` if the policy forbids loading from this crate.
pub fn get(&self, key: DefId) -> Option<&'tcx CachedBody<'tcx>> {
pub fn get(&self, key: DefId) -> &'tcx CachedBody<'tcx> {
let cbody = self.cache.get(key, |_| load_body_and_facts(self.tcx, key));
// SAFETY: Theoretically this struct may not outlive the body, but
// to simplify lifetimes flowistry uses 'tcx anywhere. But if we
Expand Down
52 changes: 37 additions & 15 deletions crates/flowistry_pdg_construction/src/callback.rs
Original file line number Diff line number Diff line change
@@ -1,44 +1,53 @@
//! CAllbacks to influence graph construction and their supporting types.
//! Callbacks to influence graph construction and their supporting types.
use flowistry_pdg::{rustc_portable::Location, CallString};
use rustc_middle::ty::Instance;

use rustc_middle::{
mir::{self, Operand},
ty::{Instance, ParamEnv},
};
use rustc_span::Span;

use crate::calling_convention::CallingConvention;

pub trait CallChangeCallback<'tcx> {
fn on_inline(&self, info: CallInfo<'tcx>) -> CallChanges;
fn on_inline(&self, info: CallInfo<'tcx, '_>) -> CallChanges<'tcx>;

fn on_inline_miss(
&self,
_resolution: Instance<'tcx>,
_param_env: ParamEnv<'tcx>,
_loc: Location,
_under_analysis: Instance<'tcx>,
_call_string: Option<CallString>,
_reason: InlineMissReason,
_call_span: Span,
) {
}
}

pub struct CallChangeCallbackFn<'tcx> {
f: Box<dyn Fn(CallInfo<'tcx>) -> CallChanges + 'tcx>,
f: Box<dyn Fn(CallInfo<'tcx, '_>) -> CallChanges<'tcx> + 'tcx>,
}

impl<'tcx> CallChangeCallbackFn<'tcx> {
pub fn new(f: impl Fn(CallInfo<'tcx>) -> CallChanges + 'tcx) -> Self {
pub fn new(f: impl Fn(CallInfo<'tcx, '_>) -> CallChanges<'tcx> + 'tcx) -> Self {
Self { f: Box::new(f) }
}
}

impl<'tcx> CallChangeCallback<'tcx> for CallChangeCallbackFn<'tcx> {
fn on_inline(&self, info: CallInfo<'tcx>) -> CallChanges {
fn on_inline(&self, info: CallInfo<'tcx, '_>) -> CallChanges<'tcx> {
(self.f)(info)
}
}

#[derive(Debug)]
pub enum InlineMissReason {
Async(String),
TraitMethod,
}

impl Default for CallChanges {
impl<'tcx> Default for CallChanges<'tcx> {
fn default() -> Self {
CallChanges {
skip: SkipCall::NoSkip,
Expand All @@ -47,7 +56,7 @@ impl Default for CallChanges {
}

/// Information about the function being called.
pub struct CallInfo<'tcx> {
pub struct CallInfo<'tcx, 'mir> {
/// The potentially-monomorphized resolution of the callee.
pub callee: Instance<'tcx>,

Expand All @@ -60,29 +69,42 @@ pub struct CallInfo<'tcx> {

/// Would the PDG for this function be served from the cache.
pub is_cached: bool,

pub span: Span,

pub arguments: &'mir [Operand<'tcx>],

pub caller_body: &'mir mir::Body<'tcx>,
pub param_env: ParamEnv<'tcx>,
}

/// User-provided changes to the default PDG construction behavior for function calls.
///
/// Construct [`CallChanges`] via [`CallChanges::default`].
#[derive(Debug)]
pub struct CallChanges {
pub(crate) skip: SkipCall,
pub struct CallChanges<'tcx> {
pub(crate) skip: SkipCall<'tcx>,
}

/// Whether or not to skip recursing into a function call during PDG construction.
#[derive(Debug)]
pub enum SkipCall {
pub enum SkipCall<'tcx> {
/// Skip the function, and perform a modular approxmation of its effects.
Skip,

/// Recurse into the function as normal.
NoSkip,

/// Replace with a call to this other function and arguments.
Replace {
instance: Instance<'tcx>,
calling_convention: CallingConvention<'tcx>,
},
}

impl CallChanges {
/// Inidicate whether or not to skip recursing into the given function.
pub fn with_skip(self, skip: SkipCall) -> Self {
impl<'tcx> CallChanges<'tcx> {
/// Indicate whether or not to skip recursing into the given function.
pub fn with_skip(self, skip: SkipCall<'tcx>) -> Self {
CallChanges { skip }
}
}
23 changes: 13 additions & 10 deletions crates/flowistry_pdg_construction/src/calling_convention.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
use std::borrow::Cow;

use flowistry_pdg::rustc_portable::DefId;
use log::trace;
use rustc_abi::FieldIdx;
Expand All @@ -9,26 +11,27 @@ use rustc_middle::{

use crate::{async_support::AsyncInfo, local_analysis::CallKind, utils};

pub enum CallingConvention<'tcx, 'a> {
Direct(&'a [Operand<'tcx>]),
#[derive(Debug)]
pub enum CallingConvention<'tcx> {
Direct(Box<[Operand<'tcx>]>),
Indirect {
closure_arg: &'a Operand<'tcx>,
tupled_arguments: &'a Operand<'tcx>,
closure_arg: Operand<'tcx>,
tupled_arguments: Operand<'tcx>,
},
Async(Place<'tcx>),
}

impl<'tcx, 'a> CallingConvention<'tcx, 'a> {
impl<'tcx> CallingConvention<'tcx> {
pub fn from_call_kind(
kind: &CallKind<'tcx>,
args: &'a [Operand<'tcx>],
) -> CallingConvention<'tcx, 'a> {
args: Cow<'_, [Operand<'tcx>]>,
) -> CallingConvention<'tcx> {
match kind {
CallKind::AsyncPoll(poll) => CallingConvention::Async(poll.generator_data),
CallKind::Direct => CallingConvention::Direct(args),
CallKind::Direct => CallingConvention::Direct(args.into()),
CallKind::Indirect => CallingConvention::Indirect {
closure_arg: &args[0],
tupled_arguments: &args[1],
closure_arg: args[0].clone(),
tupled_arguments: args[1].clone(),
},
}
}
Expand Down
39 changes: 22 additions & 17 deletions crates/flowistry_pdg_construction/src/construct.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
use std::rc::Rc;
use std::{borrow::Cow, rc::Rc};

use either::Either;
use flowistry::mir::FlowistryInput;
Expand Down Expand Up @@ -99,21 +99,23 @@ impl<'tcx> MemoPdgConstructor<'tcx> {
)
.unwrap();

self.construct_for(resolution).unwrap()
self.construct_for(resolution)
.expect("Invariant broken, entrypoint cannot have been recursive.")
}

/// Construct a graph for this instance of return it from the cache.
///
/// Returns `None` if this is a recursive call trying to construct the graph again.
pub(crate) fn construct_for<'a>(
&'a self,
resolution: Instance<'tcx>,
) -> Option<&'a PartialGraph<'tcx>> {
self.pdg_cache
.try_retrieve(resolution, |_| {
let g = LocalAnalysis::new(self, resolution)?.construct_partial();
trace!("Computed new for {resolution:?}");
g.check_invariants();
Some(g)
})
.as_success()
self.pdg_cache.get_maybe_recursive(resolution, |_| {
let g = LocalAnalysis::new(self, resolution).construct_partial();
trace!("Computed new for {resolution:?}");
g.check_invariants();
g
})
}

/// Has a PDG been constructed for this instance before?
Expand All @@ -130,7 +132,7 @@ impl<'tcx> MemoPdgConstructor<'tcx> {
if let Some((generator, loc, _ty)) = determine_async(
self.tcx,
function.to_def_id(),
self.body_cache.get(function.to_def_id()).unwrap().body(),
self.body_cache.get(function.to_def_id()).body(),
) {
// TODO remap arguments

Expand All @@ -152,7 +154,7 @@ impl<'tcx> MemoPdgConstructor<'tcx> {
/// Try to retrieve or load a body for this id.
///
/// Returns `None` if the loading policy forbids loading from this crate.
pub fn body_for_def_id(&self, key: DefId) -> Option<&'tcx CachedBody<'tcx>> {
pub fn body_for_def_id(&self, key: DefId) -> &'tcx CachedBody<'tcx> {
self.body_cache.get(key)
}

Expand Down Expand Up @@ -259,8 +261,8 @@ impl<'mir, 'tcx> ResultsVisitor<'mir, 'tcx, LocalAnalysisResults<'tcx, 'mir>>
if matches!(
constructor.determine_call_handling(
location,
func,
args,
Cow::Borrowed(func),
Cow::Borrowed(args),
terminator.source_info.span
),
Some(CallHandling::Ready { .. })
Expand Down Expand Up @@ -331,9 +333,12 @@ impl<'tcx> PartialGraph<'tcx> {
function: constructor.def_id,
};

let Some(handling) =
constructor.determine_call_handling(location, func, args, terminator.source_info.span)
else {
let Some(handling) = constructor.determine_call_handling(
location,
Cow::Borrowed(func),
Cow::Borrowed(args),
terminator.source_info.span,
) else {
return false;
};

Expand Down
2 changes: 1 addition & 1 deletion crates/flowistry_pdg_construction/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ use rustc_middle::ty::TyCtxt;
mod approximation;
mod async_support;
pub mod body_cache;
mod calling_convention;
pub mod calling_convention;
mod construct;
pub mod encoder;
pub mod graph;
Expand Down
Loading

0 comments on commit df19b1e

Please sign in to comment.