Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

effects: taint :noub for memory operations #52186

Merged
merged 1 commit into from
Nov 17, 2023
Merged

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented Nov 16, 2023

Currently our effects system does not track :noub about memory operations. While this sounds unsafe, it's actually not problematic in most contexts. We restrict concrete-evaluation for anything involing memory operations anyway, as mutable values aren't propagated as compile-time constants (except Symbols). However, it turns out that this is actually causing very dangerous situations in REPL completion context, where REPLInterpreter enables aggressive concrete-evaluation that propagates mutable values as constants and ignores the :consisten-cy requirement that is usually tainted by memory operations.

This commit addresses this issue by tainting :noub for memory operations. This commit ends up being somewhat extensive because it would cause significant regression in REPL completions if we make memory ops naively taint :noub. Complicating this further is the Base's uses of boundscheck=false for peak performance of core memory operations, where bounds checking is conducted separately before performing actual memory operations, e.g.:

essentials.jl

function getindex(A::Array, i::Int)
    @boundscheck ult_int(bitcast(UInt, sub_int(i, 1)), bitcast(UInt, length(A))) || throw_boundserror(A, (i,))
    memoryrefget(memoryref(getfield(A, :ref), i, false), :not_atomic, false)
end

Here boundscheck=false should generally taint :noub immediately, but in this kind of case :noub can actually be NOUB_IF_NOINBOUNDS since it has @boundscheck that asserts safety of the memory ops. Note that we could employ a similar technique to getfield if we change the above implementation to something like:

function getindex(A::Array, i::Int)
    memoryrefget(memoryref(getfield(A, :ref), i, @_boundscheck), :not_atomic, @_boundscheck)
end

although this would end up being slower (check the difference at the code LLVM emits on each implementation if interested).

To this end, this commit also introduces new :noub_if_noinbounds setting to @assume_effects and use it within the base implementation. Now UInt8 is not enough to represent EffectsOverride information, the bit is enlarged to UInt16. While this approach might not be ideal, I don't think there is a better way to do this. An alternative I tried before making this commit was to introduce new semantics for Expr(:boundscheck, ...) where Expr(:boundscheck, nothing) will be constant folded to false during codegen while during inference time it does not taint :noub. I felt this further complicates the already complex complext :boundscheck situation, so I decided to go with the approach to expand the settings set of @assume_effects.

@aviatesk aviatesk requested review from vtjnash and Keno November 16, 2023 10:49
@vtjnash
Copy link
Member

vtjnash commented Nov 16, 2023

Note that we could employ a similar technique to getfield if we change the above implementation to something like:

Just a minor correction that this would not be correct. It would be safe, but it would permit OOB accesses for the Vector without accurate boundschecking

Base automatically changed from avi/memoryop-tfunc-robustness to master November 16, 2023 16:51
@topolarity
Copy link
Member

Thanks @aviatesk !

Here's a small MWE whose segfault appears to be cleaned up by this change:

julia> errs1 = rand(500)
julia> partialsortperm(errs1, 1:2; rev=true)[<tab>

On my machine, this segfaults on master but is resolved by this PR.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

@Keno Keno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach LGTM

Currently our effects system does not track `:noub` about memory
operations. While this sounds unsafe, it's actually not problematic
in most contexts. We restrict concrete-evaluation for anything involing
memory operations anyway, as mutable values aren't propagated as
compile-time constants (except `Symbol`s). However, it turns out that
this is actually causing very dangerous situations in REPL completion
context, where `REPLInterpreter` enables aggressive concrete-evaluation
that propagates mutable values as constants and ignores the
`:consisten`-cy requirement that is usually tainted by memory
operations.

This commit addresses this issue by tainting `:noub` for memory
operations. This commit ends up being somewhat extensive because it
would cause significant regression in REPL completions if we make memory
ops naively taint `:noub`. Complicating this further is the Base's uses
of `boundscheck=false` for peak performance of core memory operations,
where bounds checking is conducted separately before performing actual
memory operations, e.g.:
> essentials.jl
```julia
function getindex(A::Array, i::Int)
    @BoundsCheck ult_int(bitcast(UInt, sub_int(i, 1)), bitcast(UInt, length(A))) || throw_boundserror(A, (i,))
    memoryrefget(memoryref(getfield(A, :ref), i, false), :not_atomic, false)
end
```
Here `boundscheck=false` should generally taint `:noub` immediately, but
in this kind of case `:noub` can actually be `NOUB_IF_NOINBOUNDS` since
it has `@boundscheck` that asserts safety of the memory ops. Note
that we could employ a similar technique to `getfield` if we change the
above implementation to something like:
```julia
function getindex(A::Array, i::Int)
    memoryrefget(memoryref(getfield(A, :ref), i, @BoundsCheck), :not_atomic, @BoundsCheck)
end
```
although this would end up being slower (check the difference at the
code LLVM emits on each implementation if interested).

To this end, this commit also introduces new `:noub_if_noinbounds`
setting to `@assume_effects` and use it within the base implementation.
Now `UInt8` is not enough to represent `EffectsOverride` information,
the bit is enlarged to `UInt16`. While this approach might not be ideal,
I don't think there is a better way to do this. An alternative I
tried before making this commit was to introduce new semantics for
`Expr(:boundscheck, ...)` where `Expr(:boundscheck, nothing)` will be
constant folded to `false` during codegen while during inference time
it does not taint `:noub`. I felt this further complicates the already
complex complext `:boundscheck` situation, so I decided to go with the
approach to expand the settings set of `@assume_effects`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants