Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating a raw reference/pointer to an extern never value claims following code is unreachable #74840

Closed
Nemo157 opened this issue Jul 27, 2020 · 49 comments · Fixed by #78324
Closed
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull C-bug Category: This is a bug. F-never_type `#![feature(never_type)]` F-raw_ref_op `#![feature(raw_ref_op)]` requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Nemo157
Copy link
Member

Nemo157 commented Jul 27, 2020

I tried this code (playground):

#![feature(never_type)]
#![feature(raw_ref_macros)]
#![feature(raw_ref_op)]

extern {
    static FOO: !;
}

fn main() {
    dbg!(unsafe { core::ptr::raw_const!(FOO) });
    dbg!(unsafe { &raw const FOO });
}

I expected to see this happen: just a linker failure because FOO doesn't exist.

Instead, this happened: before the linker failure, rustc complained that code was unreachable.

warning: unreachable expression
  --> src/main.rs:10:19
   |
10 |     dbg!(unsafe { core::ptr::raw_const!(FOO) });
   |                   ^^^^^^^^^^^^^^^^^^^^^^---^
   |                   |                     |
   |                   |                     any code following this expression is unreachable
   |                   unreachable expression

warning: unreachable expression
  --> src/main.rs:11:19
   |
11 |     dbg!(unsafe { &raw const FOO });
   |                   ^^^^^^^^^^^---
   |                   |          |
   |                   |          any code following this expression is unreachable
   |                   unreachable expression

(The code should also build without the unsafe; that is caused by the same problem.)

Meta

1.47.0-nightly
(2020-07-26 6c8927b0cf80ceee1938)

cc @rust-lang/wg-unsafe-code-guidelines

@Nemo157 Nemo157 added the C-bug Category: This is a bug. label Jul 27, 2020
@Nemo157
Copy link
Member Author

Nemo157 commented Jul 27, 2020

To step it up a notch, I then tried giving FOO an address pointing to data at least as valid as ! (playground):

#![feature(never_type)]
#![feature(raw_ref_macros)]
#![feature(raw_ref_op)]

extern {
    static FOO: !;
}

mod foo {
    #[no_mangle]
    static FOO: u32 = 5;
}

fn main() {
    dbg!(unsafe { core::ptr::raw_const!(FOO) });
    dbg!(unsafe { &raw const FOO });
}

I expected this to run and exit fine, since it is just printing the address of a symbol that definitely has an address.

Instead, it ran and then SIGILLed (after giving the same warnings as above):

     Running `target/debug/foo`
[src/main.rs:15] unsafe { core::ptr::raw_const!(FOO) } = 0x000055b8aacb1604
[src/main.rs:16] unsafe { &raw const FOO } = 0x000055b8aacb1604
[1]    662959 illegal hardware instruction (core dumped)  cargo run

@jonas-schievink jonas-schievink added F-never_type `#![feature(never_type)]` F-raw_ref_op `#![feature(raw_ref_op)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 27, 2020
@RalfJung
Copy link
Member

The relevant MIR looks like this:

        _2 = const {alloc0: &!};
        _1 = &raw const (*_2);

@oli-obk looks like creating a raw ptr to a static goes through a reference, with the new scheme? Also Cc @matthewjasper

FWIW I am not sure if this is really a bug; an explicitly uninhabited static seems at the very least highly suspicious to me. This is currently certainly not a blessed pattern. But the generated MIR is still odd.

@Nemo157
Copy link
Member Author

Nemo157 commented Jul 28, 2020

The context in which I noticed it was an attempt by @repnop at using them to pass the heap start/end addresses from the linker to a Rust program (discord convo). It seemed like a somewhat correct way to encode a symbol that has a known address, but should not be readable.

@RalfJung
Copy link
Member

RalfJung commented Jul 28, 2020

I think an inhabited ZST like () is a more appropriate way to encode that pattern. Also see the "opaque struct" section in the Nomicon.

@RalfJung
Copy link
Member

Btw I think this should be merged with #74843; both are caused by the same odd MIR generation.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 29, 2020

We're computing the type of the "address of" constant of statics in

let ty = cx.tcx.static_ptr_ty(id);
and that doesn't account for extern statics:
pub fn static_ptr_ty(&self, def_id: DefId) -> Ty<'tcx> {
// Make sure that any constants in the static's type are evaluated.
let static_ty = self.normalize_erasing_regions(ty::ParamEnv::empty(), self.type_of(def_id));
if self.is_mutable_static(def_id) {
self.mk_mut_ptr(static_ty)
} else {
self.mk_imm_ref(self.lifetimes.re_erased, static_ty)
}
}

I thought we had some more logic there a while back... oh well. We need an else if self.is_foreign_item(def_id) { self.mk_imm_ptr(static_ty) }... before the else, which should then give us exactly what we want

@oli-obk
Copy link
Contributor

oli-obk commented Jul 29, 2020

You can check this out by changing all your extern statics to be mutable, then you get the codegen you'd get with my suggested change above. Though the never type example still dies horribly, so there seems to be a bit more to the never type, even if there's never a reference to it.

EDIT: I'm not sure what's causing this, but I'm guessing the deref projection for the reborrow, as just having a constant of raw pointer to never type doesn't cause any problems: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb80e957cebf8bf982ce50f914bd9236

@RalfJung
Copy link
Member

Yeah, the problem is here already:

_2 = const {alloc0: &!};

We are embedding the const at reference type (note that &!), which is wrong.

Maybe @matthewjasper knows more about how/why this happens.

@RalfJung RalfJung added the requires-nightly This issue requires a nightly compiler in some way. label Aug 17, 2020
@oli-obk
Copy link
Contributor

oli-obk commented Aug 17, 2020

The fix is described in #74840 (comment) where you'll end up with a raw pointer in that constant. But there's some additional messiness around never types, but without doing the fix first, I don't know what it is exactly.

@oli-obk oli-obk added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Aug 17, 2020
@RalfJung
Copy link
Member

RalfJung commented Aug 18, 2020

@oli-obk I am not sure if that is the right fix, or at least not a complete one. For this code:

static FOO: T = ...;

let ptr = ptr::raw_const!(FOO);

I'd expect no reference to be created. This makes a difference when there's invalid data in FOO. (And no reference would have been created before the static-as-const-ptr change.)

@oli-obk
Copy link
Contributor

oli-obk commented Aug 18, 2020

there can't be invalid data for FOO, unless FOO is extern, and my fix suggests to make using an extern static create a raw pointer just like we do with mutable statics.

@RalfJung
Copy link
Member

RalfJung commented Aug 18, 2020

There can be invalid data. FOO could be interior mutable and somebody could cast a pointer to it to a different type, and write data that is invalid at FOO's type. Or FOO could be pub and #[link_name] or #[no_mangle] or however you export a static, and then it could be imported by some C code which can put in invalid data.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 18, 2020

I was under the impression that it's UB to have a (non extern) static that has invalid data. I thought static items and local variables had the same rules with that. Is there any discussion about this? Because it's definitely news to me.

@RalfJung
Copy link
Member

RalfJung commented Aug 18, 2020

It's the same for statics and locals. There is nothing in the UB list that says that locals or statics always have valid data. It only says you may not create invalid data. Through pointer type casts, it is possible to have invalid data in a local without ever creating invalid data. Whether we want to do something about this and if it has any negative consequences, is tracked at rust-lang/unsafe-code-guidelines#84.

Here's a small demo, where Miri cannot see any UB, and indeed none of the clauses in our UB list is met:

fn main() {
    let mut x: bool = true;
    let ptr = &mut x as *mut _ as *mut u8;
    unsafe { *ptr = 33; }
}

@oli-obk
Copy link
Contributor

oli-obk commented Aug 18, 2020

oh wow... ok. We'll need to rewrite the entire system around StaticRef in that case. I guess what we need to do is to translate ExprKind::Ref(ExprKind::Path(name_of_static)) directly instead of recursing when encountering ExprKind::Ref and translating ExprKind::Path to *&STATIC. The good thing if we do this is that we can get rid of some hacks in promotion and const checking.

@Dylan-DPC-zz
Copy link

I'm not sure if this issue classifies as "E-Easy" any more :D

@Dylan-DPC-zz Dylan-DPC-zz removed the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Aug 18, 2020
@RalfJung
Copy link
Member

Not sure if we need to rewrite the entire system... an alternative might be to always use raw pointers, but make unsafety checking recognize these derefs of const raw ptrs that point to a DefId of the right type... or maybe they are marked somehow in the MIR.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 18, 2020

an alternative might be to always use raw pointers, but make unsafety checking recognize these derefs of const raw ptrs that point to a DefId of the right type... or maybe they are marked somehow in the MIR.

Stop reading my mind 😝. I considered this, but the complexity for that would imo warrant rewriting the system and simplifying everything. Right now we need to look at a deref, check whether its local is a StaticRef local as per debug info, if it is, we need to find the last (and only) assignment to that local and look at the rhs to read the constant and process it. Adding more layers to that scheme doesn't seem too great to me. The entire reason I wrote the scheme was so we can use references and pointers to signal what can be safely accessed and what not, because I thought there was a canonical lowering to all accesses to statics.

@RalfJung
Copy link
Member

I don't think I follow. My proposed scheme should make the insertion of these pointers simpler -- just always use raw pointers. This is in exchange for a bit more complexity in the unsafety checks (as otherwise all static accesses would be considered unsafe). I am not sure if that's a good tradeoff considering how critical the safety checks are, but it doesn't match what you are saying about this being more complex during MIR construction.

@RalfJung
Copy link
Member

I like the sound of "simplify everything", but I am not sure what alternative you are thinking of.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 19, 2020

So... we have the following cases:

  • ExprKind::Ref(ExprKind::Path(path_to_static)) (this is taking a reference to a static)
  • ExprKind::AddrOf(ExprKind::Path(path_to_static)) (this is taking a raw pointer to a static)
  • ExprKind::Path(path_to_static) (this is reading from or writing to a static directly)

And right now we're translating all of them to a constant of &STATIC value (sometimes at pointer type, sometimes at reference type). We then immediately insert a deref so what we actually get is *&STATIC or *&raw STATIC. In the case of Ref or AddrOf we then proceed to take a reference again, so we get &*&STATIC or &raw *&STATIC (or either of these with a &raw STATIC). In all other cases, we read from the static in a copying manner since we can't move out of a reference or raw pointer via deref.

What we could do instead is to lower Ref directly to a &STATIC constant, AddrOf directly to a &raw STATIC constant and all other uses to a new Rvalue kind (Rvalue::Static(DefId)?) that just reads from the static.

This is simpler imo because everything is now very explicit. If we want we can also put everything into the Rvalue::Static by having another enum that decides whether this Rvalue is reading from the static, taking a reference to it or taking a pointer to it.

The only thing that stays odd are assignments to mutable statics, as they need to end up on the lhs of an assignment, but I think our THIR infrastructure will automatically make that work

@RalfJung
Copy link
Member

If we want we can also put everything into the Rvalue::Static by having another enum that decides whether this Rvalue is reading from the static, taking a reference to it or taking a pointer to it.

IMO then it would make more sense to just bring back static places.

If we can rely on such larger patterns, my proposal would be:

  • ExprKind::Ref(ExprKind::Path(path_to_static)) -> static address at reference type
  • ExprKind::AddrOf(ExprKind::Path(path_to_static)) -> static address at raw pointer type
  • ExprKind::Path(path_to_static) -> copy deref of static address at reference type

However, I am not sure if the first case is correct -- does unsafety checking still detect static mut and extern static uses as unsafe this way?

@oli-obk
Copy link
Contributor

oli-obk commented Sep 21, 2020

So... I looked a bit deeper and realized that it's not related to anything in the MIR or MIR building at all, but it's typeck which treats all diverging expression (path expression to an extern static in this case) as causing all following expressions to be unreachable, even if in the raw pointer case this is not technically true. More details can be found in #76982 (comment)

TLDR: I don't see a good reason to allow uninhabited extern statics, since their only use case is getting their address taken. So I'm proposing to instead add a future incompat lint that tells you to not use extern statics of uninhabited type and instead use a dummy type that you can safely take the address of.

@RalfJung
Copy link
Member

TLDR: I don't see a good reason to allow uninhabited extern statics, since their only use case is getting their address taken. So I'm proposing to instead add a future incompat lint that tells you to not use extern statics of uninhabited type and instead use a dummy type that you can safely take the address of.

Agreed. It could directly recommend () as said dummy type.

That said, isn't something like this still needed to make &raw const STATIC not create an intermediate reference? The only remaining reason why that is bad is that references technically are UB when they point to invalid data, raw pointers are not... so in principle we could just add a FIXME and rely on the compiler not exploiting this.

(Also static_ptr_ty still needs better comments to explain how critical that function is. I was going to add those comments once we figured what we want to do.)

@oli-obk
Copy link
Contributor

oli-obk commented Sep 21, 2020

Yea, I am currently rewriting that PR, but I wanted to separate it from the UB situation here.

@RalfJung
Copy link
Member

The reason uninhabited extern statics are so problematic is that they are the only uninhabited places that one can actually access in Rust -- normal statics cannot be uninhabited as then computing their initial value would error, and uninhabited locals cannot be accessed before being initialized which can never happen.

So, with that lense on, it makes perfect sense to just make these a hard error after some transition period. @oli-obk are you taking care of that PR or should I put it on my list?

@RalfJung
Copy link
Member

I opened #77096 for the problems the current static handling is causing, and the refactoring we might want to do.

@oli-obk
Copy link
Contributor

oli-obk commented Sep 23, 2020

I am not working on that

@camelid camelid added the A-raw-pointers Area: raw pointers, MaybeUninit, NonNull label Oct 7, 2020
JohnTitor added a commit to JohnTitor/rust that referenced this issue Oct 21, 2020
fix static_ptr_ty for foreign statics

Cc rust-lang#74840

This does not fix that issue but fixes a problem in `static_ptr_ty` that we noticed while discussing that issue. I also added and updated a few comments. The one about `internal` locals being ignored does not seem to have been true [even in the commit that introduced it](https://github.com/rust-lang/rust/pull/44700/files#diff-ae2f3c7e2f9744f7ef43e96072b10e98d4e3fe74a3a399a3ad8a810fbe56c520R139).

r? @oli-obk
JohnTitor added a commit to JohnTitor/rust that referenced this issue Oct 21, 2020
fix static_ptr_ty for foreign statics

Cc rust-lang#74840

This does not fix that issue but fixes a problem in `static_ptr_ty` that we noticed while discussing that issue. I also added and updated a few comments. The one about `internal` locals being ignored does not seem to have been true [even in the commit that introduced it](https://github.com/rust-lang/rust/pull/44700/files#diff-ae2f3c7e2f9744f7ef43e96072b10e98d4e3fe74a3a399a3ad8a810fbe56c520R139).

r? @oli-obk
@RalfJung
Copy link
Member

I have written the code for rejecting uninhabited statics... but I realized I have no idea where to put it.^^ I first put it to the other static checks such as Sync, but realize that those are never run for extern statics...

@RalfJung
Copy link
Member

In fact this means that extern static can be !Sync, which seems... strange?

@RalfJung
Copy link
Member

I think I found a reasonable place: #78324

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Oct 25, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Oct 25, 2020
@bors bors closed this as completed in b9a94c9 Oct 26, 2020
@simonbuchan
Copy link
Contributor

simonbuchan commented Dec 1, 2020

So I just hit this warning testing some code doing interop with Objective-C (ugh). The relevant code here is approximately:

// roughly the content of the objc headers
pub enum objc_class {}
pub type Class = *const objc_class;

...

// presumably this could change to ptr::raw_const!()? 
impl NSObject {
    pub fn class() -> Class {
        extern "C" {
            #[link_name = "OBJC_CLASS_$_NSObject"]
            static CLASS: objc_class;
        }
        &NS_OBJECT_CLASS
    }
}

Essentially, using the dynamic linker to generate runtime identifiers. This is emulating what Objective-C and Swift emit to avoid (or rather optimize, since the dynamic linker is doing it) the runtime cost of looking up classes.

As far as I can tell from this thread, the issue is approximately that it's illegal to take the address of a value that can't exist (in Rust) as it's uninhabited, and the replacement is to use a valid (ie. inhabited) type? Here the issue is that objc_class should not be constructible, movable, or really have any size (zero or not) by users of the crate, and should be a distinct type from, e.g. objc_object. The constructible constraint is easy enough, but the others are a lot harder to emulate without an uninhabited type.

Would the suggestion be to change this simply to:

        extern "C" {
            #[link_name = "OBJC_CLASS_$_NSObject"]
            static CLASS: ();
        }
        (&NS_OBJECT_CLASS as *const ()).cast()

Or is there a better !Size type that objc_class and friends could wrap that wouldn't cause this issue?

@qwerty19106
Copy link

Possible you should use extern type syntax.

Add a new kind of type declaration, an extern type:

extern {
type Foo;
}

These types are FFI-safe. They are also DSTs, meaning that they do not implement Sized. Being DSTs, they cannot be kept on the stack, can only be accessed through pointers and references and cannot be moved from.

It requires Nightly now.

@oli-obk
Copy link
Contributor

oli-obk commented Dec 2, 2020

The lack of a Size is indeed problematic without extern type, but you can satisfy immovability quite easily: you just don't implement Copy for your type: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=809cdd29dbe7449c682c2020b9e89b13

If you want to make sure ppl realize the size is problematic, put a [u8; 1000*1000*1000*1000] into it. Since there's never an instance of this value, this won't blow you stack or anything unless someone tries to create a value via MaybeUninit or other unions

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2020

@simonbuchan using an uninhabited type to model an opaque type is a very bad idea, because Rust assumes that uninhabited types cannot be constructed by anyone ever. That is not what you want to express -- you want to express that the (Rust) client of your library cannot construct this type, but other parts of the code (across an FFI boundary, but that makes no difference) can construct it.

The "proper" but nightly-only solution is indeed extern { type Foo }. There's a less elegant version that works on stable described in the nomicon.

@simonbuchan
Copy link
Contributor

Thanks everyone! extern type looks great, but for now, I'll go with the zero-sized struct. Unfortunately, pointer to uninhabited enum is a bad idea with tenure: it's seems to be used everywhere as the current implementation of extern types. It seems it works fine unless you are doing exactly this though, so it's probably not too bad.

@RalfJung
Copy link
Member

RalfJung commented Dec 3, 2020

It seems it works fine unless you are doing exactly this though

It works if you only ever use it as the pointee of a raw pointer, like *mut Uninhabited. But if you ever use it with a reference type, or dereference the raw pointer (*ptr), then you have UB.

dgrunwald added a commit to dgrunwald/rust-cpython that referenced this issue Feb 17, 2021
Empty enums are uninhabited types and can cause issues like rust-lang/rust#74840

This fixes the compiler warning: static of uninhabited type
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull C-bug Category: This is a bug. F-never_type `#![feature(never_type)]` F-raw_ref_op `#![feature(raw_ref_op)]` requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
8 participants