Do function pointers behave like data pointers (wrt provenance and other aspects)? #340

Open
RalfJung opened this issue May 24, 2022 · 8 comments
Labels
A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) C-open-question Category: An open question that we should revisit

Comments

@RalfJung
Member

RalfJung commented May 24, 2022

Miri currently treats fn ptrs and data ptrs very similarly, in particular with regard to provenance. When calling a function pointer, its provenance is consulted to identify which function to invoke. This makes int2fnptr transmutes a problem (see rust-lang/rust#97321). fnptr2int transmutes are also UB because fn ptrs carry provenance, which integers must not.

However, the trouble with provenance for data pointers comes from multiple pointers with the same address but different provenance. Function pointers can't be offset and don't have aliasing restrictions or a "one-past-the-end" rule, so none of this applies. Hence we potentially could make them not carry provenance, and we could do the mapping from pointer to function without its provenance (basically, doing the int2ptr cast at the time the call is made).

Beyond these formal details, there are pragmatic concerns on niche architectures, such as whether data and function pointers even have the same size and representation.
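To check the size question on a particular target, the layouts can be compared directly. This is only a minimal sketch: the helper name `same_layout` is made up for illustration, and a match on one mainstream target says nothing about the niche architectures mentioned above.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: do `fn()` and `*const ()` have the same size and
/// alignment on the target this is compiled for?
fn same_layout() -> bool {
    size_of::<fn()>() == size_of::<*const ()>()
        && align_of::<fn()>() == align_of::<*const ()>()
}

fn main() {
    println!("fn():      {} bytes", size_of::<fn()>());
    println!("*const (): {} bytes", size_of::<*const ()>());
    println!("usize:     {} bytes", size_of::<usize>());
    println!("same size and alignment: {}", same_layout());
}
```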

Also see this Zulip discussion.

@RalfJung RalfJung added A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) C-open-question Category: An open question that we should revisit labels May 24, 2022
@thomcc
Member

thomcc commented May 24, 2022

It would be pretty nice if they don't, as for a while now we've taught that the way to invert the fn_ptr as usize cast is to transmute back. Given that the justification just seems to be simplification of the model (which doesn't really matter to most programmers, especially given that from their perspective, the incantation to do the inverse cast is less simple now), I think it should be defined.

@RalfJung
Member Author

RalfJung commented May 25, 2022

for a while now we've taught that the way to invert the fn_ptr as usize cast is to transmute back.

Have we?
As far as I found in the stdlib docs, no such pattern is taught there (at least nothing that doctests would cover), and the transmute docs actually cast to a raw ptr first.

Given that the justification just seems to be simplification of the model (which doesn't really matter to most programmers, especially given that from their perspective, the incantation to do the inverse cast is less simple now), I think it should be defined.

That's fair. I think a large part of the programmers who write unsafe code also want to understand the model, so we shouldn't make it more complicated than absolutely necessary. However, the majority of programmers will probably never look at the model, so there is also value in making it "do the expected thing", and it is worth spending some complexity on that. So to me this depends on how complicated we have to go to support this.

I would also like to distinguish the two directions:

  • fnptr2int transmutes are currently UB because it is UB to put data with provenance into an integer. This, I think, is tricky to fix as we'd have to figure out how to make sure fn ptrs are created without provenance. Also I have not seen any such transmute in the wild, since casts actually work here.
  • int2fnptr transmutes are UB because when calling the fnptr, we use provenance to determine which function to call. The alternative here is to ignore the provenance and use the absolute address to look up the allocation at the given point (basically, as if a cast had been done at the moment of the call).
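The two directions might look as follows in code. This is a sketch, not a statement of what the final model will guarantee: `answer`, `fn_ptr_to_addr` and `addr_to_fn_ptr` are illustrative names, and the inverse direction deliberately goes through an explicit integer-to-raw-pointer cast before the transmute rather than transmuting the integer directly.

```rust
use std::mem::transmute;

fn answer() -> u32 {
    42
}

/// fnptr2int via an `as` cast, which is available in safe code.
fn fn_ptr_to_addr(f: fn() -> u32) -> usize {
    f as usize
}

/// int2fnptr, going through a raw pointer before the transmute.
///
/// # Safety
/// `addr` must have been obtained from a `fn() -> u32`.
unsafe fn addr_to_fn_ptr(addr: usize) -> fn() -> u32 {
    // Explicit int2ptr cast first, then reinterpret the raw pointer.
    let ptr = addr as *const ();
    unsafe { transmute::<*const (), fn() -> u32>(ptr) }
}

fn main() {
    let addr = fn_ptr_to_addr(answer);
    let f = unsafe { addr_to_fn_ptr(addr) };
    println!("{}", f());
}
```

A direct `transmute::<usize, fn() -> u32>(addr)` is the variant whose status this issue is about, which is why it is left out here.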

@digama0

digama0 commented May 25, 2022

  • fnptr2int transmutes are currently UB because it is UB to put data with provenance into an integer. This, I think, is tricky to fix as we'd have to figure out how to make sure fn ptrs are created without provenance. Also I have not seen any such transmute in the wild, since casts actually work here.

This isn't just about fnptr2int transmutes so maybe I should bring this up somewhere else, but I really think it is a bad idea to make these transmutes UB instead of simply stripping the provenance. (That is, when converting bytes with provenance to a value of integer type, the provenance is lost, and when it is saved again the provenance is not recovered.) I see no gain in making it immediate UB.

@RalfJung
Member Author

That is basically #286. Though that thread is so huge now, not sure how useful it still is...

It's definitely off-topic for this thread though. :)

@KamilaBorowska

I think function pointers on CHERI do have provenance (you cannot simply make up function pointers from integers) which is an argument supporting function pointers having provenance.

@RalfJung RalfJung changed the title Do function pointers have provenance? Do function pointers behave like data pointers (wrt provenance and other aspects)? Apr 27, 2023
@RalfJung
Member Author

#309 got folded into this issue, so I generalized the title a bit to not just be about provenance -- there are also questions around whether these types even have the same size etc.

@Nemo157
Member

Nemo157 commented Jan 6, 2025

Somewhat related to this is: what even are the semantics of function {item,pointer} to {pointer,address} casts? These are named in the reference but don't appear in the following semantics section. AFAIK the example in the transmute docs linked above is the only place that implies that transmute<*const (), fn()>(some_fn as _) is a round-trip.
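Spelled out, the round-trip in question looks roughly like this. The function `some_fn` and the wrapper `round_trip` are placeholders for illustration; that the result can be called as the original function is what the docs example implies, not something the reference's semantics section states.

```rust
use std::mem::transmute;

fn some_fn() -> &'static str {
    "called"
}

/// Function item -> raw pointer -> function pointer.
fn round_trip() -> fn() -> &'static str {
    // Function item to raw pointer cast.
    let ptr = some_fn as *const ();
    // Raw pointer back to a function pointer of the matching signature.
    unsafe { transmute::<*const (), fn() -> &'static str>(ptr) }
}

fn main() {
    let f = round_trip();
    println!("{}", f());
}
```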

@RalfJung
Member Author

RalfJung commented Jan 6, 2025

Function items have no data, so casting those to function pointers is a very special operation that synthesizes a suitable pointer "out of thin air". This operation is non-deterministic; executing it multiple times for the same function can produce different pointers.

Function pointers are either like usize or like *const () (depending on the outcome of the discussion in this issue), so either way their casts to usize behave like those of the corresponding carrier type.
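A small sketch of both points, with `greet`, `item_size` and `casts_agree` as made-up names. It only compares casts of one already-created function pointer value, since two separate casts of the function item are not promised to yield the same pointer.

```rust
use std::mem::{size_of, size_of_val};

fn greet() {}

/// Size of the function item `greet`, which carries no data.
fn item_size() -> usize {
    size_of_val(&greet)
}

/// For one given fn pointer value, casting straight to `usize` and casting
/// via `*const ()` read out the same address.
fn casts_agree(f: fn()) -> bool {
    (f as usize) == (f as *const () as usize)
}

fn main() {
    // Coercing the item to `fn()` is the step that synthesizes a pointer.
    let f: fn() = greet;
    println!("item: {} bytes, fn(): {} bytes", item_size(), size_of::<fn()>());
    println!("casts agree: {}", casts_agree(f));
}
```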

5 participants