3607-enum-discriminant.md


Summary

Enable using .enum#discriminant on values of enum type from safe code in the same module to get the numeric value of the variant's discriminant in the numeric type of its repr.

Motivation

Today in Rust you can use as casts on field-less enums to get their discriminants, but as soon as any variant has fields, that's no longer available.
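
As a minimal sketch of what works today, the following uses an `as` cast on a field-less enum. The names `Level` and `level_code` are made up for illustration.

```rust
// Hypothetical field-less enum; `Level` and `level_code` are illustrative names.
#[repr(u8)]
#[derive(Clone, Copy)]
enum Level {
    Low = 1,
    High = 10,
}

// An `as` cast reads the discriminant of a field-less enum.
fn level_code(level: Level) -> u8 {
    level as u8
}

// If a variant gained a field, e.g. `High(bool) = 10`, the cast above would
// stop compiling, because `as` is only accepted on field-less enums.
```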

Rust 1.66 stabilized custom discriminants on variants with fields, but as the release post said,

Rust provides no language-level way to access the raw discriminant of an enum with fields. Instead, currently unsafe code must be used to inspect the discriminant of an enum with fields.

As a result, the documentation for mem::Discriminant has a section about how to write that unsafe code, and a bunch of warnings about the different incorrect ways that must not be used.
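
For reference, the unsafe approach looks roughly like the sketch below. It relies on the layout guarantee for enums with a primitive repr; the method name `raw_discriminant` is an illustrative choice, not an existing API.

```rust
// An enum with explicit discriminants on variants that carry fields.
#[repr(u8)]
enum Enum {
    Unit = 7,
    Tuple(bool) = 13,
    Struct { a: i8 } = 42,
}

impl Enum {
    // `raw_discriminant` is a hypothetical helper name.
    fn raw_discriminant(&self) -> u8 {
        // SAFETY: with `repr(u8)` the enum is laid out with the `u8`
        // discriminant as its first field, so reading a `u8` through a
        // pointer to the enum is sound. Without a primitive repr this
        // pointer cast must not be used.
        unsafe { *(self as *const Self as *const u8) }
    }
}
```

The cast must name the same integer type as the repr; nothing checks that the two agree.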

It's technically possible to write a clever enough safe match that compiles down to a no-op in order to get at the discriminant, but doing so is annoying and fragile.
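
A sketch of that safe-but-fragile approach, using a made-up `Packet` enum and helper name:

```rust
// Hypothetical enum for illustration.
#[repr(u8)]
enum Packet {
    Ping = 7,
    Data(bool) = 13,
    Close { code: i8 } = 42,
}

// Every discriminant value is repeated by hand here. Nothing checks that
// these literals stay in sync with the declaration above, which is what
// makes this approach fragile.
fn packet_discriminant(p: &Packet) -> u8 {
    match p {
        Packet::Ping => 7,
        Packet::Data(..) => 13,
        Packet::Close { .. } => 42,
    }
}
```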

And accessing the discriminant is quite useful in various places, so it'd be nice for it to be easy.

For example, #[derive(PartialOrd)] on an enum today uses internal compiler magic to look at discriminants. It would be nice for other derives in the ecosystem -- there's a whole bunch of things on enums -- to be able to look at the discriminants directly too.

With this RFC, the built-in derives and third-party derives can both use the same stable feature to implement PartialOrd::partial_cmp for the cases where the arguments have different discriminants.
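
To illustrate where the discriminant comes into play, here is a hand-written sketch of the shape such an implementation might take in today's Rust. The `Shape` enum and the `tag` helper are hypothetical; `tag` is a hand-maintained stand-in for the discriminant access that this RFC would provide.

```rust
use std::cmp::Ordering;

// Hypothetical enum for illustration.
#[derive(PartialEq)]
enum Shape {
    Circle(f64),
    Square(f64),
}

impl Shape {
    // Hand-written stand-in for reading the discriminant; the values follow
    // declaration order, as they would for an enum without explicit values.
    fn tag(&self) -> isize {
        match self {
            Shape::Circle(_) => 0,
            Shape::Square(_) => 1,
        }
    }
}

impl PartialOrd for Shape {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            // Same variant: compare the fields.
            (Shape::Circle(a), Shape::Circle(b)) => a.partial_cmp(b),
            (Shape::Square(a), Shape::Square(b)) => a.partial_cmp(b),
            // Different variants: only the discriminants matter.
            _ => self.tag().partial_cmp(&other.tag()),
        }
    }
}
```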

Guide-level explanation

Rust 1.66 stabilized custom discriminants on enum variants, but didn't give a nice way to actually read them.

In this release, you can use .enum#discriminant to read them.

For example, if you have the following enum,

#[repr(u8)]
enum Enum {
    Unit = 7,
    Tuple(bool) = 13,
    Struct { a: i8 } = 42,
}

Then the following examples pass:

let a = Enum::Unit;
assert_eq!(a.enum#discriminant, 7);
let b = Enum::Tuple(true);
assert_eq!(b.enum#discriminant, 13);
let c = Enum::Struct { a: 1 };
assert_eq!(c.enum#discriminant, 42);

That's entirely safe code, and the value comes out as the type from the repr, avoiding the chance of accidentally using a mismatched type.

To avoid making implicit semver promises, this is only available for enums that are defined in the current module. If you want to expose it to others, feel free to define a method like

impl Enum {
	pub fn discriminant(&self) -> u8 {
		self.enum#discriminant
	}
}

for others to use, or use one of the many derive macros on crates.io to expose it through a trait implementation.

Reference-level explanation

Lexing

In edition 2021 and later, enum#discriminant becomes a legal token, using part of the syntax space previously reserved in RFC#3101.

This means that

macro_rules! single_tt {
	($x:tt) => {}
}
single_tt!(enum#discriminant);

now matches, instead of being a lexical error.

In editions 2015 and 2018, this feature is not available.

Parsing

A new form of expression is added,

DiscriminantExpression :

Expression . enum#discriminant

Like .await, this is not a place expression, and as such is invalid on the left-hand side of an assignment, giving an error like the following:

error[E0070]: invalid left-hand side of assignment
 --> src/lib.rs:5:29
  |
5 |         x.enum#discriminant = 4;
  |         ------------------- ^
  |         |
  |         cannot assign to this expression

Visibility

This acts as though it were a pub(in self) field on a type.

As such, it's an error to use .enum#discriminant on types from sub-modules or other crates.

mod inner {
	pub enum Foo { Bar }
}
inner::Foo::Bar.enum#discriminant // ERROR: enum discriminant is private
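
The same rule can be seen with an ordinary `pub(in self)` field in today's Rust. This is only an analogy, and `Counter`, `hits`, and `recorded_hits` are made-up names:

```rust
mod inner {
    pub struct Counter {
        // `pub(in self)` gives the same visibility as a private field.
        pub(in self) hits: u32,
    }

    impl Counter {
        pub fn new() -> Self {
            Counter { hits: 0 }
        }

        pub fn record(&mut self) {
            self.hits += 1;
        }

        // Code in the defining module can read the field and may choose
        // to expose it through a method.
        pub fn hits(&self) -> u32 {
            self.hits
        }
    }
}

fn recorded_hits() -> u32 {
    let mut c = inner::Counter::new();
    c.record();
    c.record();
    // Writing `c.hits` here would be rejected: the field is private to `inner`.
    c.hits()
}
```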

Type

The LHS is auto-deref'd until it finds something known to be an enum.

Note: this is different from mem::discriminant. For example,

#![allow(enum_intrinsics_non_enums)]
enum MyEnum { A, B }
let a = Box::new(MyEnum::A);
let b = Box::new(MyEnum::B);
assert_eq!(std::mem::discriminant(&a), std::mem::discriminant(&b));
assert_ne!(a.enum#discriminant, b.enum#discriminant);
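
For comparison, this is a sketch of how the dereferencing has to be spelled out with mem::discriminant today. The function names are illustrative.

```rust
use std::mem::discriminant;

enum MyEnum {
    A,
    B,
}

// The parameters have the concrete type `&MyEnum`, so a caller holding a
// `&Box<MyEnum>` gets deref coercion at the call site.
fn same_variant(x: &MyEnum, y: &MyEnum) -> bool {
    discriminant(x) == discriminant(y)
}

fn boxed_variants_match(a: &Box<MyEnum>, b: &Box<MyEnum>) -> bool {
    // `discriminant` is generic, so `discriminant(a)` would infer
    // `T = Box<MyEnum>` and inspect the box rather than the enum.
    // The explicit `&**` is needed to reach the enum itself.
    discriminant(&**a) == discriminant(&**b)
}
```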

For this, a generic parameter is never considered to be an enum, although a generic enum where some of the generic parameters to the enum constructor are not yet known is fine.

It's an error if, despite deref'ing, the LHS is still not an enum.

If the enum has repr(uN) or repr(iM), the .enum#discriminant expression returns a value of type uN or iM respectively.

If the enum does not specify an integer repr, then it returns isize.

Note: isize is rarely the desired type for discriminants, and indeed custom discriminants on types with fields are disallowed without explicit repr types. Returning isize is fine here, though, thanks to privacy: if the enum later specifies an explicit repr type, the code inside the module that uses the discriminant can be updated at the same time.
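
The closest analogue in today's Rust is an `as` cast on a field-less enum, where the target type has to be written out by hand. The sketch below (with made-up enum and function names) casts to the types that the rules above would select automatically.

```rust
// No integer repr: discriminant values are written as `isize`.
#[derive(Clone, Copy)]
enum Sign {
    Negative = -1,
    Zero = 0,
    Positive = 1,
}

// Explicit `repr(i8)`: discriminant values are written as `i8`.
#[repr(i8)]
#[derive(Clone, Copy)]
enum SmallSign {
    Negative = -1,
    Positive = 1,
}

// The cast target is chosen by hand to match the rule for each enum.
fn sign_value(s: Sign) -> isize {
    s as isize
}

fn small_sign_value(s: SmallSign) -> i8 {
    s as i8
}
```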

Semantics

When the LHS of a discriminant expression is a place, that place is read but not consumed.

Note: this can be thought of as if it read a field of Copy type from the LHS.

This lowers to Rvalue::Discriminant in MIR.

As this expression is an r-value, not a place, &foo.enum#discriminant returns a reference to a temporary; that is, it is the same as &{foo.enum#discriminant}. It does not return a reference to the memory in which the discriminant is stored -- not even for types that do store the discriminant directly.

This expression is allowed in const contexts, but is not promotable.

Note: the behaviour of this expression is independent of whether the type gets layout-optimized. For example, the following holds even if x is 2_i8 in memory.

enum MyOption<T> { MyNone, MySome(T) }
let x = MyOption::<std::cmp::Ordering>::MyNone;
assert_eq!(x.enum#discriminant, 0_isize);

Drawbacks

This isn't strictly necessary; we could continue to get along just fine without it.

  • For the FFI cases the layout guarantees mean it's already possible to write a sound and reliable function that reads the discriminant.
  • For cases without repr(int), custom discriminants aren't even allowed, so those discriminants must not be all that important.
  • It's always possible to write a match in safe code that optimizes away and produces exactly the same thing that this new expression would.
  • A pseudo-field with # in the name looks kinda weird.
  • There might be a nicer way to do this in the future.

Rationale and alternatives

Why have a # in the name?

By not being an identifier, .enum#discriminant can't conflict with anything.

While today there are no fields directly accessible from values of enum type, there are lots of plausible-enough proposals that would allow some.

For example, enum variant types have come up repeatedly, which would represent a single variant and thus would allow accessing the fields on that type, but plausibly would still offer access to the discriminant. Similarly, a pattern type that restricts the enum to a single variant would plausibly allow access to its fields. And one of those fields might be named discriminant.

Other requests have come in too, like allowing field access if every variant has a field with the same name & type or allowing field access if there's only a single inhabited variant.

By being clearly different it means it can't conflict with any field or method. That also helps resolve any concerns about it looking like field access -- as existed for .await -- since it's visibly lexically different.

And the lexical space is already reserved by RFC#3101.

Why have enum in the name?

Well, it seemed short and evocative enough to be fine. Doing something like e# isn't shorter enough to matter, and I'd rather save very-short prefixes for higher-prevalence things.

And since it's a pre-existing keyword, it means that

let d = foo().bar.enum#discriminant;

already gets highlighting on the enum in my editor without needing any updates.

Isn't this kinda long?

Not really, compared to the existing possibilities.

For example, in a macro expansion even the internal magic today ends up being

let __self_tag = ::core::intrinsics::discriminant_value(self);
let __arg1_tag = ::core::intrinsics::discriminant_value(other);
::core::cmp::PartialOrd::partial_cmp(&__self_tag, &__arg1_tag)

to avoid any accidental shadowing.

In comparison,

let __self_tag = self.enum#discriminant;
let __arg1_tag = other.enum#discriminant;
::core::cmp::PartialOrd::partial_cmp(&__self_tag, &__arg1_tag)

is much easier.

Outside of macros, something like

discriminant(&foo)

(which requires a use std::mem::discriminant;) isn't that different from

foo.enum#discriminant

And of course you can always make a function to give it a shorter name -- or write a proc macro to generate that function -- if you so wish.

Why just pub(in self)?

The primary use case that led to this RFC is using it in derive macros, where pub(in self) is entirely sufficient.

And by being only private, it avoids forcing any semver promises on library authors.

Today, as a library author, you can reorder the variants in an enum should you so wish, or in a #[non_exhaustive] enum add new ones in the middle. There's no way for the users of your library to care about the order in which you defined the variants (unless you make other documented promises) -- especially if you never derive(PartialOrd).

Any library author who wishes to provide discriminant stability can always write a function to expose those discriminants, trivially implemented using this feature.

Why expose it via .?

I like it behaving kinda like a field. For example, having auto-deref like a field means you don't need to worry about whether you actually have a &&Enum in a filter or you actually have a Box<Enum> or whatever.

Of course, if the enum is repr(C), then the discriminant is a field in the guaranteed FFI layout, so thinking of it kinda like a field isn't too weird.

There has also been talk of compressed or move-only fields where getting the address is disallowed so that Rust can run arbitrary logic whenever they're accessed and thus have the freedom to do more layout optimizations than are otherwise possible. Should we have something like that, then it's again not unreasonable to think of it as a field that sometimes has particularly fancy layout optimization.

What about if it was a magic method instead?

It could be. But it would still need to be something that doesn't cause name resolution failures for other methods that people might already have written.

So I don't think that the extra () on it would really improve things.

Why not allow writing to the discriminant?

The semantics for that get really complicated, especially for enums in repr(Rust) that don't have a guaranteed layout, and even more so those that get layout-optimized.

Maybe one day it could be allowed, but for now this RFC sticks only to things that can be allowed in safe code without worries.

Couldn't this be a magic macro?

Sure, it could, like offset_of!.

I don't think enum_discriminant!(foo) is really better than foo.enum#discriminant, though.

It works on a value or place rather than on tokens, and there's no special logic to apply to the scope in which the argument is computed.

Why not do <more complex feature>?

Privacy is the problem.

If we wanted to just expose everything's discriminant to everyone, it'd be easy to have a trait in core that's auto-implemented for every enum.

But to do things in a way that doesn't add a new category of major breaking change, that gets harder.

It'd be great if we had scoped trait impls, for example, so we could do that in a way where it's up to the trait author how visible things get. But that's a massive feature, so it would be nice not to block on it.

Or libs-api could create a new trait and a new derive that's implemented using the same magic that today's derive(PartialOrd) uses. But that's another big bikeshed, and doesn't even work very well for the "I'm writing my own customized derive" cases that just want to use the discriminant internally.

The goal here is to do something easy using syntactic space that's not particularly valuable anyway -- if people end up almost never using this directly because there's a popular community derive, that's great.

What about as?

While as works on field-less enums, it's not that great there either.

It has the fundamental problem that you have to write out the target type that you want, and the wrong one will silently truncate. This hits the same general "as is error-prone" theme that is pushing people away from using as to using more-specific things instead that are either lossless or clearer, to help avoid mistakes.
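
A small sketch of the truncation hazard, using a made-up `Code` enum whose repr is wider than the cast target:

```rust
// Hypothetical enum whose discriminants need more than 8 bits.
#[repr(u16)]
#[derive(Clone, Copy)]
enum Code {
    Small = 1,
    Big = 300,
}

// Compiles without complaint, but 300 does not fit in a `u8`, so the
// value is truncated to 300 mod 256 = 44.
fn narrow(c: Code) -> u8 {
    c as u8
}

// Correct only because the author remembered that the repr is `u16`.
fn exact(c: Code) -> u16 {
    c as u16
}
```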

If this exists, I wouldn't be surprised to see people using foo.enum#discriminant even in places where foo as u8 works and is shorter since you don't have to think "what was the repr of this, again?" and you just get the right thing.

Should the enum's declared repr not be the type you actually want, you can always use .enum#discriminant and then as cast it -- or hopefully .into() or something else with clearer intent -- into the type you need.

Prior art

C++'s std::variant has an index method, which always returns std::size_t since there are no custom discriminants. (It's more like what rustc calls a variant index internally.)

Unresolved questions

  • Is auto-deref worth it? I would propose leaving it in the RFC for merging, as wanting to use this on &Enum will be common, but if in the course of implementing it's particularly annoying then stabilizing without it would be tolerable, since error messages could suggest the correct thing.

Future possibilities

If this turns out to work well, there's a variety of related properties of things which could be added in ways similar to this.

For example, you could imagine MyEnum::enum#VARIANT_COUNT saying how many variants are declared, MyEnum::enum#ReprType to get the type of the discriminant, or my_enum.enum#variant_index to get the declaration-order index of the variant (as opposed to its discriminant value).

Those are much easier to generate with a proc macro, however, so are not included in this RFC. They would need separate motivation from what's done here.